Sunday 8 January 2017

Much ado about NOAAthing

I know NOAAthing.

This post is about nothing. Nearly nothing. But when I found this title I had to write it.

Once upon a time in America there were some political activists who claimed that global warming had stopped. These were the moderate voices, with many people in this movement saying that an ice age is just around the corner. Others said global warming paused, hiatused or slowed down. I feel that good statistics has always shown this idea to be complete rubbish (Foster and Abraham, 2015; Lewandowsky et al., 2016), but at least in 2017 it should be clear that it is nothing, nothing whatsoever. It is interpreting noise. More kindly: interpreting variability, mostly El Nino variability.

Even if you disingenuously cherry-pick the hot El Nino year 1998 as the first year of your trend to get a smaller trend, the short-term trend is about the same size as the long-term trend, now that 2016 is another hot El Nino year to balance out the first crime. Zeke Hausfather tweeted alongside the graph below: "You keep using that word, "pause". I do not think it means what you think it means." #CulturalReference

In 2013 Boyin Huang of NOAA and his colleagues created an improved sea surface dataset called ERSST.v4. No one cared about this new analysis. Normal good science.

Thomas Karl of NOAA and his colleagues showed what the update means for the global temperature (ocean and land). The interesting part is the lower panel. It shows that the adjustments make global warming smaller by about 0.2°C. Climate data scientists naturally knew this and I blogged about this before, but I think the Karl paper was the first time this was shown in the scientific literature. (The adjustments are normally shown for the individual land or ocean datasets.)

But this post is unfortunately about nearly nothing, about the minimal changes in the top panel of the graph below. I made the graph extra large, so that you can see the differences. The thick black line shows the new assessment (ERSST.v4) and the thin red line the previous estimated global temperature signal (ERSST.v3). Differences are mostly less than 0.05°C, both warmer and cooler. The "problem" is the minute change at the right end of the curves.

The new paper by Zeke Hausfather and colleagues now shows evidence that the updated dataset (ERSSTv4) is indeed better than the previous version (ERSSTv3b). It is a beautifully done study of high technical quality. They do so by comparing the ERSST dataset, which comes from a large number of data sources, with data that comes from only one source (buoys, satellites (CCI) or ARGO). These single-source datasets are shorter, but without trend uncertainties due to the combination of sources.

The recent trend of HadSST also seems to be too small, and to a lesser extent so does that of COBE-SST. This problem with HadSST was known, but not published yet. The warm bias of ships that measure SST at their engine room intake has been getting smaller over the last decade. The reason for this is not yet clear. The main contender seems to be that the fleet has become more actively managed and (typically warm) bad measurements have been discontinued.

Also ERSST uses ship data, but it gives them a much smaller weight compared to the buoy data. That makes this problem less visible in ERSST. Prepare for a small warming update for recent temperatures once this problem is better understood and corrected for. And prepare for the predictable cries of the mitigation skeptical movement and their political puppets.

Karl and colleagues showed that, as a consequence of the minimal changes in ERSST, the trend you get if you start in 1998 is statistically significant. In the graph below you can see in the left global panel that the old version of ERSST (circles) had a 90% confidence interval (vertical line) that includes zero (not statistically significantly different from zero), while the confidence interval of the updated dataset did not (statistically significant).

Did I mention that such a cherry-picked begin year is a very bad idea? The right statistical test is one for a trend change at an unknown year. This test provides no evidence whatsoever for a recent trend change.
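
What a test for a trend change at an unknown year can look like is sketched below. This is only an illustration, not the method of any of the studies mentioned in this post. It assumes annual data with white noise (real temperature series are autocorrelated, which makes real tests harder to pass), it models the change as a continuous kink in the trend line, and all function names and the `margin` parameter are made up for this example. Because every candidate year is tried, the null distribution of the largest test statistic is simulated rather than taken from a table.

```python
import random


def _detrend(values, times):
    """Return the residuals of an ordinary least-squares line through values."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / sxx
    return [v - v_mean - slope * (t - t_mean) for t, v in zip(times, values)]


def max_trend_change_stat(times, temps, margin=5):
    """Largest F-like statistic over all candidate break years.

    For each candidate year c a hinge term max(t - c, 0) is added to the
    straight line, which allows the trend to change at c. The statistic
    compares the drop in the residual sum of squares with the remaining noise.
    `times` must be a list; `margin` years at both ends are not tested.
    """
    n = len(times)
    resid = _detrend(temps, times)
    rss_line = sum(r * r for r in resid)
    best_stat, best_year = 0.0, None
    for c in times[margin:n - margin]:
        hinge = _detrend([max(t - c, 0.0) for t in times], times)
        gain = sum(r * h for r, h in zip(resid, hinge)) ** 2 / sum(h * h for h in hinge)
        rss_break = rss_line - gain
        stat = float("inf") if rss_break <= 0.0 else gain / (rss_break / (n - 3))
        if stat > best_stat:
            best_stat, best_year = stat, c
    return best_stat, best_year


def trend_change_p_value(times, temps, n_sim=500, margin=5, seed=0):
    """Monte Carlo p-value for 'no trend change anywhere' (white-noise assumption)."""
    rng = random.Random(seed)
    observed, year = max_trend_change_stat(times, temps, margin)
    exceed = 0
    for _sim in range(n_sim):
        # The statistic does not depend on the trend or the noise level,
        # so unit-variance noise without a trend is enough for the null.
        noise = [rng.gauss(0.0, 1.0) for _t in times]
        sim_stat, _year = max_trend_change_stat(times, noise, margin)
        if sim_stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1), year
```

Calling `trend_change_p_value(years, anomalies)` returns the p-value together with the year where a change fits best. The point of the design is that the year is chosen by the test itself, so the p-value already pays the price for searching over all years.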

That the trend in Karl and colleagues was statistically significant should thus not have mattered: nothing could be worse than defining a "hiatus" period as one where the confidence interval of a trend includes zero. However, this is the definition public speaker Christopher Monckton uses for his blog posts at Watts Up With That, a large blog of the mitigation skeptical movement. Short-term trends are very uncertain; their uncertainty increases very fast the shorter the period is. Thus if your period is short enough, you will find a trend whose confidence interval includes zero.
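
How fast the uncertainty grows can be made concrete with the textbook formula for the standard error of a least-squares trend. The sketch below assumes equally spaced annual values with independent (white) noise; the noise level of 0.1°C is a hypothetical value for illustration, and autocorrelation in real data would make short-term trends even more uncertain than this.

```python
import math


def slope_standard_error(n_years, sigma=0.1):
    """Standard error of an OLS trend over n_years annual values.

    For times 1..n the sum of squared deviations from the mean time is
    n * (n**2 - 1) / 12, so the standard error is sigma * sqrt(12 / (n * (n**2 - 1))).
    sigma is the standard deviation of the noise (hypothetical default, in °C).
    """
    return sigma * math.sqrt(12.0 / (n_years * (n_years ** 2 - 1)))


for n in (10, 15, 30, 60):
    print(n, "years:", round(slope_standard_error(n), 4), "°C per year")
```

The standard error scales roughly with the length of the period to the power -3/2: a period four times shorter has a trend uncertainty about eight times larger.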

You should not do this kind of statistical test in the first place because of the inevitable cherry picking of the period, but if you want to statistically test whether the long-term trend suddenly dropped, the test should have the long-term trend as null hypothesis. This is the 21st century. We understand the physics of man-made global warming and we know it should be warming; it would be enormously surprising and without any explanation if "global warming had stopped". Thus continued warming is the thing that should be disproven, not a flat trend line. Good luck doing so for such short periods given how enormously uncertain short-term trends are.
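
The difference between the two null hypotheses can be sketched in a few lines. This is a simplified illustration under stated assumptions: white noise, a normal approximation instead of the t-distribution, and function names invented for this example. The long-term trend you pass in as `null_slope` is something you have to estimate yourself from the full record.

```python
import math
from statistics import NormalDist


def trend_and_standard_error(years, temps):
    """OLS trend and its standard error, assuming independent noise."""
    n = len(years)
    t_mean = sum(years) / n
    y_mean = sum(temps) / n
    sxx = sum((t - t_mean) ** 2 for t in years)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(years, temps)) / sxx
    rss = sum((y - y_mean - slope * (t - t_mean)) ** 2 for t, y in zip(years, temps))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def p_value_against(years, temps, null_slope):
    """Two-sided p-value for the hypothesis that the true trend equals null_slope."""
    slope, se = trend_and_standard_error(years, temps)
    if se == 0.0:  # perfect fit: the trend is known exactly
        return 1.0 if slope == null_slope else 0.0
    z = abs(slope - null_slope) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))
```

With `null_slope=0.0` a large p-value only says that a flat line cannot be ruled out. With `null_slope` set to the long-term trend a large p-value says that continued warming cannot be ruled out either, and for short periods both will often be true at the same time.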

The large uncertainty also means that cherry picking a specific period to get a low trend has a large impact. I will show this numerically in an upcoming post. The methods to compute a confidence interval are for a randomly selected period, not for a period that was selected to have a low trend.
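
A small Monte Carlo experiment gives a feeling for the size of this effect. The sketch below is a toy example, not the analysis announced above: it generates synthetic series with a constant warming trend plus white noise, and the trend of 0.017°C per year, the noise level of 0.1°C and the 15-year window are hypothetical values chosen for illustration. It then counts how often the naive 90% confidence interval of a window reaches zero, once for a randomly selected window and once for the window that was selected because it has the lowest trend.

```python
import math
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # quantile for a two-sided 90% interval


def trend_and_lower_bound(temps):
    """OLS trend of annual values and the lower end of its naive 90% interval."""
    n = len(temps)
    t_mean = (n - 1) / 2.0
    y_mean = sum(temps) / n
    sxx = n * (n * n - 1) / 12.0  # for times 0, 1, ..., n-1
    slope = sum((i - t_mean) * (y - y_mean) for i, y in enumerate(temps)) / sxx
    rss = sum((y - y_mean - slope * (i - t_mean)) ** 2 for i, y in enumerate(temps))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope - Z90 * se


def hiatus_rates(n_years=50, window=15, trend=0.017, sigma=0.1, n_sim=500, seed=0):
    """Fraction of synthetic series in which a window shows 'no significant warming'.

    Returns (rate for a randomly selected window, rate for the cherry-picked
    window with the lowest trend). The true trend never changes.
    """
    rng = random.Random(seed)
    random_hits = 0
    cherry_hits = 0
    for _sim in range(n_sim):
        series = [trend * i + rng.gauss(0.0, sigma) for i in range(n_years)]
        results = [trend_and_lower_bound(series[s:s + window])
                   for s in range(n_years - window + 1)]
        lower_random = rng.choice(results)[1]
        lower_cherry = min(results, key=lambda r: r[0])[1]
        random_hits += lower_random <= 0.0
        cherry_hits += lower_cherry <= 0.0
    return random_hits / n_sim, cherry_hits / n_sim


if __name__ == "__main__":
    print(hiatus_rates())
```

In every synthetic series the warming continues unabated by construction, so every window without "significant" warming is a false alarm. The comparison of the two rates shows how much the selection of the period inflates the number of such false alarms.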

Concluding, we have something that does not exist, but which was made into a major talking point of the mitigation skeptical movement. This movement put their credibility on fluctuations that produced a minor short-term trend change that was not statistically significant. The deviation was also so small that it put an unfounded confidence in the perfection of the data.

The inevitable happened and small corrections needed to be made to the data. After this even disingenuous cherry-picking and bad statistics were no longer enough to support the talking point. As a consequence Lamar Smith of TX21 abused his Washington power to punish politically inconvenient science. Science that was confirmed this week. This should all have been politically irrelevant because the statistics were wrong all along. It is politically irrelevant by now because the new El Nino produced record temperatures in 2016 and even cherry picking 1998 as begin year is no longer enough.

"Much Ado About Nothing is generally considered one of Shakespeare's best comedies because it combines elements of mistaken identities, love, robust hilarity with more serious meditations on honour, shame, and court politics."
(Yes, I get my culture from Wikipedia.)

To end on a positive note, if you are interested in sea surface temperature and its uncertainties, we just published a review paper in the Bulletin of the American Meteorological Society: "A call for new approaches to quantifying biases in observations of sea-surface temperature." This focuses on ideas for future research and how the SST community can make it easier for others to join the field and work on improving the data.

Another good review paper on the quality of SST observations is: "Effects of instrumentation changes on sea surface temperature measured in situ" and also the homepage of HadSST is quite informative. For more information on the three main sea surface temperature datasets follow these links: ERSSTv4, HadSST3 and COBE-SST. Thanks to John Kennedy for suggesting the links in this paragraph.

Do watch the clear video below where Zeke Hausfather explains the study and why he thinks recent ocean warming used to be underestimated.

Related reading

The op-ed by the authors Kevin Cowtan and Zeke Hausfather is probably the best article on the study: Political Investigation Is Not the Way to Scientific Truth. Independent replication is the key to verification; trolling through scientists' emails looking for out-of-context "gotcha" statements isn't.

Scott K. Johnson in Ars Technica (a reading recommendation for science geeks by itself): New analysis shows Lamar Smith’s accusations on climate data are wrong. It wasn't a political plot—temperatures really did get warmer.

Phil Plait (Bad Astronomy) naturally has a clear explanation of the study and the ensuing political harassment: New Study Confirms Sea Surface Temperatures Are Warming Faster Than Previously Thought

The take of the UK MetOffice, producers of HadSST, on the new study and the differences found for HadSST: The challenge of taking the temperature of the world’s oceans

Hotwhopper is your explainer if you like your stories with a little snark: The winner is NOAA - for global sea surface temperature

Hotwhopper follow-up: Dumb as: Anthony Watts complains Hausfather17 authors didn't use FUTURE data. With such a response to the study it is unreasonable to complain about snark in the response.

The Christian Science Monitor gives a good non-technical summary: Debunking the myth of climate change 'hiatus': Where did it come from?

I guess it is hard for a journalist to write that the topic is not important. Chris Mooney at the Washington Post claims Karl and colleagues is important: NOAA challenged the global warming ‘pause.’ Now new research says the agency was right.

Climate Denial Crock of the Week with Peter Sinclair: New Study Shows (Again): Deniers Wrong, NOAA Scientists Right. Quotes from several articles and has good explainer videos.

Global Warming ‘Hiatus’ Wasn’t, Second Study Confirms

The Guardian blog by John Abraham: New study confirms NOAA finding of faster global warming

Atmospheric warming hiatus: The peculiar debate about the 2% of the 2%

No! Ah! Part II. The return of the uncertainty monster

How can the pause be both ‘false’ and caused by something?


Grant Foster and John Abraham, 2015: Lack of evidence for a slowdown in global temperature. US CLIVAR Variations, Summer 2015, 13, No. 3.

Zeke Hausfather, Kevin Cowtan, David C. Clarke, Peter Jacobs, Mark Richardson, Robert Rohde, 2017: Assessing recent warming using instrumentally homogeneous sea surface temperature records. Science Advances, 04 Jan 2017.

Boyin Huang, Viva F. Banzon, Eric Freeman, Jay Lawrimore, Wei Liu, Thomas C. Peterson, Thomas M. Smith, Peter W. Thorne, Scott D. Woodruff, and Huai-Min Zhang, 2015: Extended Reconstructed Sea Surface Temperature Version 4 (ERSST.v4). Part I: Upgrades and Intercomparisons. Journal of Climate, 28, pp. 911–930, doi: 10.1175/JCLI-D-14-00006.1.

Thomas R. Karl, Anthony Arguez, Boyin Huang, Jay H. Lawrimore, James R. McMahon, Matthew J. Menne, Thomas C. Peterson, Russell S. Vose, Huai-Min Zhang, 2015: Possible artifacts of data biases in the recent global surface warming hiatus. Science. doi: 10.1126/science.aaa5632.

Lewandowsky, S., J. Risbey, and N. Oreskes, 2016: The “Pause” in Global Warming: Turning a Routine Fluctuation into a Problem for Science. Bull. Amer. Meteor. Soc., 97, 723–733, doi: 10.1175/BAMS-D-14-00106.1.


  1. Hi Victor, I used to work as a Marine Tech onboard the US research fleet. I can confirm that the community is actively working on solutions to adjust for engine room warm bias along with the temporal lag from intake to sensor. As you suggested, I suspect these efforts are responsible for the reduction in warm bias. Unfortunately I don't know of any formal effort by UNOLS to categorize the work that has been done, but I can confirm it is happening at the ship level.

  2. Hi Victor,

    trend change is not the only break point analysis possible. The Cahill et al. (2015) analysis contained in the realclimate post you linked to is justified on the basis that there cannot be a discontinuity in the warming record. This assumption is incorrect - ocean-atmosphere interactions are quite capable of causing step-like changes, and other breakpoint methods do show a discontinuity in 1997/98.

    My Ph D scholar colleague Jim Ricketts has also produced an analysis that probes the weaknesses of the Cahill approach. The error uncertainties around the break points selected are large and the lack of a physical explanation such as timing of regime changes mean that it is a fitting exercise at best. His Ph D is taking preference, so there will be delay until this analysis is submitted.

    I agree with you that the pause/hiatus argument you are describing is much ado about NOAAthing but for different reasons.

  3. Lucie Vincent worked with a two-phase regression method for relative homogenization. This included both trend changes and jumps. You could also use that for this problem. In practice it is really hard to distinguish between the two in a noisy time series. When gradual changes are not linear, often setting multiple breaks is the more parsimonious solution.

    Such a more complicated test would make the number of degrees of freedom larger. Thus I doubt that you would see better results. That would only happen in case of a clear step change, which I do not see in the noisy global mean temperature. Maybe for Australia, where El Nino is more important, but I do not see a clear step in the global mean temperature. Would be very surprised if a break model would suddenly see a "hiatus", but feel free to try.

    The large uncertainties are indicated in the plot of Cahill. You may be interested in an article by my colleague Ralf and me on the uncertainty in break positions, in this case only for single breaks. Above a signal to noise ratio of 1, the breaks quickly become quite certain. Below it, it is nearly impossible to tell where they are, even if you know that they are somewhere.

    It might be nice to use a single breakpoint test on the data since the 1960s. That would give a p-value for the "break" in 1998. Cahill could only tell us something about the fit over the entire series, and part of the breaks they found are also due to the likely too high temperature during WWII.

    I do not see why this uncertainty in the break position would be an argument against using this statistical test over single change point tests at a cherry picked time or "methods" that compute trend uncertainties over cherry-picked periods. The latter two methods are fundamentally wrong.

