Monday 30 January 2017

With some programing skills you can compute global mean temperatures yourself

This is a guest post by citizen scientist Ron Roeland (not his real name, but I like alliteration for some reason). Being an actually sceptical person, he decided to compute the global mean land temperature from station observations himself. He could reproduce the results of the main scientific groups that compute this signal and, new for me, while studying the data noticed how important the relocation of temperature stations to airports is for the NOAA GHCNv3 dataset. (The headers in the post are mine.)

This post does not pretend to present a rigorous analysis of the global temperature record; instead, it intends to show how easy it is for someone with basic programming/math skills to debunk claims that NASA and NOAA have manipulated temperature data to produce their global-average temperature results, i.e. claims like these:

From C3 Headlines: By utilizing questionable adjustments based on even more questionable assumptions, NOAA managed to produce an entirely fabricated increase in the global warming trend from 1998 to 2012.

From a blogger on the Hill: There’s going to have to be a massive effort to pick apart failing climate models and questionably-adjusted data.

From Climate Depot: Over the past decade, NASA and NOAA have continuously altered the temperature record to cool the past and warm the present. Their claims are straight out Orwell's 1984, and have nothing to do with science'

The routine

Some time ago, after reading all kinds of claims (like the ones above) about how NASA and NOAA had improperly adjusted temperature data to produce their global-average temperature results, I decided to take a crack at the data myself.

I coded up a straightforward baselining/gridding/averaging routine that is quite simple and “dumbed down” in comparison to the NASA and NOAA algorithms. Below is a complete description of the algorithm I coded up.
  1. Using GHCN v3 monthly-average data, compute 1951-1980 monthly baseline temperatures for all GHCN stations. If a station has 15 or more valid temperatures in any given month for the 1951-1980 baseline period, retain that monthly baseline value; otherwise drop that station/month from the computations. Stations with no valid monthly baseline periods are completely excluded from the computations.
  2. For all stations and months where valid baseline temperature estimates were computed per (1) above, subtract the respective baseline temperatures from all of the station monthly temperature temperatures to produce monthly temperature anomalies for the years 1880-2015.
  3. Set up a global gridding scheme to perform area-weighting. To keep things really simple, and to minimize the number of empty grid-cells, I selected large grid-cell sizes (20 degrees x 20 degrees at the Equator). I also opted to recalculate the grid-cell latitude dimensions as one goes north/south of the equator in order to keep the grid-cell areas as nearly constant as possible. I did this to keep the grid-cell areas from shrinking (per the latitude cosines) in order to minimize the number of empty grid cells.
  4. In each grid-cell, compute the average (over all stations in the grid-cell) of the monthly temperature anomalies to produce a single time-series of average temperature anomalies for each month (years 1880 through 2015).
  5. Compute global average monthly temperature anomalies by averaging together all the grid-cell monthly average anomalies, weighted by the grid-cell areas (again, for years 1880 through 2015).
  6. Compute global-average annual anomalies for years 1880 through 2015 by averaging together the global monthly anomalies for each year.
The algorithm does not involve any station data adjustments (obviously!) or temperature interpolation operations. It’s a pretty basic number-crunching procedure that uses straightforward math plus a wee bit of trigonometry (for computing latitude/longitude grid-cell areas).

For me, the most complicated part of the algorithm implementation was managing the variable data record lengths and data gaps (monthly and annual) in the station data -- basically, the “data housekeeping” stuff. Fortunately, modern development libraries such as the C++ Standard Template Library make this less of a chore than it used to be.

Why this routine?

People unfamiliar with global temperature computational methods sometimes ask: “Why not simply average the temperature station data to compute global-average estimates? Why bother with the baselining and gridding described above?”

We could get away with straight averaging of the temperature data if it were not for the two problems described below.

Problem 1: Temperature stations have varying record lengths. The majority of stations do not have continuous data records that go all the way back to 1880 (the beginning of the NASA/GISS global temperature calculations). Even stations with data going back to 1880 have gaps in their records -- there are missing months or even years.

Problem 2: Temperature stations are not evenly distributed over the Earth’s surface. Some regions, like the continental USA and western Europe, have very dense networks of stations. Other regions, like the African continent, have very sparse station networks.

As a result of problem 1, we have a mix of temperature stations that changes from year to year. If we were simply to average the absolute temperature data from all those stations, the final global-average results would be significantly skewed from year to year due to the changing mix of stations from one year to the next.

Fortunately, the solution for this complication is quite straightforward: the baselining and anomaly-averaging procedure described above. For those who already familiar with this procedure, please bear with me while I illustrate how it works with a simple scenario constructed from simulated data.

Let’s consider a very simple scenario where the full 1880-2016 temperature history for a particular region is contained in data reported by two temperature stations, one of which is located on a hilltop and the other located on a nearby valley floor. The hilltop and valley floor locations have identical long-term temperature trends, but the hilltop location is consistently about 1 degree C cooler than the valley floor location. The hilltop temperature station has a temperature record starting in 1880 and ending in 1990. The valley floor station has a temperature record beginning in 1930 and ending in 2016.

Figure 1 below shows the simulated temperature time-series for these two hypothetical stations. Both time-series were constructed by superimposing random noise on the same linear trend, with the valley-floor station time-series having a constant offset temperature 1 degree C more than that of the hilltop station time-series. The simulated time-series for the hilltop station (red) begins in 1880 and continues to 1990. The simulated valley floor station temperature (blue) data begins in 1930 and runs to 2016. As can be seen during their period of overlap (1930-1990), the simulated valley-floor temperature data runs about 1 degree warmer than the simulated hilltop temperature data.

Figure 1: Simulated Hilltop Station Data (red) and Valley Floor Station Data (blue)

If we were to attempt to construct a complete 1880-2016 temperature history for this region by computing a straight average of the hilltop and valley floor data, we would obtain the results seen in Figure 2 below.

Figure 2: Straight Average of Valley Floor Station Data and Hilltop Station Data

The effects of the changing mix of stations (hilltop vs. valley floor) on the average temperature results can clearly be seen in Figure 2. A large temperature jump is seen at 1930, where the warmer valley floor data begins, and a second temperature jump is seen at 1990 where the cooler hilltop data ends. These temperature jumps obviously do not represent actual temperature increases for that particular region; instead, they are artifacts introduced by the changes in the mix of stations in 1930 and 1990.

An accurate reconstruction of the regional temperature history computed from these two temperature time-series obviously should show the warming trend seen in the hilltop and valley floor data over the entire 1880-2016 time period. That is clearly not the case here. Much of the apparent warming seen in Figure 2 is a consequence of the changing mix of stations.

Now, let’s modify the processing a bit by subtracting the (standard NASA/GISS) 1951-1980 hilltop baseline average temperature from the hilltop temperature data and the 1951-1980 valley floor baseline average temperature from the valley floor temperature data. This procedure produces the temperature anomalies for the hilltop and valley floor stations. Then for each year, compute the average of the station anomalies for the 1880-2016 time period.

This is the baselining and anomaly-averaging procedure that is used by NASA/GISS, NOAA, and other organizations to produce their global-average temperature results.

When this baselining and anomaly-averaging procedure is applied to the simulated temperature station data, it produces the results that can be viewed in figure 3 below.

Figure 3: Average of Valley Floor Station Anomalies and Hilltop Station Anomalies

In Figure 3, the temperature jumps associated with the beginning of the valley floor data record and the end of the hilltop data record have been removed, clearly revealing the underlying temperature trend shared by the two temperature time-series.

Also note that although neither of my simulated temperature stations have a full 1880-2016 temperature record, we were still able to compute a complete reconstruction for the 1880-2016 time period because there was enough overlap between the station records to allow us to “align” them via baselining.

The second problem, the non-uniform distribution of temperature stations, can clearly be seen in Figure 4 below. That figure shows all GHCNv3 temperature stations that have data records beginning in 1900 or earlier and continuing to the present time.

Figure 4: Long-Record GHCN Station Distribution

As one can see, the stations are highly concentrated in the continental USA and western Europe; Africa and South America, in contrast, have very sparse coverage. A straight unweighted average of the data from all the stations shown in the above image would result in temperature changes in the continental USA and western Europe “swamping out” temperature changes in South America and Africa in the final global average calculations.

That is the problem that gridding solves. The averaging procedure using grid-cells is performed in two steps. First, the temperature time-series for all stations in each grid-cell are averaged together to produce a single time-series per grid-cell. Then all the grid-cell time-series are averaged together to construct the final global-average temperature results (note: in the final average, the grid-cell time-series are weighted according to the size of each grid-cell). This eliminates the problem where areas on the Earth with very dense networks of stations are over-weighted in the global average relative to areas where the station coverage is more sparse.

Now, some have argued that the sparse coverage of certain regions of the Earth invalidate the global-average temperature computations. But it turns out that the NASA/GISS warming trend can be confirmed even with a very sparse sampling of the Earth’s surface temperatures. (In fact, the NASA/GISS warming trend can be replicated very closely with data from as few as 30 temperature stations scattered around the world.)

Real-world results

Now that we are done with the preliminaries, let’s look at some real-world results. Let’s start off by taking a look at how my simple “dumbed-down” gridding/averaging algorithm compares with the NASA/GISS algorithm when it is used to process the same GHCNv3 adjusted data that NASA/GISS uses. To see how my algorithm compares with the NASA/GISS algorithm, take a look at Figure 5 below, where the output of my algorithm is plotted directly against the NASA/GISS “Global Mean Estimates based on Land Data only” results.

(Note: All references to NASA/GISS global temperature results in this post refer specifically to the NASA/GISS “Global Mean Estimates based on Land Data only” results. Those results can be viewed on the NASA/GISS web-site; scroll down to view the “Global Mean Estimates based on Land Data only” graph).

Figure 5: Adjusted Data, All Stations: My Simple Gridding/Averaging (blue) vs. NASA/GISS (red)

In spite of the rudimentary nature of my algorithm, my algorithm produces results that match the NASA/GISS results quite closely. According to the R-squared statistic I calculated (seen in the upper-left corner of Figure 5), I got 98% of the NASA/GISS answer with a only tiny fraction of the effort!

But what happens when we use unadjusted GHCNv3 data? Well, let’s go ahead and compare the output of my algorithm with the NASA/GISS algorithm when my algorithm is used to process the unadjusted GHCNv3 data. Figure 6 below shows a plot of my unadjusted global temperature results vs. the NASA/GISS results (remember that NASA/GISS uses adjusted GHCNv3 data).

Figure 6: Unadjusted Data, All Stations: My Simple Gridding /Averaging (green) vs. NASA/GISS (red)

My “all stations” unadjusted data results show a warming trend that lines up very closely with the NASA/GISS warming trend from 1960 to 2016, with my results as well as the NASA/GISS results showing record high temperatures for 2016. However, my results do show a visible warm-bias relative to the NASA/GISS results prior to 1950 or so. This is the basis of the accusations that NOAA and NASA “cooled the past (and warmed the present)” to exaggerate the global warming trend.

Now, why do my unadjusted data results show that pre-1950 “warm bias” relative to the NASA/GISS results? Well, this excerpt from NOAA’s GHCN FAQ provides some clues:
Why are there more cold (negative) step changes than warm(positive) step changes in the historical land surface air temperature records represented in the GHCN v3 dataset?

The reason for the larger number of cold step changes is not completely clear, but they may be due in part to systematic changes in station locations from city centers to cooler airport locations that occurred in many parts of the world from the 1930s to through the 1960s.
Because the GHCNv3 metadata contains an airport designator field for every temperature station, it was quite easy for me to modify my program to exclude all the “airport” stations from the computations. So let’s exclude all of the “airport” station data and see what we get. Figure 7 below shows my unadjusted data results vs. the NASA/GISS results when all “airport” stations are excluded from my computations.

Figure 7: Unadjusted Data, Airports Excluded (green) vs. NASA/GISS (red)

There is a very visible reduction in the bias between my unadjusted results and the NASA results (especially prior to 1950 or so) when airport stations are excluded from my unadjusted data processing. This is quite consistent with the notion that many of the stations currently located at airports were moved to their current locations from city centers at some point during their history.

Now just for fun, let’s look at what happens when we do the reverse and exclude non-airport stations (i.e. process only the airport stations). Figure 8 shows what we get when we process unadjusted data exclusively from “airport” stations.

Figure 8: Unadjusted Data, Airports Only (green) vs. NASA/GISS (red)

Well, look at that! The pre-1950 bias between my unadjusted data results and the NASA/GISS results really jumps out. And take note of another interesting thing about the plot -- in spite of the fact that I processed only “airport” stations, the green “airports only” temperature curve goes all the way back to 1880, decades prior to the existence of airplanes (or airports)! It is only reasonable to conclude that those “airport” stations must have been moved at some point in their history.

Now, for a bit more fun, let’s drill down a little further into the data and process only airport stations that also have temperature data records going back to 1903 (the year that the Wright Brothers first successfully flew an airplane) or earlier.

When I drilled down into the data, I found over 400 “airport” temperature stations with data going back to 1903 or earlier. And when I computed global-average temperature estimates from just those stations, this is what I got (Figure 9):

Figure 9: Unadjusted Data, Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

OK, that looks pretty much like the previous temperature plot, except that my results are “noisier” due to the fact that I processed data from fewer temperature stations.

And for even more fun, let’s look at the results we get when we process data exclusively from non-airport stations with data going back to 1903 or earlier:

Figure 10: Unadjusted Data, Non-Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

When only non-airport stations are processed, the pre-1950 “eyeball estimate” bias between my unadjusted data temperature curve and the NASA/GISS temperature curve is sharply reduced.

The results seen in the above plots are entirely consistent with the notion that the movement of large numbers of temperature stations from city centers to cooler outlying airport locations during the middle of the 20th Century is responsible for much of the bias seen between the unadjusted and adjusted GHCNv3 global-average temperature results.

It is quite reasonable to conclude, based on the results presented here, that one major reason for the bias seen between the GHCNv3 unadjusted and adjusted data results is the presence of corrections for those station moves in the adjusted data (corrections that are obviously absent from the unadjusted data). Those corrections remove the contaminating effects of station moves and permit more accurate estimates of global surface temperature increases over time.

Take-home lessons (in no particular order):

  1. Even a very simple global temperature algorithm can reproduce the NASA/GISS results very closely. This really is a case where you can get 98% of the answer (per my R-squared statistic) with less than 1% of the effort.
  2. NOAA’s GHCNv3 monthly data repository contains everything an independent “citizen scientist” needs (data and documentation) to conduct his/her own investigation of the global land station temperature data.
  3. A direct comparison of unadjusted data results (all GHCN stations) vs. the NASA/GISS adjusted data temperature curves reveals only modest differences between the two temperature curves, especially for the past 6 decades. Furthermore, my unadjusted and the NASA/GISS adjusted results show nearly identical (and record) temperatures for 2016. If NASA and NOAA were adjusting data to exaggerate the amount of planetary warming, they sure went to an awful lot of trouble and effort to produce only a small overall increase in warming in the land station data.
  4. Eliminating all “airport” stations from the processing significantly reduced the bias between my unadjusted data results and the NASA/GISS results. It is therefore reasonable to conclude that a large share of the modest bias between my GHCN v3 unadjusted results and the NASA/GISS adjusted data results is the result of corrections for station moves from urban centers to outlying airports (corrections present in the adjusted data, but not in the unadjusted data).
  5. Simply excluding “airport” stations likely eliminates many stations that were always located at airports (and never moved) and also fails to eliminate stations that were moved out from city centers to non-airport locations. So it is not a comprehensive evaluation of the impacts of station moves. However, it is a very easy “first step” analysis exercise to perform; even this incomplete “first step” analysis produces results that strongly consistent with the hypothesis that corrections for station moves are likely the dominant reason for the pre-1950 bias seen between the adjusted and unadjusted GHCN global temperature results. Remember that many urban stations were also moved from city centers to non-airport locations during the mid-20th century. Unfortunately, those station moves are not recorded in the simple summary metadata files supplied with the GHCNv3 monthly data. An analysis of NOAA’s more detailed metadata would be required to identify those stations and perform a more complete analysis of the impacts of station moves. However, that is outside of the scope of this simple project.
  6. For someone who has the requisite math and programming skills, confirming the results presented here should not be very hard at all. Skeptics should try it some time. Provided that those skeptics are willing and able to accept results that contradict their original views about temperature data adjustments, they could have a lot of fun taking on a project like this.

Related reading

Also the Clear Climate Code project was able to reproduce the results of NASA-GISS. Berkeley Earth made an high-level independent analysis and confirmed previous results. Also (non-climate) scientist Nick Stokes (Moyhu) computed his own temperature signal: TempLS which also fits well.

In 2010 Zeke Hausfather analyzed the differences in GHCNv2 between airport and other stations and found only minimal differences: Airports and the land temperature record.

At about the same time David Jones at Clear Climate Code also looked at airport station, just splitting the dataset in two groups, and did found differences: Airport Warming. Thus making sure both groups are regionally comparable is probably important.

The global warming conspiracy would be huge. Not only the 7 global datasets also national datasets from so many groups show clear warming.

Just the facts, homogenization adjustments reduce global warming.

Why raw temperatures show too little global warming.

Irrigation and paint as reasons for a cooling bias.

Temperature trend biases due to urbanization and siting quality changes.

Temperature bias from the village heat island

Cooling moves of urban stations. From cities to airports or simply to outside a city or village.

The transition to automatic weather stations. We’d better study it now. It may be a cooling bias.

Changes in screen design leading to temperature trend biases.

Early global warming

Cranberry picking short-term temperature trends

How climatology treats sceptics

Monday 16 January 2017

Cranberry picking short-term temperature trends

Photo of cranberry fields

Monckton is a heavy user of this disingenuous "technique" and should thus know better: you cannot get any trend, but people like Monckton unfortunately do have much leeway to deceive the population. This post will show that political activists can nearly always pick a politically correct period to get a short-term trend that is smaller than the long-term trend. After this careful selection they can pretend to be shocked that scientists did not tell them about this slowdown in warming.

Traditionally this strategy to pick only the data you like is called "cherry picking". It is such a deplorable deceptive strategy that "cherry picking" sounds too nice to me. I would suggest calling it "cranberry picking". Under the assumption that people only eat cranberries when the burn peeing is worse. Another good new name could be "wishful picking."

In a previous post, I showed that the uncertainty of short-term trends is huge, probably much larger than you think, the uncertainty monster can only stomach a few short-term trends for breakfast. Because of this large uncertainty the influence of cranberry picking is probably also larger than you think. Even I was surprised by the calculations. I hope the uncertainty monster does not upset his stomach, he does not get the uncertainties he needs to thrive.

Uncertainty monster made of papers

Size of short-term temperature fluctuations

To get some realistic numbers we first need to know how large the fluctuations around the long-term trend are. Thus let's first have a look at the size of these fluctuations in two surface temperature and two tropospheric temperature datasets:
  • the surface temperature of Berkeley Earth (formerly known as BEST),
  • the surface temperature of NASA-GISS: GISTEMP,
  • the satellite Temperature of the Total Troposphere (TTT) of Remote Sensing Systems (RSS),
  • the satellite Temperature of the Lower Troposphere (TLT version 6 beta) of the University of Alabama in Huntsville (UAH).
The four graphs below have two panels. The top panel shows the yearly average temperature anomalies over time as red dots. The Berkeley Earth data series starts earlier, but I only use data starting in 1880 because earlier data is too sparse and may thus not show actual climatic changes in the global mean temperature. For both surface temperature datasets the second world war is removed because its values are not reliable. The long-term trend is estimated using a [[LOESS]] smoother and shown as a blue line.

The lower panel shows the deviations from the long-term trend as red dots. The standard deviation of these fluctuations over the full period is written in red. The graphs for the surface temperature also gives the standard deviation of the deviations over the shorter satellite period written in blue for comparison with the satellite data. The period does not make much difference.

Both tropospheric datasets have fluctuations with a typical size (standard deviation) of 0.14 °C. The standard deviation of the surface datasets varies a little depending on the dataset or period. For the rest of this post I will use 0.086 °C as a typical value for the surface temperature.

The tropospheric temperature clearly shows more short-term variability. This mainly comes from El Nino, which has a stronger influence on the temperature high up in the air than on the surface temperature. This larger noise level gives the impression that the trend in the tropospheric temperature is smaller, but the trend in the RSS dataset is actually about the same as the surface trend; see below.

The trend in the preliminary UAHv6 temperature is currently lower than all others. Please note that, the changes from the previous version of UAH to the recent one are large and that the previous version of UAH showed more (recent) warming* and about the same trend as the other datasets.

Uncertainty of short-term trends

Already without cranberry picking short-term trends are problematic because of the strong influence of short-term fluctuations. While a average value computed over 10 years of data is only 3 times as uncertain as a 100-year average, the uncertainty of a 10-year trend is 32 times as large as a 100-year trend.**

To study how accurate a trend is you can generate random numbers and compute their trend. On average this trend will be zero, but due to the short-term fluctuations any individual realization will have some trend. By repeating this procedure often you can study how much the trend varies due to the short-term fluctuations, how uncertain the trend is, or more positively formulated: what the confidence interval of the trend is. See my previous post for details. I have done this for the graph below; for the satellite temperatures the random numbers have a standard deviation of 0.14 °C, for the surface temperatures 0.086 °C.

The graph below shows the confidence interval of the trends, which is two times the standard deviation of 10,000 trends computed from 10,000 series of random numbers. A 10-year trend of the satellite temperatures, which may sound like a decent period, has a whooping uncertainty of 3 °C per century.*** This means that with no long-term trend the short-term trend will vary between -3°C and +3 °C per century for 95% of the cases and for the other 5% even more. That is the uncertainty from the fluctuations along, there are additional uncertainties due to changes in the orbit, the local time the satellite observes, calibration and so on.

Cherry picking the begin year

To look at the influence of cranberry picking, I generated series of 30 values, computed all possible trends between 10 and 30 years and selected the smallest trend. The confidence intervals of these cranberry picked satellite temperature trends are shown below in red. For comparison the intervals for trends without cranberry picking, like above, are shown in blue. To show both cases clearly in the same graph, I have shifted the both bars a little away from each others.

The situation is similar for the surface temperature trends. However, because the data is less noisy, the confidence intervals of the trends are smaller; see below.

While the short-term trends without cranberry picking have a huge uncertainty, on average they are zero. With cranberry picking the average trends are clearly negative, especially for shorter trends, showing the strong influence of selecting a specific period. Without cranberry picking half of the trends are below zero, with cranberry picking 88% of the trends are negative.

Cherry picking the period

For some the record temperatures the last two years are not a sign that they were wrong to see a "hiatus". Some claim that there was something like a "pause" or a "slowdown" since 1998, but that it recently stopped. This claim gives even more freedom for cranberry picking. Now also the end year is cranberry picked. To see how bad this is, I again generated noise and selected the period lasting at least 10 years with the lowest trend and ending this year, or one year earlier or two years earlier.

The graphs below compare the range of trends you can get with cranberry picking the begin and end year in green with "only" cranberry picking the begin year like before in red. With double cranberry picking 96% of the trends are negative and the trends are going down even more. (Mitigation skeptics often use this "technique" by showing an older plot, when the newer plot would not be as "effective".)

A negative trend in the above examples of random numbers without any trend would be comparable to a real dataset where a short-term trend is below the long-term trend. Thus by selecting the "right" period, political activists can nearly always claim that scientists talking about the long-term trend are exaggerating because they do not look at this highly interesting short period.

In the US political practice the cranberry picking will be worse. Activists will not only pick a period of their political liking, but also the dataset, variable, region, depth, season, or resolution that produces a graph that can be misinterpreted. The more degrees of freedom, the stronger the influence of cranberry picking.


There are a few things you can do to protect yourself against making spurious judgements.

1. Use large datasets. You can see in the plots above that the influence of cranberry picking is much smaller for the longer trends. For a 30-year period the difference between the blue confidence intervals for a typical 30-year period and the red confidence intervals for a cranberry picked 30-year period is small. Had I generated series of 50 random numbers rather than 30 numbers, this would likely have shown a larger effect of cranberry picking on 30-year trends, but still a lot smaller than on 10-year trends.

2. Only make statistical tests for relationships you expect to exist. This limits your freedom and the chance that one of the many possible statistical tests is spuriously significant. If you make 100 statistical tests of pure noise, 5 of them will on average be spuriously significant.

There was no physical reason for global warming to stop or slow down after 1998. No one computed the trend since 1998 because they had a reason to expect a change. They computed it because their eyes had seen something; that makes the trend test cranberry picking by definition. The absence of a reason should have made people very careful. The more so because there was a good reason to expect spurious results starting in a large El Nino year.

3. Study the reasons for the relationship you found. Even if I would wrongly have seen the statistical evidence for a trend decrease as credible, I would not have made a big point of it before I had understood the reason for this trend change. In the "hiatus" case the situation was even reversed: it was clear from the beginning that most of fluctuations that gave the appearance of a "hiatus" in the eyes of some was El Nino. Thus there was a perfectly fine physical reason not to claim that there was a change in the trend.

There is currently a strong decline in global sea ice extent. Before I cry wolf, accuse scientists of fraud and understating the seriousness of climate change, I would like to understand why this decline happened.

4. Use the right statistical test. People have compared the trend before 1998 and after 1998 and their uncertainties. These trend uncertainties are not valid for cherry picked periods. In this case, the right test would have been one for a trend change at an unknown position/year. There was no physical reason to expect a real trend change in 1998, thus the statistical test should take that the actual reason you make the test is because your eye sampled all possible years.

Against activists doing these kind of things we cannot do much, except trying to inform their readers how deceptive this strategy is. For example by linking to this post. Hint, hint.

Let me leave you with a classic Potholer54 video delicately mocking Monckton's cranberry picking to get politically convenient global cooling and melting ice trends.

Related reading

Richard Telford on the Monckton/McKitrick definition of a "hiatus", which nearly always gives you one: Recipe for a hiatus

Tamino: Cherry p

Statistically significant trends - Short-term temperature trend are more uncertain than you probably think

How can the pause be both ‘false’ and caused by something?

Atmospheric warming hiatus: The peculiar debate about the 2% of the 2%

Temperature trend over last 15 years is twice as large as previously thought because much warming was over Arctic where we have few measurements

Why raw temperatures show too little global warming

* The common baseline period of UAH5.6 and UAH6.0 is 1981-2010.

** These uncertainties are for Gaussian white noise.

*** I like the unit °C per century for trends even if the period of the trend it shorter. You get rounder numbers and it is easier to compare the trends to the warming we have seen in the last century and expert to see in the next one.

**** The code to compute the graphs of this post can be downloaded here.

***** Photo of cranberry field by mrbanjo1138 used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.

Sunday 8 January 2017

Much ado about NOAAthing

I know NOAAthing.

This post is about nothing. Nearly nothing. But when I found this title I had to write it.

Once upon a time in America there were some political activists who claimed that global warming had stopped. These were the moderate voices, with many people in this movement saying that an ice age is just around the corner. Others said global warming paused, hiatused or slowed down. I feel that good statistics has always shown this idea to be complete rubbish (Foster and Abraham, 2015; Lewandowsky et al., 2016), but at least in 2017 it should be clear that it is nothing, nothing what so ever. It is interpreting noise. More kindly: interpreting variability, mostly El Nino variability.

Even if you disingenuously cherry-pick 1998 the hot El Nino year as the first year of your trend to get a smaller trend, the short-term trend is about the same size as the long-term trend now that 2016 is another hot El Nino year to balance out the first crime. Zeke Hausfather tweeted to the graph below: "You keep using that word, "pause". I do not think it means what you think it means." #CulturalReference

In 2013 Boyin Huang of NOAA and his colleagues created an improved sea surface dataset called ERSST.v4. No one cared about this new analysis. Normal good science.

Thomas Karl of NOAA and his colleagues showed what the update means for the global temperature (ocean and land). The interesting part is the lower panel. It shows that the adjustments make global warming smaller by about 0.2°C. Climate data scientists naturally knew this and I blogged about his before, but I think the Karl paper was the first time this was shown in the scientific literature. (The adjustments are normally shown for the individual land or ocean datasets.)

But this post is unfortunately about nearly nothing, about the minimal changes in the top panel of the graph below. I made the graph extra large, so that you can see the differences. The thick black line shows the new assessment (ERSST.v4) and the thin red line the previous estimated global temperature signal (ERSST.v3). Differences are mostly less than 0.05°C, both warmer and cooler. The "problem" is the minute change at the right end of the curves.

The new paper by Zeke Hausfather and colleagues now shows evidence that the updated dataset (ERSSTv4) is indeed better than the previous version (ERSSTv3b). It is a beautifully done study of high technical quality. They do so by comparing the ERSST dataset, which comes from a large number of data sources, with  data that comes only from only one source (buoys, satellites (CCl) or ARGO). These single-source datasets are shorter, but without trend uncertainties due to the combination of sources.

The recent trend of HadSST also seems to be too small and to a lesser amount also COBE-SST. This problem with HadSST was known, but not published yet. The warm bias of ships that measure SST at their engine room intake is getting smaller over the last decade. The reason for this is not yet clear. The main contender seems to be that the fleet has become more actively managed and (typically warm) bad measurements have been discontinued.

Also ERSST uses ship data, but it gives them a much smaller weight compared to the buoy data. That makes this problem less visible in ERSST. Prepare for a small warming update for recent temperatures once this problem is better understood and corrected for. And prepare for the predictable cries of the mitigation skeptical movement and their political puppets.

Karl and colleagues showed that as a consequence of the minimal changes in ERSST and if you start a trend in 1998 and compute a trend, this trend is statistically significant. In the graph below you can see in the left global panel that the old version of ERSST (circles) had a 90% confidence interval (vertical line) that includes zero (not statistically significantly different from zero), while the confidence interval of updated dataset did not (statistically significant).

Did I mention that such a cherry-picked begin year is a very bad idea? The right statistical test is one for a trend change at an unknown year. This test provides no evidence whatsoever for a recent trend change.

That the trend in Karl and colleagues was statistically significant should thus not have mattered: Nothing could be worse than define a "hiatus" period as one were the confidence interval of a trend includes zero. However, this is the definition public speaker Christopher Monckton uses for his blog posts at Watts Up With That, a large blog of the mitigation skeptical movement. Short-term trends are very uncertain, their uncertainty increases very fast the shorter the period is. Thus if your period is short enough, you will find a trend whose confidence interval includes zero.

You should not do this kind of statistical test in the first place because of the inevitable cherry picking of the period, but if you want to statistically test whether the long-term trend suddenly dropped, the test should have the long-term trend as null-hypothesis. This is the 21st century, we understand the physics of man-made global warming, we know it should be warming, it would be enormously surprising and without any explanation if "global warming had stopped". Thus continued warming is the thing that should be disproven, not a flat trend line. Good luck doing so for such short periods given how enormously uncertain short-term trends are.

The large uncertainty also means that cherry picking a specific period to get a low trend has a large impact. I will show this numerically in an upcoming post. The methods to compute a confidence interval are for a randomly selected period, not for a period that was selected to have a low trend.

Concluding, we have something that does not exist, but which was made into an major talking point of the mitigation skeptical movement. This movement put their credibility on fluctuations that produced a minor short-term trend change that was not statistically significant. The deviation was also so small that it put an unfounded confidence in the perfection of the data.

The inevitable happened and small corrections needed to be made to the data. After this even disingenuous cherry-picking and bad statistics were no longer enough to support the talking point. As a consequence Lamar Smith of TX21 abused his Washington power to punish politically inconvenient science. Science that was confirmed this week. This should all have been politically irrelevant because the statistics were wrong all along. This was politically irrelevant by now because the new El Nino produced record temperatures in 2016 and even cherry picking 1998 as begin year is no longer enough.

"Much Ado About Nothing is generally considered one of Shakespeare's best comedies because it combines elements of mistaken identities, love, robust hilarity with more serious meditations on honour, shame, and court politics."
Yes, I get my culture from Wikipedia)

To end on a positive note, if your are interested in sea surface temperature and its uncertainties, we just published a review paper in the Bulletin of the American Meteorological Society: "A call for new approaches to quantifying biases in observations of sea-surface temperature." This focuses on ideas for future research and how the SST community can make it easier for others to join the field and work on improving the data.

Another good review paper on the quality of SST observations is: "Effects of instrumentation changes on sea surface temperature measured in situ" and also the homepage of HadSST is quite informative. For more information on the three main sea surface temperature datasets follow these links: ERSSTv4, HadSST3 and COBE-SST. Thanks to John Kennedy for suggesting the links in this paragraph.

Do watch the clear video below where Zeke Hausfather explains the study and why he thinks recent ocean warming used to be underestimated.

Related reading

The op-ed by the authors Kevin Cowtan and Zeke Hausfather is probably the best article on the study: Political Investigation Is Not the Way to Scientific Truth. Independent replication is the key to verification; trolling through scientists' emails looking for out-of-context "gotcha" statements isn't.

Scott K. Johnson in Ars Technica (a reading recommendation for science geeks by itself): New analysis shows Lamar Smith’s accusations on climate data are wrong. It wasn't a political plot—temperatures really did get warmer.

Phil Plait (Bad Astronomy) naturally has a clear explanation of the study and the ensuing political harassment: New Study Confirms Sea Surface Temperatures Are Warming Faster Than Previously Thought

The take of the UK MetOffice, producers of HadSST, on the new study and the differences found for HadSST: The challenge of taking the temperature of the world’s oceans

Hotwhopper is your explainer if you like your stories with a little snark: The winner is NOAA - for global sea surface temperature

Hotwhopper follow-up: Dumb as: Anthony Watts complains Hausfather17 authors didn't use FUTURE data. With such a response to the study it is unreasonable to complain about snark in the response.

The Christian Science Monitor gives a good non-technical summary: Debunking the myth of climate change 'hiatus': Where did it come from?

I guess it is hard for a journalist to not write that the topic is not important. Chris Mooney at the Washington Post claims Karl and colleagues is important: NOAA challenged the global warming ‘pause.’ Now new research says the agency was right.

Climate Denial Crock of the Week with Peter Sinclair: New Study Shows (Again): Deniers Wrong, NOAA Scientists Right. Quotes from several articles and has good explainer videos.

Global Warming ‘Hiatus’ Wasn’t, Second Study Confirms

The guardian blog by John Abraham: New study confirms NOAA finding of faster global warming

Atmospheric warming hiatus: The peculiar debate about the 2% of the 2%

No! Ah! Part II. The return of the uncertainty monster

How can the pause be both ‘false’ and caused by something?


Grant Foster and John Abraham, 2015: Lack of evidence for a slowdown in global temperature. US CLIVAR Variations, Summer 2015, 13, No. 3.

Zeke Hausfather, Kevin Cowtan, David C. Clarke, Peter Jacobs, Mark Richardson, Robert Rohde, 2017: Assessing recent warming using instrumentally homogeneous sea surface temperature records. Science Advances, 04 Jan 2017.

Boyin Huang, Viva F. Banzon, Eric Freeman, Jay Lawrimore, Wei Liu, Thomas C. Peterson, Thomas M. Smith, Peter W. Thorne, Scott D. Woodruff, and Huai-Min Zhang, 2015: Extended Reconstructed Sea Surface Temperature Version 4 (ERSST.v4). Part I: Upgrades and Intercomparisons. Journal Climate, 28, pp. 911–930, doi: 10.1175/JCLI-D-14-00006.1.

Thomas R. Karl, Anthony Arguez, Boyin Huang, Jay H. Lawrimore, James R. McMahon, Matthew J. Menne, Thomas C. Peterson, Russell S. Vose, Huai-Min Zhang, 2015: Possible artifacts of data biases in the recent global surface warming hiatus. Science. doi: 10.1126/science.aaa5632.

Lewandowsky, S., J. Risbey, and N. Oreskes, 2016: The “Pause” in Global Warming: Turning a Routine Fluctuation into a Problem for Science. Bull. Amer. Meteor. Soc., 97, 723–733, doi: 10.1175/BAMS-D-14-00106.1.

Tuesday 3 January 2017

Budapest Seminar on Homogenization and Quality Control

3 – 7 April 2017
Organized by the Hungarian Meteorological Service (OMSZ
Supported by WMO

Photo Budapest parliament

The first eight Seminars for Homogenization and Quality Control in Climatological Databases as well as the first three Conferences on Spatial Interpolation Techniques in Climatology and Meteorology were held in Budapest and hosted by the Hungarian Meteorological Service.

The 7th Seminar in 2011 was organized together with the final meeting of the COST Action ES0601: Advances, in Homogenization Methods of Climate Series: an integrated approach (HOME), while the first Conference on Spatial Interpolation was organized in 2004 in the frame of the COST Action 719: The Use of Geographic Information Systems in Climatology and Meteorology. Both series were supported by WMO.

In 2014 the 8th Homogenization Seminar and the 3rd Interpolation Conference were organized together considering certain theoretical and practical aspects. Theoretically there is a strong connection between these topics since the homogenization and quality control procedures need spatial statistics and interpolation techniques for spatial comparison of data. On the other hand the spatial interpolation procedures (e.g. gridding) require homogeneous, high quality data series to obtain good results.

The WMO CCl set up team to support quality control and homogenization activities at NMHSs. The main task of the Task Team on Homogenisation (TT HOM) to provide guidance to Members on methodologies, standards and software required for quality control and homogenization of long term climate time-series. The results of the homogenization sessions can improve the content of the guidance is under preparation.

Marx and Engels at the museum for communist area statues in Szobor park.

Communist area statues in Szobor park.

Thermal bath.

To go to the Hungarian weather service, you probably need to take a tram or metro to [[Széll Kálmán tér]] (Széll Kálmán Square).

The post office at Széll Kálmán Square.

"Szent Gellért tér" station of Budapest Metro.
Highlights and Call for Papers
The homogeneous data series with high quality and the spatial interpolation are indispensable for the climatological and meteorological examinations. The primary goal of this meeting is to promote the discussion about the methodological and theoretical aspects.

The main topics of homogenization and quality control are intended to be the following:
  • Methods for homogenization and quality control of monthly data series
  • Spatial comparison of series, inhomogeneity detection, correction of series
  • Methods for homogenization and quality control of daily data series, examination of parallel measurements
  • Relation of monthly and daily homogenization, mathematical formulation of homogenization for climate data series generally
  • Theoretical evaluation and benchmark for methods, validation statistics
  • Applications of different homogenization and quality control methods, experiences with different meteorological variables

The main topics of spatial interpolation are the following:
  • Temporal scales: from synoptic situations to climatological mean values
  • Interpolation formulas and loss functions depending on the spatial probability distribution of climate variables
  • Estimation and modelling of statistical parameters (e.g.: spatial trend, covariance or variogram) for interpolation formulas using spatiotemporal sample and auxiliary model variables (topography)
  • Use of auxiliary co-variables, background information (e.g.: dynamical model results, satellite, radar data) for spatial interpolation (data assimilation, reanalysis)
  • Applications of different interpolation methods for the meteorological and climatological fields
  • Gridded databases, digital climate atlases, results of the DanubeClim project

Organizational Details
Persons intending to participate on the meeting are required to pre-register by filling the form enclosed. To have a presentation please send us also a short abstract (max. 1 page). The pre-registration and abstract submission deadline is 20 February 2017. Publication of the papers in proceedings in the serial WMO/WCP/WCDMP is foreseen after the meeting. Paper format information will be provided in the second circular. The registration fee (including book of abstracts, coffee breaks, social event, proceedings) is 120 EUR. The second circular letter with accommodation information will be sent to the pre-registered people by 28 February 2017.

Location and Dates
The meeting will be held 3-7 April 2017 in Budapest, Hungary at the Headquarter of Hungarian Meteorological Service (1. Kitaibel P. Street, Budapest, 1024).

The official language of the meeting is English.

For further information, please contact:
Hungarian Meteorological Service
P.O.Box 38, Budapest, H-1525, Hungary

The pre-registration form can be downloaded here.

* Photo Budapest Parliament by Never House used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.
* Budapest-Szobor park by Simon used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.
* Budapest-Exterior thermal baths by Simon used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.

* Szobor park by bjornsphoto used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.
* Photo "Szent Gellért tér" station of Budapest Metro by Marco Fieber used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.