Sunday 1 May 2016

Christy and McNider: Time Series Construction of Summer Surface Temperatures for Alabama

John Christy and Richard McNider have a new paper in the AMS Journal of Applied Meteorology and Climatology called "Time Series Construction of Summer Surface Temperatures for Alabama, 1883–2014, and Comparisons with Tropospheric Temperature and Climate Model Simulations". Link: Christy and McNider (2016).

This post gives just a few quick notes on the methodological aspects of the paper.
1. They select data with a weak climatic temperature trend.
2. They select data with a large cooling bias due to improvements in radiation protection of thermometers.
3. They developed a new homogenization method using an outdated design and did not test it.

Weak climatic trend

Christy and McNider wrote: "This is important because the tropospheric layer represents a region where responses to forcing (i.e., enhanced greenhouse concentrations) should be most easily detected relative to the natural background."

The trend in the troposphere should be a few percent stronger than at the surface, mainly in the tropics. However, it is mainly interesting that they see a strong trend as a reason to prefer tropospheric temperatures, because when it comes to the surface they select the period and temperature with the smallest temperature trend: the daily maximum temperatures in summer.

The trend in winter due to global warming should be 1.5 times the trend in summer and the trend in the night time minimum temperatures is stronger than the trend in the day time maximum temperatures, as discussed here. Thus Christy and McNider select the data with the smallest trend for the surface. Using their reasoning for the tropospheric temperatures they should prefer night time winter temperatures.

(And their claim on the tropospheric temperatures is not right because whether a trend can be detected does not only depend on the signal, but also on the noise. The weather noise due to El Niño is much stronger in the troposphere and the instrumental uncertainties are also much larger. Thus the signal to noise ratio is smaller for the tropospheric temperatures, even if the tropospheric record were as long as the surface observations.

Furthermore, I am somewhat amused that there are still people interested in the question whether global warming can be detected.)
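The signal-to-noise argument in this aside can be made concrete with a small Python sketch. It uses the textbook standard error of a least-squares trend under the simplifying assumption of independent (white) noise; real weather noise such as El Niño is autocorrelated, which makes the uncertainty larger still. The trend, noise levels and record lengths are hypothetical numbers chosen only for illustration, not values from the paper.

```python
import math

def trend_standard_error(noise_sd, n_years):
    """Standard error of an OLS trend (per year) for n equally spaced annual
    values with independent noise of standard deviation noise_sd."""
    sum_sq = n_years * (n_years**2 - 1) / 12.0  # sum of (t - mean(t))^2
    return noise_sd / math.sqrt(sum_sq)

# Hypothetical numbers, chosen only to illustrate the argument.
trend = 0.02  # degrees C per year
cases = {
    "long record, low noise": (0.1, 130),
    "short record, higher noise": (0.2, 37),
}
for name, (sd, n) in cases.items():
    se = trend_standard_error(sd, n)
    print(f"{name}: trend / standard error = {trend / se:.0f}")
```

A somewhat larger signal does not help detection if the noise is larger and the record is shorter; the ratio of trend to its standard error is what matters.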

[UPDATE. Tamino shows that within the USA, Alabama happens to be the region with the least warming. The more so for the maximum temperature. The more so for the summer temperature.]

Cooling bias

Then they used data with a very large cooling bias due to improvements in the protection of the thermometer against (solar and infra-red) radiation. Early thermometers were not protected as well against solar radiation and typically recorded maximum temperatures that were too high. Early thermometers also recorded minimum temperatures that were too cool; the thermometer should not see the cold sky, otherwise it radiates out to it and cools. The warming bias in the maximum temperature is larger than the cooling bias in the minimum temperature, thus the mean temperature still has some bias, but less than the maximum temperature.

Due to this reduction in the radiation error summer temperatures have a stronger cooling bias than winter temperatures.

The warming effect of early measurements on the annual means is probably about 0.2 to 0.3°C. In the maximum temperature it will be a lot higher and in the summer temperature it will again be a lot higher.

That is why most climatologists use the annual means. Homogenization can improve climate data, but it cannot remove all biases. Thus it is good to start with data that has the least bias. That is much better than starting with a highly biased dataset, as Christy and McNider did.

Statistical homogenization removes biases by comparing a candidate station to its neighbour. The stations need to be close enough together so that the regional climate can be assumed to be similar in both stations. The difference between two stations is then weather noise and inhomogeneities (non-climatic changes due to changes in the way temperature was measured).
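To illustrate why the difference series is used, here is a minimal Python sketch with synthetic data. The two stations, the trend, the noise levels and the size and position of the break are all made-up assumptions for illustration; nothing here is taken from the paper.

```python
import random
import statistics

BREAK_AT = 60     # hypothetical position of the inhomogeneity
BREAK_SIZE = 0.5  # hypothetical jump in degrees C

def make_pair(n=100, seed=1):
    """Two synthetic neighbouring stations sharing one regional climate signal."""
    rng = random.Random(seed)
    candidate, neighbour = [], []
    for t in range(n):
        regional = 0.01 * t + rng.gauss(0.0, 1.0)    # shared trend + regional weather
        jump = BREAK_SIZE if t >= BREAK_AT else 0.0  # break in the candidate only
        candidate.append(regional + jump + rng.gauss(0.0, 0.2))  # local noise
        neighbour.append(regional + rng.gauss(0.0, 0.2))
    return candidate, neighbour

candidate, neighbour = make_pair()
diff = [c - r for c, r in zip(candidate, neighbour)]

# The shared regional signal cancels in the difference series,
# so the remaining noise is small and the jump stands out.
step_in_diff = statistics.fmean(diff[BREAK_AT:]) - statistics.fmean(diff[:BREAK_AT])
noise_raw = statistics.pvariance(candidate[:BREAK_AT])
noise_diff = statistics.pvariance(diff[:BREAK_AT])
print(f"estimated step in difference series: {step_in_diff:.2f}")
print(f"variance before break, candidate alone: {noise_raw:.2f}")
print(f"variance before break, difference:      {noise_diff:.2f}")
```

The better the neighbours are correlated, the smaller the local noise term and the easier it is to see a break of a given size.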

If you want to be able to see the inhomogeneities you thus need to have well correlated neighbors that have as little weather noise as possible. By using only the maximum temperature, rather than the mean temperature, you increase the weather noise. By using the monthly means in summer, rather than the annual means or at the very least the summer means, you increase the weather noise. By going back in time more than a century you increase the noise because we had fewer stations to compare with at the time.

They keyed part of the data themselves, mainly for the period before 1900, from the paper records. It sounds as if they performed no quality control of these values (to detect measurement errors). This will also increase the noise.

With such a low signal to noise ratio (inhomogeneities that are small relative to the weather noise in the difference time series), the estimated date of the breaks they still found will have a large uncertainty. It is thus a pity that they purposefully did not use information from station histories (metadata) to get the date of the breaks right.

Homogenization method

They developed their own homogenization method and only tested it on a noise signal with one break in the middle. Real series have multiple breaks; in the USA typically every 15 years. Furthermore, the reference series also has breaks.

The method uses the detection equation from the Standard Normal Homogeneity Test (SNHT), but then starts using different significance levels. Furthermore for some reason it does not use the hierarchical splitting of SNHT to deal with multiple breaks, but it detects on a window, in which it is assumed there is only one break. However, if you select the window too long it will contain more than one break and if you select the window too short the method will have no detection power. You would thus theoretically expect the use of a window for detection to perform very badly and this is also what we found in a numerical validation study.
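For readers unfamiliar with SNHT, the following Python sketch computes the standard single-break SNHT statistic on a standardised series and applies it to two synthetic difference series: one with a single break and one with two opposite breaks inside the same window. This is the textbook form of the statistic, not the Christy and McNider implementation, and the series length, noise level and break sizes are hypothetical.

```python
import random
import statistics

def snht(series):
    """Return (T0, k) for the single-break SNHT statistic.

    T(k) = k * mean(z[:k])**2 + (n - k) * mean(z[k:])**2 on the standardised
    series z (sample standard deviation); T0 is the maximum over k and k is
    the number of values before the most likely break.
    """
    n = len(series)
    mu = statistics.fmean(series)
    sd = statistics.stdev(series)
    z = [(x - mu) / sd for x in series]
    total = sum(z)
    running = 0.0
    best_t, best_k = 0.0, None
    for k in range(1, n):
        running += z[k - 1]
        z1 = running / k
        z2 = (total - running) / (n - k)
        t = k * z1**2 + (n - k) * z2**2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k

def synthetic(breaks, n=100, noise_sd=0.3, seed=42):
    """Difference series with step changes; breaks maps position -> jump size."""
    rng = random.Random(seed)
    level, out = 0.0, []
    for t in range(n):
        level += breaks.get(t, 0.0)
        out.append(level + rng.gauss(0.0, noise_sd))
    return out

one_break = synthetic({50: 1.0})
two_breaks = synthetic({33: 1.0, 66: -1.0})  # up, then back down

t_one, k_one = snht(one_break)
t_two, k_two = snht(two_breaks)
print(f"one break:  T0 = {t_one:.1f} at k = {k_one}")
print(f"two breaks: T0 = {t_two:.1f} at k = {k_two}")
```

With two opposite breaks in the window the single-break model fits poorly and the test statistic drops, even though the individual breaks are just as large. That is the weakness of window-based detection when the window is too long.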

I see no real excuse not to use better homogenization methods (ACMANT, PRODIGE, HOMER, MASH, Craddock). These are built to take into account that the reference station also has breaks and that a series will have multiple breaks; no need for ad-hoc windows.

If you design your own homogenization method, it is good scientific practice to test it first, to study whether it does what you hope it does. There is, for example, the validation dataset of the COST Action HOME. Using that immediately allows you to compare your skill to the other methods. Given the outdated design principles, I am not hopeful the Christy and McNider homogenization method would score above average.


These are my first impressions on the homogenization method used. Unfortunately I do not have the time at the moment to comment on the non-methodological parts of the paper.

If there are no knowledgeable reviewers available in the USA, it would be nice if the AMS would ask European researchers, rather than some old professor who in the 1960s once removed an inhomogeneity from his dataset. Homogenization is a specialization; it is not trivial to make data better, and it really would not hurt if the AMS would ask for expertise from Europe when American experts are busy.

Hitler is gone. The EGU general assembly has a session on homogenization, the AGU does not. The EMS has a session on homogenization, the AMS does not. EUMETNET organizes data management workshops, a large part of which is about homogenization; I do not know of an American equivalent. And we naturally have the Budapest seminars on homogenization and quality control. Not Budapest, Georgia, nor Budapest, Missouri, but Budapest, Hungary, Europe.

Related reading

Tamino: Cooling America. Alabama compared to the rest of contiguous USA.

HotWhopper discusses further aspects of this paper and some differences between the paper and the press release. Why nights can warm faster than days - Christy & McNider vs Davy 2016

Early global warming

Statistical homogenisation for dummies


  1. Thank you for this explanation. It's very helpful to be able to see what the paper was doing in the context of other (better) approaches. (I also couldn't figure some of the more stretched conclusions of the paper/press release from what was presented as research.)

  2. Thanks, Victor. Very helpful, and quite a timesaver to see flaws in the C & M paper, including some subtle ones, pointed out for non-experts. The references look quite useful too.

  3. McNider has form. That doesn't invalidate his present work of itself, but is worth noting.

    * (he's one of the GWPF's tame "inquirers")

  4. Tamino has put something up that is, uhm, quite interesting, too:

    Let's just say that they may claim they used Alabama because, well, that's where UAH is, but it also is quite an anomaly in the US as a whole.

  5. If one was to speculate with an especially cynical turn of mind it might be suspected that Christy & McNider had intended to create a paper that had a 'robust' reconstruction of temperatures that could be claimed to be a good ground based comparison for LTT derived from satellite measurements. Then the divergence of UAH v6.27481... from surface data would not look so bad.

    Picking the locality, daily max and seasonal data to minimise any measurable trend is the obvious tactic if you want the surface reconstruction to be anything like comparable to the satellite reconstruction of LTT.

    But such blatant fixing of the answer was probably just TOO obvious for any reputable journal or reviewers to overlook. So the paper was re-configured as a 'test' of the statistical method for finding breakpoints or possible discontinuities in past surface temperature data from individual stations without any reference to station metadata that would confirm or refute a site, equipment or TOBs change.

    It was also a 'test' of the method without any comparisons with other results or methods on the same data. Hard to assess the credibility of a methodological tool without having either the gold standard or other established methodologies to make that comparison with.

    So this is probably a paper that started out as a way of constructing a surface temperature data-set that would minimally contradict the UAH satellite derived LTT. That is hinted at in the paper by the discussion of how summer maximums in Alabama are more 'representative' of lower tropospheric temperatures. (actually more representative as an outlier in low trends)

    Prediction; as the divergence between UAH and surface temperature records grows this paper will be claimed to show that lower troposphere trends derived from (selected parts of) the surface record ARE consistent with LTT datasets derived from satellite measurements.

  6. Not only worse than you think, worse than you could possibly think. The "warming hole" that they found is well known, commented on and investigated in a series of papers extending back to the WG1 TAR.

