Friday 1 May 2020

Statistical homogenization under-corrects any station network-wide trend biases

A station of the US Climate Reference Network, with a prominent wind shield for the rain gauges.

In the last blog post I argued that the statistical detection of breaks in climate station data has problems when the noise is larger than the break signal. The post before argued that the best homogenization correction method we have can remove network-wide trend biases perfectly if all breaks are known. In light of the last post, we would naturally like to know how well this correction method can remove such biases in the more realistic case where the breaks are imperfectly estimated. This still needs much more study, but it is interesting to discuss a number of other studies on the removal of network-wide trend biases from the perspective of this new understanding.

So this post will argue that it makes sense theoretically that (unavoidable) inaccuracies of break detection lead to network-wide trend biases being only partially corrected by statistical homogenization.

1) We have seen this in our study of the correction method in response to small errors in the break positions (Lindau and Venema, 2018).

2) The benchmarking study of NOAA’s homogenization algorithm shows that big, easy breaks are largely removed, while in the scenario with many small breaks half of the trend bias remains (Williams et al., 2012).

3) Another benchmarking study shows that with the network density of Switzerland, homogenization can find and remove clear trend biases, while the bias cannot be removed if you thin this network to a density similar to that of Peru (Gubler et al., 2017).

4) Finally, a benchmarking study of relative humidity station observations in Austria could not remove much of the trend bias, likely because relative humidity does not correlate well from station to station (Chimani et al., 2018).

On a global scale, statistical homogenization makes warming estimates larger (Lawrimore et al., 2011; Menne et al., 2018). Thus, if it can only remove part of any trend bias, the actual warming was quite likely larger than estimated.
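A back-of-the-envelope sketch of this argument, under a loudly hypothetical assumption: if homogenization removes a fixed fraction f of the network-wide trend bias, then the adjustment A it applies is f times the total bias, and the bias still left in the data is A/f − A. The function name `remaining_bias` and the example numbers are illustrative only, not estimates for the cited datasets.

```python
def remaining_bias(applied_adjustment, fraction_removed):
    """Trend bias left in the data if homogenization removed only part of it.

    Hypothetical model: applied_adjustment = fraction_removed * total_bias.
    """
    if not 0 < fraction_removed <= 1:
        raise ValueError("fraction_removed must be in (0, 1]")
    total_bias = applied_adjustment / fraction_removed
    return total_bias - applied_adjustment


# Made-up adjustment of 0.2 °C per century, for a few assumed removal fractions.
for f in (1.0, 0.9, 0.5):
    print(f, remaining_bias(0.2, f))
```

With f = 0.5, roughly the hard scenario of Williams et al. (2012), the bias left in the data would be as large as the adjustment already applied; how large f really is for the global datasets is the open question.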

Figure 1: The inserted versus remaining network-mean trend error. Upper panel: perfect breaks. Lower panel: a small perturbation of the break positions. The time series consist of 100 annual values and have 5 breaks. Figure 10 in Lindau and Venema (2018).

Joint correction method

First, what did our study on the correction method (Lindau and Venema, 2018) say about the importance of errors in the break positions? As the paper was mostly about perfect breaks, we assumed that all breaks were known, but that their positions had a small error. In Figure 1, the lower panel shows the case where we perturbed each break position by a normally distributed random number with a standard deviation of one; for comparison, the upper panel shows the case of perfect breaks.

In both cases we inserted a large network-wide trend bias of 0.873 °C over the length of the century-long time series. The inserted errors for 1000 simulations are on the x-axis; the average inserted trend bias is denoted by x̅. The remaining error after homogenization is on the y-axis. Its average is denoted by y̅ and is basically zero when the breaks are perfect (upper panel). For the small perturbation (lower panel), the average remaining error is 0.093 °C, which is 11 % of the inserted trend bias. That is the under-correction for quite a small perturbation: 38 % of the positions are not changed at all.

If the standard deviation of the position perturbation is increased to 2, the remaining trend bias is 21 % of the inserted bias.
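The percentages above can be checked with a few lines of Python. The sketch assumes (this is not spelled out in the text above) that the perturbed break positions are rounded to whole years, so a position stays unchanged when the normal error is smaller than half a year in magnitude. Under that assumption, a standard deviation of one year leaves about 38.3 % of the positions unchanged, matching the quoted 38 %, and two years about 19.7 %. The remaining share 0.093/0.873 rounds to the quoted 11 %.

```python
import math


def fraction_unchanged(position_sd):
    """Probability that a N(0, position_sd) error rounds to zero whole years.

    Assumption: perturbed positions are rounded to the nearest year, so
    |error| < 0.5 leaves the break where it was; P(|Z| < a) = erf(a / sqrt(2)).
    """
    return math.erf(0.5 / (position_sd * math.sqrt(2.0)))


inserted, remaining = 0.873, 0.093  # °C, values quoted for Figure 1 (lower panel)
print(f"remaining share: {remaining / inserted:.1%}")
for sd in (1.0, 2.0):
    print(f"position sd {sd}: {fraction_unchanged(sd):.1%} of positions unchanged")
```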

In the upper panel, there is basically no correlation between the inserted and the remaining error. That is, the remaining error does not depend on the break signal, but only on the noise. In the lower panel with the position errors, there is a correlation between the inserted and remaining trend error. So in this more realistic case, it does matter how large the trend bias due to the inhomogeneities is.

This is naturally an idealized case: in reality, position errors will be more complicated, and there will be spurious and missed breaks. But this idealized case fitted best with the aim of the paper: studying the correction algorithm in isolation.
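The idealized experiment can be mimicked in a small simulation. The Python/NumPy sketch below is not the code used by Lindau and Venema (2018); it only illustrates the setup under hypothetical assumptions: 10 stations with 100 annual values and 5 breaks each, break sizes drawn with a positive mean (0.2 °C, an arbitrary choice) so that the network has a trend bias, and white noise. The correction is a least-squares fit of the additive model "regional signal + one level per homogeneous segment", the idea behind the joint (ANOVA) correction, with the most recent segment of every station fixed at zero. Break positions can be perturbed by normally distributed errors rounded to whole years (also an assumption).

```python
import numpy as np


def simulate_network(rng, n_stations=10, n_years=100, n_breaks=5,
                     mean_jump=0.2, sd_jump=0.4, noise_sd=0.3):
    """Hypothetical network: common regional signal + station break signals + noise."""
    regional = rng.normal(0.0, 0.5, n_years)
    years = np.arange(n_years)
    signal = np.zeros((n_stations, n_years))
    breaks = []
    for s in range(n_stations):
        pos = np.sort(rng.choice(np.arange(5, n_years - 5), n_breaks, replace=False))
        jumps = rng.normal(mean_jump, sd_jump, n_breaks)
        # Segment levels relative to the most recent segment, which is fixed at 0.
        levels = np.append(-np.cumsum(jumps[::-1])[::-1], 0.0)
        signal[s] = levels[np.searchsorted(pos, years, side="right")]
        breaks.append(pos)
    noise = rng.normal(0.0, noise_sd, (n_stations, n_years))
    return regional + signal + noise, signal, breaks


def joint_correction(data, breaks):
    """Least-squares fit of data[s, t] = regional[t] + level[s, segment(s, t)]."""
    n_stations, n_years = data.shape
    years = np.arange(n_years)
    seg = [np.searchsorted(b, years, side="right") for b in breaks]
    n_free = [len(b) for b in breaks]  # last segment level fixed at 0
    offsets = np.concatenate(([0], np.cumsum(n_free))).astype(int)
    design = np.zeros((n_stations * n_years, n_years + offsets[-1]))
    for s in range(n_stations):
        rows = s * n_years + years
        design[rows, years] = 1.0  # regional climate term
        free = seg[s] < n_free[s]  # years before the station's last break
        design[rows[free], n_years + offsets[s] + seg[s][free]] = 1.0
    coef, *_ = np.linalg.lstsq(design, data.ravel(), rcond=None)
    estimate = np.zeros_like(data)
    for s in range(n_stations):
        levels = np.append(coef[n_years + offsets[s]:n_years + offsets[s + 1]], 0.0)
        estimate[s] = levels[seg[s]]
    return estimate  # estimated break signal per station


def century_trend(series):
    """Linear trend per century for annual data."""
    return np.polyfit(np.arange(series.size), series, 1)[0] * 100.0


def trend_errors(rng, position_sd=0.0, **kwargs):
    """Return (inserted, remaining) network-mean trend error for one simulated network."""
    data, signal, breaks = simulate_network(rng, **kwargs)
    n_years = data.shape[1]
    if position_sd > 0:
        breaks = [np.unique(np.clip(b + np.rint(rng.normal(0.0, position_sd, b.size)).astype(int),
                                    1, n_years - 1)) for b in breaks]
    estimate = joint_correction(data, breaks)
    inserted = century_trend(signal.mean(axis=0))
    remaining = century_trend((signal - estimate).mean(axis=0))
    return inserted, remaining


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    for sd in (0.0, 1.0, 2.0):
        runs = np.array([trend_errors(rng, position_sd=sd) for _ in range(200)])
        inserted, remaining = runs.mean(axis=0)
        r = np.corrcoef(runs[:, 0], runs[:, 1])[0, 1]
        print(f"position sd {sd}: inserted {inserted:.3f}, remaining {remaining:.3f}, corr {r:.2f}")
```

With perfect breaks and no noise, the fit recovers the inserted break signal exactly, so the remaining trend error is zero. The printed averages and correlations for position errors of one and two years show whether this toy setup also under-corrects; its numbers should not be expected to match those of the paper.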

It does help to understand where the problem lies. The correction algorithm is basically a regression that aims to explain the inserted break signal (and the regional climate signal). Errors in the predictors lead to an explained variance of less than 100 %. One should thus expect the estimated break signal to be smaller than the actual break signal, and consequently the trend change produced by the estimated break signal to be smaller than the actual trend change due to the inhomogeneities.
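The textbook version of this argument is regression dilution (classical errors-in-variables): when a predictor is observed with noise, the least-squares slope shrinks by the factor var(x)/(var(x) + var(error)). The sketch below shows only this textbook analogy, with made-up parameters; break-position errors are not classical measurement errors, so it illustrates the direction of the effect, not its size in homogenization.

```python
import numpy as np


def attenuated_slope(rng, n=100_000, beta=1.0, predictor_sd=1.0, error_sd=0.5, noise_sd=0.3):
    """OLS slope of y on a noisy copy of the true predictor (errors-in-variables)."""
    x_true = rng.normal(0.0, predictor_sd, n)
    x_obs = x_true + rng.normal(0.0, error_sd, n)  # predictor observed with error
    y = beta * x_true + rng.normal(0.0, noise_sd, n)
    return np.polyfit(x_obs, y, 1)[0]


def expected_attenuation(predictor_sd=1.0, error_sd=0.5):
    """Theoretical shrinkage factor var(x) / (var(x) + var(error)) for beta = 1."""
    return predictor_sd**2 / (predictor_sd**2 + error_sd**2)


rng = np.random.default_rng(0)
print(attenuated_slope(rng), expected_attenuation())
```

The estimated slope lands near 0.8 instead of the true 1.0: the fitted relation is systematically too weak, just as the estimated break signal is expected to be smaller than the actual one.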

NOAA’s benchmark

That statistical homogenization under-corrects when the going gets tough is also found by the benchmarking study of NOAA’s Pairwise Homogenization Algorithm in Williams et al. (2012). They simulated temperature networks like the American USHCN network and added inhomogeneities according to a range of scenarios (also with various climate change signals). Some scenarios were relatively easy, with few and large breaks, while others were hard, with many small breaks. The easy cases were corrected nearly perfectly with respect to the network-wide trend, while in the hard cases only half of the inserted network-wide trend error was removed.

The results of this benchmarking for the three scenarios with a network-wide trend bias are shown below. The three panels are for the three scenarios. Each panel has results (the crosses; ignore the box plots) for three periods over which the trend error was computed. The main message is that the homogenized data (orange crosses) lie between the inhomogeneous data (red crosses) and the homogeneous data (green crosses). Put differently: green is how much the climate actually changed, red is the estimate biased by the inhomogeneities, and orange shows that homogenization moves the estimate towards the truth, but never fully gets there.
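The statement that homogenization removed "half of the trend error" can be written as a simple ratio of the three trends (green, red, orange). The function below is an illustrative sketch; the name `fraction_removed` and the example numbers are hypothetical and not read off the figure.

```python
def fraction_removed(true_trend, raw_trend, homogenized_trend):
    """Share of the raw-data trend error that homogenization removed.

    1.0 means the homogenized trend equals the true trend (full correction);
    0.0 means homogenization did not reduce the trend error at all.
    """
    raw_error = raw_trend - true_trend
    if raw_error == 0:
        raise ValueError("no trend error to remove")
    return 1.0 - (homogenized_trend - true_trend) / raw_error


# Hypothetical trends in °C per century: truth 0.5, raw data 0.3, homogenized 0.4.
print(fraction_removed(true_trend=0.5, raw_trend=0.3, homogenized_trend=0.4))
```

In this made-up example the homogenized trend lies halfway between the raw and the true trend, so about half of the error was removed, which is the pattern of the hard scenarios.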

If we use the number of breaks and their average size as a proxy for the difficulty of a scenario, the one on the left has 6.4 breaks with an average size of 0.8 °C, the one in the middle 8.4 breaks (size 0.4 °C) and the one on the right 10 breaks (size 0.4 °C). This suggests a clear dose-effect relationship, although there surely is more to the difficulty than just the number of breaks.

Figure 2: Results of Williams et al. (2012) for three scenarios. I created this figure from parts of their Figure 7 (left), Figure 5 (middle) and Figure 10 (right).

When this study appeared in 2012, I found the scenario with many small breaks much too pessimistic. However, our recent study estimating the properties of the inhomogeneities of the American network found a surprisingly large number of breaks: more than 17 per century, with an average size of 0.5 °C, bigger than in the hardest scenario. So, purely based on the number of breaks, even the hardest scenario is optimistic, but size matters too.

That said, I would not yet claim that even in a dense network like the American one there is a large remaining trend bias and that the actual warming was much larger. There is more to the difficulty of inhomogeneities than their number and size. It is certainly worth studying.

Alpine benchmarks

The other two examples I know of in the literature show under-correction in its most extreme form: basically no correction, because the problem is simply too hard. Gubler et al. (2017) show that the raw data of the Swiss temperature network has a clear trend bias, which can be corrected by homogenizing its dense network (together with metadata), but when they thin the network to a density similar to that of Peru, they are unable to correct this trend bias. For more details, see my review of this article in the Grassroots Review Journal on Homogenization.

Finally, Chimani et al. (2018) study the homogenization of daily relative humidity observations in Austria. I made a beautiful daily benchmark dataset, and it was a lot of fun: on a daily scale you have autocorrelations and a distribution with an upper and a lower limit, which both the homogeneous and the inhomogeneous data need to respect. But even the normal homogenization of monthly averages was much too hard.
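The bounds mentioned above are easy to violate when a break is simply added to the data. One common way to respect them, shown here as a hypothetical sketch and not as the method used for the benchmark of Chimani et al. (2018), is to simulate the autocorrelated series in logit space with an AR(1) process, insert the break there, and map back to percent with the logistic function. All parameter values are made up.

```python
import numpy as np


def logit_ar1(rng, n_days=3650, phi=0.8, sd=0.5, mean_logit=1.0):
    """AR(1) series in logit space with day-to-day autocorrelation phi (hypothetical values)."""
    z = np.empty(n_days)
    z[0] = rng.normal(0.0, sd / np.sqrt(1.0 - phi**2))  # start from the stationary distribution
    for t in range(1, n_days):
        z[t] = phi * z[t - 1] + rng.normal(0.0, sd)
    return z + mean_logit


def to_percent(z):
    """Logistic transform: maps any real number into the open interval (0, 100) %."""
    return 100.0 / (1.0 + np.exp(-z))


def add_break_in_logit_space(z, break_day, shift):
    """Insert a step change from break_day onward in logit space, so the result stays bounded."""
    out = z.copy()
    out[break_day:] += shift
    return out


rng = np.random.default_rng(1)
z = logit_ar1(rng)
homogeneous = to_percent(z)
inhomogeneous = to_percent(add_break_in_logit_space(z, break_day=1800, shift=0.4))
naive = homogeneous.copy()
naive[1800:] += 8.0  # a break added directly in %: nothing stops values above 100 %
print(homogeneous.max(), inhomogeneous.max(), naive.max())
```

The series built in logit space stays strictly between 0 % and 100 % by construction, with or without the break, while the naive series can exceed 100 % on humid days.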

Austria has quite a dense network, but relative humidity is strongly influenced by very local circumstances and does not correlate well from station to station. My co-authors from the Austrian weather service wanted to write about the improvements: "an improvement of the data by homogenization was non-ideal for all methods used". For me the interesting finding was that nearly no improvement was possible. That was unexpected. Had we expected it, we could have generated a much simpler monthly or annual benchmark to show that no real improvement was possible for humidity data, and saved ourselves a lot of (fun) work.
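Why poor station-to-station correlation hurts can be made quantitative for relative homogenization, which looks for breaks in the difference between a candidate station and a neighbor. For two series with the same standard deviation σ and correlation ρ, the difference has standard deviation σ·sqrt(2(1 − ρ)), so its noise grows quickly as the correlation drops, and a break of a given size then drowns in the noise. The function name is illustrative; real methods use several neighbors or composite references, which this sketch ignores.

```python
import math


def difference_noise_sd(station_sd, correlation):
    """Noise SD of the difference between two stations with equal variance.

    Var(x - y) = 2 * sigma**2 * (1 - rho) for two series with SD sigma and correlation rho.
    """
    return station_sd * math.sqrt(2.0 * (1.0 - correlation))


# Difference-series noise, in units of the station SD, for a few assumed correlations.
for rho in (0.98, 0.9, 0.6):
    print(rho, round(difference_noise_sd(1.0, rho), 2))
```

Going from a correlation of 0.98 to 0.6 more than quadruples the noise of the difference series, while the break signal stays the same size.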

What does this mean for global warming estimates?

When statistical homogenization only partially removes large-scale trend biases, what does this mean for global warming estimates? In the global temperature datasets, statistical homogenization leads to larger warming estimates. So if we tend to underestimate how much correction is needed, the Earth most likely warmed more than current estimates indicate. How much exactly is hard to tell at the moment and thus needs a nuanced discussion. Let me give you my considerations in the next post.

Other posts in this series

Part 5: Statistical homogenization under-corrects any station network-wide trend biases

Part 4: Break detection is deceptive when the noise is larger than the break signal

Part 3: Correcting inhomogeneities when all breaks are perfectly known

Part 2: Trend errors in raw temperature station data due to inhomogeneities

Part 1: Estimating the statistical properties of inhomogeneities without homogenization


Chimani, Barbara, Victor Venema, Annemarie Lexer, Konrad Andre, Ingeborg Auer and Johanna Nemec, 2018: Inter-comparison of methods to homogenize daily relative humidity. International Journal of Climatology, 38, pp. 3106–3122.

Gubler, Stefanie, Stefan Hunziker, Michael Begert, Mischa Croci-Maspoli, Thomas Konzelmann, Stefan Brönnimann, Cornelia Schwierz, Clara Oria and Gabriela Rosas, 2017: The influence of station density on climate data homogenization. International Journal of Climatology, 37, pp. 4670–4683.

Lawrimore, Jay H., Matthew J. Menne, Byron E. Gleason, Claude N. Williams, David B. Wuertz, Russell S. Vose and Jared Rennie, 2011: An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. Journal of Geophysical Research, 116, D19121.

Lindau, Ralf and Victor Venema, 2018: On the reduction of trend errors by the ANOVA joint correction scheme used in homogenization of climate station records. International Journal of Climatology, 38, pp. 5255–5271.

Menne, Matthew J., Claude N. Williams, Byron E. Gleason, Jared J. Rennie and Jay H. Lawrimore, 2018: The Global Historical Climatology Network Monthly Temperature Dataset, Version 4. Journal of Climate, 31, pp. 9835–9854.

Williams, Claude, Matthew Menne and Peter Thorne, 2012: Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. Journal of Geophysical Research, 117, D05116.
