Statistical homogenisation for dummies

The self-proclaimed climate sceptics keep on spreading fairy tales that homogenisation smooths climate data and adjusts good stations to make them into bad stations. That is quite some controversy for such an innocent method for removing non-climatic influences from the climate record.

In this post, I will explain how homogenisation really works using a simple example with only three stations. Figure 1 shows these three nearby stations. Statistical homogenisation exploits the fact that these three time series are very similar (are highly correlated) as they measure almost the same regional climate. Changes that happen at only one of the stations are assumed to be non-climatic. The aim of homogenisation is to remove such non-climatic changes in the data.

Figure 1. The annual mean temperature data of three hypothetical stations in one climate region.

(In case colleagues of mine are reading this and are wondering about my craftsmanship: I do know how to operate scientific plotting software, but some “sceptics” make fun of people who have no experience with Excel. I just wanted to show off that I can use a spreadsheet.)

For the example, I have added a break inhomogeneity in the middle with a typical size of 0.8 °C (about 1.4 °F) to the data for station A; see Figure 2.

Figure 2. The temperature of Station A, with a break of 0.8°C in 1940.
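For readers who want to play along, here is a minimal sketch of how such hypothetical data could be generated. The period (1880 to 2000), the base temperature of 10 °C, the noise levels and the function name `make_stations` are all assumptions made for illustration; only the three stations, their offsets, and the 0.8 °C break in station A in 1940 come from the example above.

```python
import random

def make_stations(start=1880, end=2000, break_year=1940, break_size=0.8, seed=1):
    """Return (years, stations) with hypothetical annual mean temperatures."""
    rng = random.Random(seed)
    years = list(range(start, end + 1))
    # Regional climate signal shared by all stations (assumed noise level).
    regional = [rng.gauss(0.0, 0.5) for _ in years]
    # Station B is 2 °C and station C is 1 °C warmer than station A.
    offsets = {"A": 0.0, "B": 2.0, "C": 1.0}
    stations = {}
    for name, offset in offsets.items():
        # Small station-specific noise on top of the shared signal (assumed).
        stations[name] = [10.0 + offset + r + rng.gauss(0.0, 0.1) for r in regional]
    # Non-climatic break: station A is shifted by break_size from break_year on.
    stations["A"] = [t + break_size if y >= break_year else t
                     for y, t in zip(years, stations["A"])]
    return years, stations

years, stations = make_stations()
print(years[0], years[-1], sorted(stations))
```

Because the regional signal is shared, the three series are highly correlated, which is the property relative homogenisation relies on.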

In Figure 2 the break can be seen, but it could also be a trend, decadal variability or some climatic mode. You can already see the break better by plotting all three stations as in Figure 3.

Figure 3. The temperature of all three stations. Station A has a break in 1940.

If you compute the difference time series of all pairs of stations, you see the break more clearly; see Figure 4.

Figure 4. The difference time series of all three pairs of stations.

In Figure 4, you see a break in the pairs B and A as well as C and A, whereas there is no break in the pair C and B in the year 1940. Thus the break has probably happened in station A. Less likely, but possible, is that there were two breaks of about the same size in both B and C in 1940. In a real case, one would have many more stations (and thus pairs) to infer the right station (as well as break date and size) with more certainty. Having three stations is the bare minimum for homogenisation.
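The attribution logic can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the break year is taken as known, the threshold of 0.4 °C is arbitrary, and the helper names `difference` and `step_at` are made up for this sketch.

```python
import random
from statistics import mean

rng = random.Random(0)
years = list(range(1880, 2001))                  # assumed period
regional = [rng.gauss(0.0, 0.5) for _ in years]  # shared regional weather
stations = {
    name: [10.0 + offset + r + rng.gauss(0.0, 0.1) for r in regional]
    for name, offset in (("A", 0.0), ("B", 2.0), ("C", 1.0))
}
# Break of 0.8 °C in station A from 1940 onwards.
stations["A"] = [t + 0.8 if y >= 1940 else t for y, t in zip(years, stations["A"])]

def difference(x, y):
    """Difference time series: station x minus station y."""
    return [a - b for a, b in zip(stations[x], stations[y])]

def step_at(series, k):
    """Mean after index k minus mean before index k."""
    return mean(series[k:]) - mean(series[:k])

k = years.index(1940)  # the break year is assumed known here
pairs = [("B", "A"), ("C", "A"), ("C", "B")]
steps = {pair: step_at(difference(*pair), k) for pair in pairs}

threshold = 0.4  # arbitrary threshold for this illustration
flagged = [pair for pair, s in steps.items() if abs(s) > threshold]
# The station that appears in the most flagged pairs is the likely culprit.
counts = {name: sum(name in pair for pair in flagged) for name in stations}
suspect = max(counts, key=counts.get)

for pair, s in steps.items():
    print(pair, round(s, 2))
print("most likely station:", suspect)
```

Pairs B-A and C-A both show a step of about -0.8 °C (A became warmer relative to its neighbours), while C-B shows none, so station A is flagged.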

By comparing pairs you take out a lot of weather noise and also the complicated regional climate signal. Using difference time series, you can compute the statistical significance of the break more reliably and also estimate the date and the size of the breaks more accurately. Using difference time series for homogenisation is called relative homogenisation (Conrad and Pollak, 1950).
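To make the gain from differencing concrete, the sketch below scans every possible break position and keeps the one with the largest t-like statistic, once on the raw series of station A and once on the difference series B minus A. This simple scan is an assumption chosen for illustration, not one of the established homogenisation tests; the maximum over all positions does not follow an ordinary t distribution, so real methods use adjusted critical values. The data and the name `find_break` are hypothetical.

```python
import random
from statistics import mean

def find_break(series, min_seg=5):
    """Return (t, k, step) for the split index k with the largest t-like statistic."""
    n = len(series)
    best = (0.0, None, 0.0)
    for k in range(min_seg, n - min_seg + 1):
        left, right = series[:k], series[k:]
        m_left, m_right = mean(left), mean(right)
        # Pooled variance of the two segments around their own means.
        ss = (sum((v - m_left) ** 2 for v in left)
              + sum((v - m_right) ** 2 for v in right))
        pooled_var = ss / (n - 2)
        t = abs(m_right - m_left) / (pooled_var * (1 / k + 1 / (n - k))) ** 0.5
        if t > best[0]:
            best = (t, k, m_right - m_left)
    return best

rng = random.Random(0)
years = list(range(1880, 2001))                  # assumed period
regional = [rng.gauss(0.0, 0.5) for _ in years]  # shared regional weather
a = [10.0 + r + rng.gauss(0.0, 0.1) + (0.8 if y >= 1940 else 0.0)
     for y, r in zip(years, regional)]
b = [12.0 + r + rng.gauss(0.0, 0.1) for r in regional]

t_raw, k_raw, step_raw = find_break(a)
t_diff, k_diff, step_diff = find_break([y - x for x, y in zip(a, b)])

print("raw A:        t =", round(t_raw, 1), "year =", years[k_raw])
print("B minus A:    t =", round(t_diff, 1), "year =", years[k_diff],
      "step =", round(step_diff, 2))
```

The regional weather cancels in the difference series, so the same 0.8 °C break stands out far more strongly there than in the raw series.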

As you can see in Figure 1, station B is 2 °C warmer than A (and station C, 1 °C), but for homogenisation this is inconsequential. B is 2 °C warmer before and after the break. This does not influence the computation of the size of the break on the difference time series. “Sceptics” often claim that a higher temperature at a neighbouring station will pollute the candidate station. This is clearly not the case.
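A tiny check makes this explicit. In the sketch below, with made-up numbers and a hypothetical helper `break_seen_from`, the break in station A is estimated once against a neighbour that is 2 °C warmer and once against a neighbour with the same mean temperature. The constant offset drops out when the mean difference before the break is subtracted from the mean difference after it.

```python
from statistics import mean

def step_at(series, k):
    """Mean after index k minus mean before index k."""
    return mean(series[k:]) - mean(series[:k])

# Made-up shared weather for ten years; the break in A happens at index 5.
weather = [0.3, -0.4, 0.1, 0.5, -0.2, 0.0, -0.3, 0.4, 0.2, -0.1]
k = 5
a = [10.0 + w + (0.8 if i >= k else 0.0) for i, w in enumerate(weather)]

def break_seen_from(offset):
    """Step in (B minus A) when neighbour B is `offset` degrees warmer than A."""
    b = [10.0 + offset + w for w in weather]
    return step_at([y - x for x, y in zip(a, b)], k)

step_warm = break_seen_from(2.0)   # neighbour 2 °C warmer
step_equal = break_seen_from(0.0)  # neighbour with the same mean temperature
print(round(step_warm, 3), round(step_equal, 3))
```

Both estimates agree up to floating-point rounding: the warmer neighbour does not leak into the size of the correction.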

The strength of homogenisation as a statistical method is limited; many small jumps will go undetected. Whether a jump counts as small depends on its size relative to the noise of the difference time series. If you have a very dense network with many stations, you will be able to find more breaks than in a sparse network. Generally, national datasets are thus better homogenised than global ones. National weather services have access to more climate data, but also to more metadata (data about the data, such as the dates of many of the changes), if only because this is written in the native language. Metadata is especially helpful in determining the most likely date for a break. The first guess from statistical homogenisation can help you select information sources; for example, sometimes even the local newspapers are studied, but you cannot read all newspapers printed in the last century. Thus the current trend to build global datasets with raw data (ISTI & BEST) and process (homogenise) them with open-source algorithms will lead to more transparency, but may also lead to lower quality than using data homogenised by national weather services.

Another limitation of relative homogenisation is that you cannot detect jumps that occur in all stations at the same time. For example, Germany used to use three fixed-hour measurements to compute the daily means (Mannheimer Stunden). In April 2004 the German Weather Service switched to computing the mean from 24 hourly values. Fortunately, such inhomogeneities are well documented. Furthermore, you can detect them by comparing multiple networks, for instance, the professional and the volunteer network, or networks of multiple organisations or countries.
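The following sketch illustrates both halves of this argument with made-up numbers. A jump of 0.2 °C (a hypothetical size) is applied to every station of one network in 2004; the second network is assumed to be unaffected. The labels `professional` and `volunteer`, the offsets and the deterministic stand-in for weather are all assumptions for illustration.

```python
from statistics import mean

def step_at(series, k):
    """Mean after index k minus mean before index k."""
    return mean(series[k:]) - mean(series[:k])

years = list(range(1990, 2020))  # assumed period
k = years.index(2004)
# Deterministic stand-in for the shared regional weather.
weather = [0.3 * ((7 * i) % 5 - 2) for i in range(len(years))]
shift = 0.2  # hypothetical size of the network-wide jump

def network(offsets, shifted):
    """One series per offset; all of them jump at index k if `shifted`."""
    return [[10.0 + off + w + (shift if shifted and i >= k else 0.0)
             for i, w in enumerate(weather)] for off in offsets]

professional = network([0.0, 2.0, 1.0], shifted=True)
volunteer = network([0.5, 1.5, -0.5], shifted=False)

def network_mean(net):
    """Average over all stations of a network, year by year."""
    return [mean(values) for values in zip(*net)]

# Within the affected network the jump cancels in every pair.
within = step_at([b - a for a, b in zip(professional[0], professional[1])], k)
# Between the two networks the jump becomes visible.
between = step_at([p - v for p, v in zip(network_mean(professional),
                                         network_mean(volunteer))], k)
print(round(within, 3), round(between, 3))
```

The pair inside the affected network shows no step, whereas the difference between the two network averages recovers the jump.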

Using pairs you can also detect trends that occur in single stations; see Figure 5.

Figure 5. Three station pairs with two local trends. In station A there is a trend between 1890 and 1940; in station B there is a trend between 1950 and 1980.

In the beginning of the period, before 1940, the pair C-B is flat, that is, this pair shows no inhomogeneities. However, in the pairs B-A as well as C-A, you can see a slight trend. This trend is thus likely due to station A. In the second half of the record, the pair C-A is unperturbed, but the pairs B-A and C-B show a strong short trend, which is likely due to station B. In station A, the warming persists after the end of the trend; this could represent a station in a city that experiences an increase in the urban heat island (UHI) effect. The trend in station B ends with a jump, which could be because the station was relocated to a nearby airport. Or the trend could have been due to growing vegetation, which was cut at the end of the period.
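The same reasoning can be sketched numerically by fitting a least-squares slope to each difference series in the two periods. The trend sizes (0.5 °C in station A, 0.6 °C in station B), the noise levels and the helper `slope_per_decade` are assumptions for illustration, and the two periods are taken as known rather than detected.

```python
import random
from statistics import mean

rng = random.Random(3)
years = list(range(1880, 2001))                  # assumed period
regional = [rng.gauss(0.0, 0.5) for _ in years]  # shared regional weather

def ramp_a(y):
    # Station A: gradual warming of 0.5 °C from 1890 to 1940 that then persists.
    return 0.5 * min(max(y - 1890, 0), 50) / 50

def ramp_b(y):
    # Station B: warming of 0.6 °C from 1950 to 1980, ending with a jump back.
    return 0.6 * (y - 1950) / 30 if 1950 <= y <= 1980 else 0.0

local = {"A": ramp_a, "B": ramp_b, "C": lambda y: 0.0}
stations = {
    name: [10.0 + off + r + rng.gauss(0.0, 0.05) + local[name](y)
           for y, r in zip(years, regional)]
    for name, off in (("A", 0.0), ("B", 2.0), ("C", 1.0))
}

def slope_per_decade(x, y, first, last):
    """Least-squares slope of (station x minus station y) in °C per decade."""
    i, j = years.index(first), years.index(last) + 1
    xs = years[i:j]
    ds = [p - q for p, q in zip(stations[x][i:j], stations[y][i:j])]
    mx, md = mean(xs), mean(ds)
    num = sum((u - mx) * (v - md) for u, v in zip(xs, ds))
    den = sum((u - mx) ** 2 for u in xs)
    return 10 * num / den

pairs = [("B", "A"), ("C", "A"), ("C", "B")]
early = {p: slope_per_decade(*p, 1890, 1940) for p in pairs}
late = {p: slope_per_decade(*p, 1950, 1980) for p in pairs}
for p in pairs:
    print(p, "1890-1940:", round(early[p], 2), " 1950-1980:", round(late[p], 2))
```

In the early period only the pairs containing A have a clear slope, and in the late period only the pairs containing B, which mirrors the reading of Figure 5.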

It is often hard to distinguish between a local trend and (especially multiple) breaks. On the other hand, most homogenisation algorithms correct trends by inserting multiple breaks with the same sign. Numerical blind validation of statistical homogenisation algorithms suggests that this does not make much difference.

In case of local trends, it would again be a problem if two stations have a trend of a similar size in a similar region. In this example, this trend would then not be removed from those two stations and would erroneously be added to the third station. This would be the "sceptic" scenario in which the bad stations hurt the good ones. However, this would only happen if the majority of stations in a specific period had a trend, which is unlikely for larger networks. Currently, the understanding of the UHI effect is that it is limited to very large cities and only occurs for a limited period, as the UHI does not increase any more after a certain density is reached. Such a small fraction of the data affected by local trends is something statistical homogenisation should be able to handle easily.

Statistical homogenisation becomes a challenging scientific problem as soon as you have multiple breaks in one time series (as in Figure 6). This is especially the case as in reality all time series contain breaks and the pairs thus contain twice as many breaks. It is typical for temperature records in industrial countries to have one break every 15 to 20 years. In this case, having more than just three well-correlated stations becomes essential to be able to solve the problem accurately.

Besides using multiple pairs for homogenisation, other methods use a composite reference. These methods compute the difference time series between the candidate station, which is investigated for breaks, and such a composite reference. The composite is computed as a weighted average over many neighbouring stations. This cannot be illustrated with the three-station example, as many more than two reference stations should be used to reduce the inhomogeneities in the composite reference.

These methods assume that the reference is sufficiently homogeneous and any breaks in the difference time series belong to the candidate station. The user of such methods should thus make sure that clearly inhomogeneous stations are not used to compute the composite. Alternatively, automatic methods can compute multiple composite references and test them for their homogeneity. Among the best homogenisation algorithms are methods using composite references and methods using pairs of stations; both strategies are competitive.
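As a rough sketch of the composite-reference idea, the code below builds a weighted average of eight hypothetical neighbours and estimates the break in a candidate station against it. The number of neighbours, the equal weights, the known break position and all function names are assumptions for illustration; operational methods typically derive the weights from the data. The second estimate shows what happens when one neighbour carries the same 0.8 °C break: it leaks into the reference, but diluted by its share of the weights.

```python
import random
from statistics import mean

rng = random.Random(42)
n_years, k = 100, 50  # assumed record length and (known) break position
regional = [rng.gauss(0.0, 0.5) for _ in range(n_years)]

def station(offset, break_size=0.0):
    """Hypothetical station: shared weather, own noise, optional break at k."""
    return [10.0 + offset + r + rng.gauss(0.0, 0.1)
            + (break_size if i >= k else 0.0)
            for i, r in enumerate(regional)]

def anomalies(series):
    """Subtract the station mean so constant offsets drop out."""
    m = mean(series)
    return [v - m for v in series]

def composite_reference(neighbours, weights):
    """Weighted average of the neighbours' anomaly series."""
    total = sum(weights)
    anoms = [anomalies(s) for s in neighbours]
    return [sum(w * a[i] for w, a in zip(weights, anoms)) / total
            for i in range(len(anoms[0]))]

def estimated_break(candidate, neighbours, weights):
    """Step at index k in the series candidate minus composite reference."""
    ref = composite_reference(neighbours, weights)
    diff = [c - r for c, r in zip(candidate, ref)]
    return mean(diff[k:]) - mean(diff[:k])

offsets = (2.0, 1.0, -0.5, 0.3, 1.5, -1.0, 0.8, 0.1)
candidate = station(0.0, break_size=0.8)
neighbours = [station(off) for off in offsets]
weights = [1.0] * len(neighbours)  # hypothetical equal weights

step_clean = estimated_break(candidate, neighbours, weights)
# Replace one neighbour by a version that has the same break as the candidate.
contaminated = [station(offsets[0], break_size=0.8)] + neighbours[1:]
step_contaminated = estimated_break(candidate, contaminated, weights)
print(round(step_clean, 2), round(step_contaminated, 2))
```

With homogeneous neighbours the estimate is close to the true 0.8 °C. With one inhomogeneous neighbour out of eight it is underestimated by roughly 0.8/8 = 0.1 °C, which is why clearly inhomogeneous stations should be kept out of the composite and why many neighbours help.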

Figure 6. Three pairs of stations. Station A contains three breaks in the years 1910, 1940 and 1980.

In a previous post, I have written about the causes of inhomogeneities and the methods used to remove them. Statistical homogenisation methods have been well validated. My own interest in this topic started when I was asked to help organise a blind validation of statistical homogenisation methods. Much more difficult than the homogenisation of annual or monthly temperature data is the homogenisation of daily data, which is often needed to study changes in extreme weather. Currently, I am personally sceptical of many such studies that do not carefully study the quality of their data. This is a major challenge for climatology in the coming years.

When it comes to the homogenisation of annual mean temperature data, I do not understand why "sceptics" reject homogenisation. There is nothing in the relative homogenisation principle that biases the trend. Also, in the above-mentioned numerical validation study, no signs of biases in the trends were seen in the actual software packages used to homogenise climate data (Venema et al., 2012). There is probably a bias in the raw data, because past recorded temperatures were too high due to radiation errors (sun and infra-red; Brunet et al., 2010; Böhm et al., 2010). Consequently, the trend in the raw data is too weak. Because homogenisation cannot find all inhomogeneities, part of this bias remains in the homogenised data (Williams et al., 2012) and the trends found in the homogenised data are thus more likely to be too weak than too strong.

References

Brunet, M., Asin, J., Sigró, J., Banón, M., García, F., Aguilar, E., Esteban Palenzuela, J., Peterson, T. C., and Jones, P.: The minimization of the screen bias from ancient Western Mediterranean air temperature records: an exploratory statistical analysis. Int. J. Climatol., doi: 10.1002/joc.2192, 2010.

Böhm, R., P.D. Jones, J. Hiebl, D. Frank, M. Brunetti, and M. Maugeri. The early instrumental warm-bias: a solution for long central European temperature series 1760–2007. Climatic Change, 101, pp. 41–67, doi: 10.1007/s10584-009-9649-4, 2010.

Conrad, V. and Pollak, C. Methods in climatology. Harvard University Press, Cambridge, MA, p. 459, 1950.

Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma. Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, pp. 89-115, 2012.

Williams, C. N., Jr., M. J. Menne, and P. Thorne. Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. J. Geophys. Res., doi: 10.1029/2011JD016761, 2012 (see also blog post & manuscript).

Variable Variability

Wednesday, 8 August 2012