Monday 16 January 2012

Homogenisation of monthly and annual data from surface stations

To study climate change and variability, long instrumental climate records are essential, but they are best not used directly. These datasets are the basis for assessing century-scale trends and for studying the natural (long-term) variability of climate, amongst others. The value of these datasets, however, strongly depends on the homogeneity of the underlying time series. A homogeneous climate record is one where variations are caused only by variations in weather and climate. In our recent article we wrote: “Long instrumental records are rarely if ever homogeneous”. A non-scientist would simply write: homogeneous long instrumental records do not exist. In practice there are always inhomogeneities due to relocations and changes in the surroundings, instrumentation, shelters, etc. If a climatologist only writes “the data is thought to be of high quality”, then removes half of the data and does not mention the homogenisation method used, it is wise to assume that the data is not homogeneous.

Results from the homogenisation of instrumental western climate records indicate that detected inhomogeneities in mean temperature series occur roughly once every 15 to 20 years. It should be kept in mind that most measurements have not been specifically made for climatic purposes, but rather to meet the needs of weather forecasting, agriculture and hydrology (Williams et al., 2012). Moreover, the typical size of the breaks is often of the same order as the climatic change signal during the 20th century (Auer et al., 2007; Menne et al., 2009; Brunetti et al., 2006; Caussinus and Mestre, 2004; Della-Marta et al., 2004). Inhomogeneities are thus a significant source of uncertainty for the estimation of secular trends and decadal-scale variability.

If all inhomogeneities were purely random perturbations of the climate records, their collective effect on the mean global climate signal would be negligible. However, certain changes are typical for certain periods and occurred at many stations. These are the most important causes discussed below, as they can collectively lead to artificial biases in climate trends across large regions (Menne et al., 2010; Brunetti et al., 2006; Begert et al., 2005).
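The difference between random and systematic breaks can be illustrated with a small simulation. This is only a sketch under assumed numbers: the network size (1000 stations), the break scatter (0.8 °C) and the shared shift (-0.3 °C) are hypothetical values chosen for illustration, not estimates from any dataset.

```python
import random


def mean_network_bias(break_sizes):
    """Average effect of one break per station on the network mean."""
    return sum(break_sizes) / len(break_sizes)


rng = random.Random(42)   # fixed seed so the sketch is reproducible
n_stations = 1000         # hypothetical network size

# Case 1: break sizes scatter randomly around zero.
random_breaks = [rng.gauss(0.0, 0.8) for _ in range(n_stations)]

# Case 2: same scatter, but many stations share a change that shifts
# the readings in the same direction (assumed shift: -0.3 degC).
systematic_breaks = [rng.gauss(-0.3, 0.8) for _ in range(n_stations)]

random_bias = mean_network_bias(random_breaks)
systematic_bias = mean_network_bias(systematic_breaks)

print(f"network bias, random breaks:     {random_bias:+.2f} degC")
print(f"network bias, systematic breaks: {systematic_bias:+.2f} degC")
```

In the first case the individual breaks largely cancel in the network average; in the second case the shared shift survives the averaging, which is why such changes matter for large-scale trends.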

In this post I will introduce a number of typical causes for inhomogeneities and methods to remove them from the data.

Causes of inhomogeneities

The best known inhomogeneity is the urban heat island effect. The temperature in cities can be higher than in the surrounding countryside, especially at night. Thus as cities grow, one may expect that temperatures measured in cities become higher. On the other hand, with the advent of aviation, many meteorological offices and thus their stations have been relocated from cities to nearby, typically cooler, airports (Trewin, 2010). It would be worth studying which of these two effects is stronger for urban stations. My European colleagues expect it is the cooling due to the relocation to airports.

Figure 1. This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouris screen, in use in Spain and many European countries in the late 19th century and early 20th century. In the middle and to the left you see two Stevenson screens. (These instruments are standing side by side in the picture to study the influence of changes in measurement techniques.)
Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.

Other non-climatic changes can be caused by changes in measurement methods. Meteorological instruments are typically installed in a screen to protect them from direct sun and wetting (Van der Meulen and Brandsma, 2008). In the 19th century it was common to use a metal screen in front of a window on a North facing wall. However, the building may warm the screen, leading to higher temperature measurements. When this problem was realized, the so-called Stevenson screen was introduced, typically installed in gardens, away from buildings; the two screens on the left in Figure 1 are Stevenson screens. This is still the most common weather screen, with its characteristic double-louvre door and walls for ventilation. In Figure 1 you see to the right a historical Montsouris screen. It is open to the North and to the bottom. This improves ventilation, but it was found that infra-red radiation from the ground can influence the measurement on sunny calm days. Therefore, these screens are no longer used. Nowadays automatic weather stations, which reduce labor costs, are becoming more common; they protect the thermometer by a number of white plastic cones (Begert et al., 2005). Automation required a change from manually read liquid-in-glass thermometers to automated electrical resistance thermometers, which reduces the recorded temperature values (Menne et al., 2009).

Other climate elements also suffer from inhomogeneities. The precipitation amounts observed in the early instrumental period, roughly before 1900, are biased and are 10% lower than nowadays because the measurements were often made on a roof. At the time, instruments were installed on rooftops to ensure that the instrument was never shielded from the rain, but it was later found that, due to the turbulent flow of the wind on roofs, some rain droplets and especially snowflakes did not fall into the opening. Consequently, measurements are nowadays performed closer to the ground.

Other typical causes of inhomogeneities are a change in measurement location; many observations, especially of precipitation, are performed by volunteers in their garden or at their work place. Changes in the surroundings can often not be avoided, e.g., changes in the vegetation, the sealing of surfaces, and warm and sheltering buildings in the vicinity. There are also changes in measurement procedures, such as the way the daily mean temperature is computed (by means of the minimum (Tmin) and maximum (Tmax) temperature, or by averaging over 3 or 4 readings per day, or based on 10 minute data). Changes in the observation times can also lead to inhomogeneities. A recent review by Trewin (2010) focused on the causes of inhomogeneities.
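To see why a change in the way the daily mean is computed can act as an inhomogeneity, here is a minimal sketch that applies three definitions to the same day. The hourly values are made up for illustration, and the fixed reading hours (7, 14 and 21 h) are an assumption; actual observing schedules differ between networks and periods.

```python
# Hypothetical hourly temperatures (degC) for one day, hour 0 to hour 23.
hourly_temps = [
    8.0, 7.5, 7.1, 6.8, 6.5, 6.4, 6.8, 8.0, 9.9, 12.0, 13.9, 15.4,
    16.5, 17.2, 17.5, 17.3, 16.6, 15.4, 13.9, 12.5, 11.3, 10.3, 9.4, 8.6,
]


def mean_from_extremes(hourly):
    """Daily mean as the average of Tmin and Tmax."""
    return (min(hourly) + max(hourly)) / 2.0


def mean_from_fixed_hours(hourly, hours=(7, 14, 21)):
    """Daily mean from a few readings at fixed hours (assumed schedule)."""
    return sum(hourly[h] for h in hours) / len(hours)


def mean_from_all(hourly):
    """Daily mean from all available values (stand-in for automatic data)."""
    return sum(hourly) / len(hourly)


print(f"(Tmin + Tmax) / 2:  {mean_from_extremes(hourly_temps):.2f}")
print(f"three fixed hours:  {mean_from_fixed_hours(hourly_temps):.2f}")
print(f"all hourly values:  {mean_from_all(hourly_temps):.2f}")
```

For this made-up day the definitions differ by up to half a degree, so a station that switches from one definition to another would show an artificial shift even though the weather did not change.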

The inhomogeneities are not always errors. This is seen most clearly for stations affected by warming due to the urban heat island effect. From the perspective of global warming, such local effects are undesirable, but to study the influence of climate on health they are fine. Other inhomogeneities are due to compromises that have to be made between ventilation and protection against the sun and wetting in the design of a weather shelter. Trying to reduce one type of error (for a certain weather condition) in the design will often lead to more errors from the other factors. Meteorological measurements are not made in the laboratory. Small errors are inevitable and meteorologically not relevant, but if such an error changes, it may well be an inhomogeneity.


To reliably study the real development of the climate, non-climatic changes have to be removed. The date of the change is often documented (called meta data: data about data), but not always. Meta data is often only available in the local language. In the best case, there are parallel measurements with the original and the new set-up for several years (Aguilar et al., 2003). This is a WMO (World Meteorological Organisation) guideline, but unfortunately it is not very often followed, if only because the reason for stopping the original measurement is not known in advance, but probably more often to save money. By making parallel measurements with replicas of historical instruments, screens, etc., some of these inhomogeneities can still be studied today.

Because you are never sure that your meta data (station history) is complete, statistical homogenisation should always be applied as well. The most commonly used statistical principle to detect and remove the effects of artificial changes is relative homogenisation, which assumes that nearby stations are exposed to almost the same climate signal and that thus the differences between nearby stations can be utilized to detect inhomogeneities (Conrad and Pollak, 1950). By looking at the difference time series, the year to year variability of the climate is removed, as well as regional climatic trends. In such a difference time series, a clear and persistent jump of, for example, 1°C can easily be detected and can only be due to changes in the measurement conditions. (This relative method does not work when changes are applied to a whole network. Such extensive changes are less problematic, however, because they are typically well documented and can be detected by studying multiple networks simultaneously.)
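The idea of a difference time series can be sketched in a few lines. Everything in this example is synthetic and assumed for illustration: a shared regional signal, two stations with small independent noise, and a 1 °C jump inserted in the candidate station in year 30. The break search is a plain least-squares fit of one step, not the test used by any particular homogenisation method.

```python
import random


def best_single_break(series):
    """Return (k, jump): the split index that minimises the squared error
    of a two-segment step function, and the difference in segment means."""
    n = len(series)
    best_k, best_sse, best_jump = None, float("inf"), 0.0
    for k in range(2, n - 1):  # keep at least two values on each side
        left, right = series[:k], series[k:]
        m_left = sum(left) / len(left)
        m_right = sum(right) / len(right)
        sse = (sum((x - m_left) ** 2 for x in left)
               + sum((x - m_right) ** 2 for x in right))
        if sse < best_sse:
            best_k, best_sse, best_jump = k, sse, m_right - m_left
    return best_k, best_jump


rng = random.Random(1)
n_years, true_break = 60, 30

# Shared regional climate signal: slow trend plus year-to-year variability.
regional = [0.01 * t + rng.gauss(0.0, 0.6) for t in range(n_years)]

# Two nearby stations see the same signal plus small local noise;
# the candidate station also gets an artificial jump of 1 degC.
neighbour = [r + rng.gauss(0.0, 0.15) for r in regional]
candidate = [r + rng.gauss(0.0, 0.15) + (1.0 if t >= true_break else 0.0)
             for t, r in enumerate(regional)]

# The regional signal cancels in the difference; the jump remains.
difference = [c - nb for c, nb in zip(candidate, neighbour)]

k, jump = best_single_break(difference)
print(f"detected break at index {k}, estimated jump {jump:.2f} degC")
```

Because the regional variability cancels in the difference series, the jump stands out against noise that is much smaller than the year-to-year variability of either station on its own.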

If there is a jump (break) in a difference time series, it is not yet clear which of the two stations it belongs to. Furthermore, time series typically have more than just one jump. These two features make statistical homogenisation a challenging and beautiful statistical problem. Homogenisation algorithms typically differ in how they try to solve these two fundamental problems.

In the past, it was customary to compute a composite reference time series from multiple nearby stations, compare this reference to the candidate series and assume that any jumps found are due to the candidate series (Alexandersson, 1986). The latter assumption works because, by using multiple stations as reference, the influence of inhomogeneities on the reference is much reduced. However, modern algorithms no longer assume that the reference is homogeneous and can achieve better results this way. There are two main ways to do so. You can compute multiple composite reference time series from subsets of surrounding stations and test these references for homogeneity as well (Szentimrey, 1999). Alternatively, you can use only pairs of stations and, by comparing all pairs with each other, determine which station most likely is the one with the break (Caussinus and Mestre, 2004). If there is a break in 1950 in pair A&B and B&C, but not in A&C, the break is likely in station B; with more pairs such an inference can be made with more certainty.
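The A, B, C reasoning above can be written down as a small attribution rule. This is a deliberately simplified sketch: the function name and the input format are hypothetical, and it is not the procedure of Caussinus and Mestre (2004) or of any other published algorithm. It only says that a station is blamed if every pair containing it shows the break and no pair without it does.

```python
def attribute_break(pair_has_break):
    """Given {(station1, station2): bool} for one break date, return the
    single station consistent with the pattern, or None if ambiguous."""
    stations = sorted({s for pair in pair_has_break for s in pair})
    candidates = []
    for s in stations:
        own = [flag for pair, flag in pair_has_break.items() if s in pair]
        others = [flag for pair, flag in pair_has_break.items() if s not in pair]
        # Every pair with this station must show the break,
        # and no pair without it may show the break.
        if own and all(own) and not any(others):
            candidates.append(s)
    return candidates[0] if len(candidates) == 1 else None


# Break seen in A&B and B&C, but not in A&C: station B is the likely source.
three_stations = {("A", "B"): True, ("B", "C"): True, ("A", "C"): False}
print(attribute_break(three_stations))

# With only one pair, the break cannot be attributed to either station.
one_pair = {("A", "B"): True}
print(attribute_break(one_pair))
```

Real data are noisier than this all-or-nothing rule allows: breaks in different pairs are detected at slightly different dates and some are missed, so practical algorithms need tolerance windows and more pairs per station.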

If there are multiple breaks in a time series, the number of combinations easily becomes very large and it becomes impossible to try them all. For example, in case of five breaks (k=5) in 100 years of annual data (n=100), the number of combinations grows roughly as n^k, which gives an upper bound of 100^5 = 10^10, or 10 billion. This problem is sometimes solved iteratively/hierarchically, by first searching for the largest jump and then repeating the search in both sub-sections until they are too small. This does not always produce good results. A direct way to solve the problem is by dynamic programming, which, to quote Wikipedia, is neither dynamic, nor programming, but rather an optimization method. In dynamic programming the problem is not only solved for the entire time series, but also for all truncated time series. In this way, the number of computations is of the order of n^2.
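The dynamic programming idea can be sketched as follows. This is a generic least-squares segmentation, assuming the number of breaks is given in advance; actual homogenisation methods also have to decide how many breaks there are, typically with a penalty term, which is left out here. The function name and the test series are made up for illustration.

```python
import math


def optimal_breaks(series, n_breaks):
    """Positions of n_breaks breaks that minimise the total squared
    deviation from the segment means (a break at i starts a new segment)."""
    n = len(series)
    # Prefix sums make the cost of any segment an O(1) lookup.
    s1, s2 = [0.0], [0.0]
    for x in series:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Squared error of series[i:j] around its own mean (i < j)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of the truncated series[:j] with m breaks.
    best = [[inf] * (n + 1) for _ in range(n_breaks + 1)]
    prev = [[0] * (n + 1) for _ in range(n_breaks + 1)]
    for j in range(1, n + 1):
        best[0][j] = cost(0, j)
    for m in range(1, n_breaks + 1):
        for j in range(m + 1, n + 1):
            for i in range(m, j):  # i is the position of the last break
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j], prev[m][j] = c, i

    # Walk back through the stored choices to recover the break positions.
    breaks, j = [], n
    for m in range(n_breaks, 0, -1):
        j = prev[m][j]
        breaks.append(j)
    return sorted(breaks)


# Noise-free test series with steps after 5 and 10 values.
steps = [0.0] * 5 + [2.0] * 5 + [-1.0] * 5
found = optimal_breaks(steps, 2)
print(found)

# For comparison: distinct ways to place 5 breaks in 100 annual values.
n_placements = math.comb(99, 5)
print(n_placements)
```

The table `best` stores the solution for every truncated series, so each new entry reuses earlier results instead of re-examining all combinations; for a fixed number of breaks the work grows with n^2 rather than with the number of possible break placements.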

Sometimes there are no other stations in the same climate region. In this case, absolute homogenisation is sometimes applied and the inhomogeneities are detected in the time series of one station. If there is a clear and large break at a certain date, you may well be able to correct it, but smaller jumps and gradually occurring inhomogeneities (urban heat island or growing vegetation) cannot be distinguished from real natural variability and climate change. Data homogenized this way does not have the quality you may expect and should be used with much care.

Inhomogeneities in climate data

By homogenizing climate datasets, it was found that inhomogeneities can sometimes cause biased trends in raw data, and that homogenisation is therefore indispensable for obtaining reliable regional or global trends. For example, for the Greater Alpine Region a bias in the temperature trend between the 1870s and 1980s of half a degree was found, which was due to decreasing urbanization of the network and systematic changes in the time of observation (Böhm et al., 2001). The precipitation records of the early instrumental period are biased by -10% due to the systematically higher installation of the gauges at the time (Auer et al., 2005). Other possible bias sources are new types of weather shelters (Brunet et al., 2010; Brunetti et al., 2006), the change from liquid-in-glass thermometers to electrical resistance thermometers (Menne et al., 2009), as well as the tendency to replace observers by automatic weather stations (Begert et al., 2005), the urban heat island effect and the transfer of many urban stations to airports (Trewin, 2010).

In the project HOME these homogenisation algorithms were recently tested on artificial climate data with known inhomogeneities. It was found that relative homogenisation improves climate data and that the modern methods that do not assume a homogeneous reference are the most accurate.

More posts on homogenisation

New article: Benchmarking homogenisation algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.
A short introduction to the time of observation bias and its correction
The time of observation bias is an important cause of inhomogeneities in temperature data.
HUME: Homogenisation, Uncertainty Measures and Extreme weather
Proposal for future research in homogenisation of climate network data.
Statistical homogenisation for dummies
A primer on statistical homogenisation with many pictures.
Investigation of methods for hydroclimatic data homogenisation
An example of the daily misinformation spread by the blog Watts Up With That? In this case about homogenisation.


Aguilar E., Auer, I., Brunet, M., Peterson, T. C., and Wieringa, J.: Guidelines on climate metadata and homogenization. World Meteorological Organization, WMO-TD No. 1186, WCDMP No. 53, Geneva, Switzerland, 55 p., 2003.
Alexandersson, H.: A homogeneity test applied to precipitation data. J. Climatol., 6, 661-675, 1986.
Auer, I., R. Bohm, A. Jurkovic, W. Lipa, A. Orlik, R. Potzmann, W. Schoner, M. Ungersbock, C. Matulla, P. Jones, D. Efthymiadis, M. Brunetti, T. Nanni, K. Briffa, M. Maugeri, L. Mercalli, O. Mestre, et al. HISTALP - Historical instrumental climatological surface time series of the Greater Alpine Region. Int. J. Climatol., 27, pp. 17-46. doi: 10.1002/joc.1377, 2007.
Auer I, Böhm, R., Jurkovic, A., Orlik, A., Potzmann, R., Schöner W., et al.: A new instrumental precipitation dataset for the Greater Alpine Region for the period 1800–2002. International Journal of Climatology, 25, 139–166, 2005.
Begert, M., Schlegel, T., and Kirchhofer, W.: Homogeneous temperature and precipitation series of Switzerland from 1864 to 2000. Int. J. Climatol., 25, 65–80, 2005.
Böhm R., Auer, I., Brunetti, M., Maugeri, M., Nanni, T., and Schöner, W.: Regional temperature variability in the European Alps 1760–1998 from homogenized instrumental time series. International Journal of Climatology, 21, 1779–1801, 2001.
Brunet, M., Asin, J., Sigró, J., Banón, M., García, F., Aguilar, E., Esteban Palenzuela, J., Peterson, T. C., and Jones, P.: The minimization of the screen bias from ancient Western Mediterranean air temperature records: an exploratory statistical analysis. Int. J. Climatol., doi: 10.1002/joc.2192, 2010.
Brunetti M., Maugeri, M., Monti, F., and Nanni, T.: Temperature and precipitation variability in Italy in the last two centuries from homogenized instrumental time series. International Journal of Climatology, 26, 345–381, 2006.
Caussinus, H. and Mestre, O.: Detection and correction of artificial shifts in climate series. Appl. Statist., 53, part 3, 405-425, 2004.
Conrad, V. and Pollak, C.: Methods in Climatology. Harvard University Press, Cambridge, MA, 459 p., 1950.
Della-Marta, P. M., Collins, D., and Braganza, K.: Updating Australia’s high quality annual temperature dataset. Austr. Meteor. Mag., 53, 277-292, 2004.
Menne, M. J., Williams, C. N. jr., and Vose, R. S.: The U.S. historical climatology network monthly temperature data, version 2. Bull. Am. Meteorol. Soc., 90, no.7, 993-1007, doi: 10.1175/2008BAMS2613.1, 2009.
Menne, M. J., Williams, C. N. jr., and Palecki M. A.: On the reliability of the U.S. surface temperature record. J. Geophys. Res. Atmos., 115, no. D11108, doi: 10.1029/, 2010.
Meulen, van der, J.P. and T. Brandsma. Thermometer screen intercomparison in De Bilt (The Netherlands), part I: Understanding the weather-dependent temperature differences. Int. J. Climatol., 28, pp. 371-387, 2008.
Szentimrey, T.: Multiple Analysis of Series for Homogenization (MASH). Proceedings of the second seminar for homogenization of surface climatological data, Budapest, Hungary; WMO, WCDMP-No. 41, 27-46, 1999.
Trewin, B.: Exposure, instrumentation, and observing practice effects on land temperature measurements. WIREs Clim. Change, 1, 490–506, doi: 10.1002/wcc.46, 2010.
Williams, C. N. jr., Menne, M. J., Thorne, P.W. Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. In press Journal of Geophysical Research-Atmospheres, 2012.
