Friday 17 February 2012

HUME: Homogenisation, Uncertainty Measures and Extreme weather

Proposal for future research in homogenisation

To keep this post short, a background in homogenisation is assumed and not every argument is fully rigorous.


This document aims to start a discussion on the research priorities in the homogenisation of historical climate data from surface networks. It argues that, with the increased scientific work on changes in extreme weather, the homogenisation community should work more on daily data and especially on quantifying the uncertainties remaining in homogenised data. Comments on these ideas are welcome, as are further thoughts. Hopefully we can reach a consensus on research priorities for the coming years. Speaking with a common voice will strengthen our position with research funding agencies.


From the homogenisation of monthly and yearly data, we have learned that the size of breaks is typically on the order of the climatic changes observed in the 20th century and that the period between two detected breaks is around 15 to 20 years. These inhomogeneities are thus a significant source of error and need to be removed. The benchmark of the COST Action HOME has shown that these breaks can be removed reliably and that homogenisation improves the usefulness of temperature and precipitation data for studying decadal variability and secular trends. Not all problems have been solved optimally; for instance, the solutions for the inhomogeneous reference problem are still quite ad hoc. The HOME benchmark found mixed results for precipitation, and the handling of missing data can probably be improved. Furthermore, the homogenisation of other climate elements and of data from different regions, for example dry ones, should be studied. In general, however, annual and monthly homogenisation can be seen as a mature field.

The homogenisation of daily data is still in its infancy. Daily datasets are essential for studying extremes of weather and climate. Here the focus is not on the mean values, but on what happens in the tails of the distributions. Looking at the physical causes of inhomogeneities, one would expect that many of them especially affect the tails of the distributions. Likewise, the IPCC AR4 report warns that changes in extremes are often more sensitive to inhomogeneous climate monitoring practices than changes in the mean.


A value without an error estimate has no scientific value. Still, homogenised data are given out as if they were real homogeneous observations. Many users may not even be aware that the data were homogenised, how they were homogenised and what the limitations of homogenised data are. Taking this error into account is an urgent problem, but not a trivial one. Undetected inhomogeneities and uncertainties in the correction terms produce temporally correlated errors at all time scales. In the case of systematic errors (e.g. biases due to precipitation measurements on roofs), remaining inhomogeneities may also cause spatially correlated errors between the stations. Homogenisation itself may also introduce such errors. Finally, the remaining errors will be different for every station, as they depend on the frequency of the inhomogeneities and on the quality of the reference time series. If no reference could be used (absolute homogenisation), this should be reflected in the error estimates.

These errors are complex; quantifying and communicating them will be difficult. A complete description of the spatial, temporal and scaling character of these errors may require an ensemble approach. Validation studies with realistic data and metadata can provide general guidance on expected errors. An ordinal error estimate for every station would already be very helpful, as it would allow a climatologist to check whether the analysis depends on the quality of the homogenisation.
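To give a feeling for what an ensemble approach could look like, here is a minimal Python sketch. It is not an established method, just an illustration under stated assumptions: the remaining error of a station is modelled as a step function whose level jumps at random breaks, with on average one break every 20 years (in line with the break frequency mentioned above) and a residual jump size with a standard deviation of 0.2 °C (a purely hypothetical number). All function and variable names are made up for this example.

```python
import random
import statistics


def residual_error_series(n_years, break_prob, size_sd, rng):
    """Step-function error left by undetected or imperfectly corrected breaks.

    Assumption: the error level performs a random jump at every break.
    """
    level = 0.0
    series = []
    for _ in range(n_years):
        if rng.random() < break_prob:
            level += rng.gauss(0.0, size_sd)  # the error level jumps at a break
        series.append(level)
    return series


def linear_trend(series):
    """Ordinary least-squares slope per time step."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


rng = random.Random(2012)
N_YEARS = 100          # length of the annual series
BREAK_PROB = 1 / 20    # assumed: on average one break every 20 years
RESIDUAL_SD = 0.2      # assumed size (deg C) of the remaining jumps
N_MEMBERS = 500        # number of ensemble members

# Trend error of every ensemble member, expressed in deg C per century.
trend_errors = [
    100 * linear_trend(residual_error_series(N_YEARS, BREAK_PROB, RESIDUAL_SD, rng))
    for _ in range(N_MEMBERS)
]
mean_error = statistics.fmean(trend_errors)
spread = statistics.stdev(trend_errors)
print(f"mean trend error: {mean_error:+.2f} deg C per century")
print(f"ensemble spread:  {spread:.2f} deg C per century")
```

The spread of the trend errors over the ensemble members is one possible uncertainty measure for the secular trend. A real application would have to estimate the break frequency and the residual break sizes per station, for example from validation studies, rather than assume them.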

Uncertainty estimates are needed for all averaging scales, but especially for daily homogenised data, because daily data are much more difficult to homogenise and because errors in the distribution are especially difficult to quantify and communicate.

Daily data homogenisation

A second priority is that daily homogenisation methods need to be developed further. When daily data are homogenised, the detection is normally performed on the monthly or annual means, but some changes in observation practices may not affect these means significantly. Adjustments are often only applied to the mean of the distribution. Correction algorithms for the distribution do exist, but as Della-Marta and Wanner wrote in the article on their correction method (HOM), such methods only reliably correct the first three moments. These methods have so far only been applied to some networks and require highly correlated neighbouring stations (cross correlations on the daily scale of over 90 %).
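The point that detection on the means can miss an inhomogeneity is easy to illustrate with synthetic data. The Python sketch below assumes, purely for illustration, ten years of Gaussian daily temperature anomalies before and after a hypothetical change in observation practice that reduces the standard deviation from 4 °C to 3 °C without touching the mean. The numbers and names are invented for this example and do not come from any real network.

```python
import random
import statistics


def simulate_daily_anomalies(n_days, sd, rng):
    """Draw daily temperature anomalies (deg C) from a zero-mean Gaussian."""
    return [rng.gauss(0.0, sd) for _ in range(n_days)]


def quantile(values, q):
    """Empirical quantile by linear interpolation, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


rng = random.Random(42)
N_DAYS = 3650  # about ten years on either side of the break

# Hypothetical break: the variability changes, the mean does not.
before = simulate_daily_anomalies(N_DAYS, sd=4.0, rng=rng)
after = simulate_daily_anomalies(N_DAYS, sd=3.0, rng=rng)

mean_shift = statistics.fmean(after) - statistics.fmean(before)
p05_shift = quantile(after, 0.05) - quantile(before, 0.05)
p95_shift = quantile(after, 0.95) - quantile(before, 0.95)

print(f"shift in the mean:            {mean_shift:+.2f} deg C")
print(f"shift in the 5th percentile:  {p05_shift:+.2f} deg C")
print(f"shift in the 95th percentile: {p95_shift:+.2f} deg C")
```

In this toy case the shift in the mean is close to zero, so a test on monthly or annual means would have little to detect, while both tails move by more than a degree.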

Detection methods for daily data need to be developed. Methods are needed that also detect changes in the tails of the distribution, which may not affect the mean. Correction methods should be developed that estimate the level of detail that is needed and possible. Depending on the quality of the reference time series and the nature of the inhomogeneity, in some cases only the annual average may be correctable, while for other breaks corrections of the monthly means and standard deviations may be possible, and at still other breaks the main moments of the daily distribution may be adjustable.
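The different levels of detail can be made concrete with a toy example. The Python sketch below compares an adjustment of the mean only with an adjustment of the mean and the standard deviation. It is not the HOM method or any other published algorithm, and it makes a strong simplifying assumption: the segments before and after the break sample the same climate, so that the later segment can serve directly as the target. In practice the adjustments would be estimated relative to a reference series. All names and numbers are hypothetical.

```python
import random
import statistics


def adjust_mean(segment, target):
    """Level 1: shift the segment so that its mean matches the target."""
    shift = statistics.fmean(target) - statistics.fmean(segment)
    return [v + shift for v in segment]


def adjust_mean_and_sd(segment, target):
    """Level 2: shift and rescale so that mean and standard deviation match."""
    m_seg = statistics.fmean(segment)
    m_tar = statistics.fmean(target)
    scale = statistics.stdev(target) / statistics.stdev(segment)
    return [m_tar + (v - m_seg) * scale for v in segment]


def quantile(values, q):
    """Empirical quantile by linear interpolation, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


rng = random.Random(7)
N_DAYS = 3650

# Hypothetical break: the old set-up was 0.8 deg C too warm and too variable.
before = [rng.gauss(0.8, 4.0) for _ in range(N_DAYS)]
after = [rng.gauss(0.0, 3.0) for _ in range(N_DAYS)]

mean_only = adjust_mean(before, after)
mean_and_sd = adjust_mean_and_sd(before, after)

# Remaining error in the upper tail after each level of adjustment.
p95_target = quantile(after, 0.95)
resid_mean_only = quantile(mean_only, 0.95) - p95_target
resid_mean_and_sd = quantile(mean_and_sd, 0.95) - p95_target

print(f"95th percentile error, mean adjusted:        {resid_mean_only:+.2f} deg C")
print(f"95th percentile error, mean and sd adjusted: {resid_mean_and_sd:+.2f} deg C")
```

Adjusting only the mean leaves most of the error in the tail in place. Whether the more detailed adjustment is justified for a given break depends on how well the scale factor can be estimated, which is exactly the question a correction method would need to answer.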

To assess the influence of inhomogeneities on changes in extreme weather, the various causes of inhomogeneities and their influence on the distribution need to be studied. Especially frequent causes with the potential to bias the dataset, such as the changes in instrumentation in the early instrumental period and the transition to automatic weather stations, should be studied in detail. This information is also needed to develop, test and benchmark daily homogenisation methods with realistic validation datasets. The most direct and reliable way to study daily inhomogeneities is by analysing parallel measurements, either ones performed during the transitions, or parallel measurements with reproduced historical set-ups.

A final thought

The core assumption of relative homogenisation is that nearby stations perceive the same climate. Consequently, perturbations due to local factors that happen at one station at a time are removed. For example, the urban heat island effect is removed by homogenisation, which is good for studying the large-scale climate, but this effect is not an error from the perspective of a city climatologist. Similarly, the influence on climate of other land-use changes such as deforestation, irrigation and land reclamation can currently not be studied. To be able to separate perturbations in the climate due to local land-use change from other inhomogeneities, we need to be able to detect the causes of the (most significant) inhomogeneities. Maybe daily data from multiple climatic elements provide sufficient information to perform such a fingerprinting of the causes of inhomogeneities. That would be worth trying.
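The core assumption can be illustrated with a small synthetic example in Python. A candidate and a reference station share the same regional climate signal, and the candidate has an artificial jump of 0.8 °C in year 60. In the difference series the shared signal cancels and the jump stands out. The simple scan over all split points used here is only a stand-in for the statistical tests used in real homogenisation methods; the series, noise levels and names are all assumptions made for this sketch.

```python
import math
import random
import statistics


def most_likely_break(diff, min_seg=5):
    """Scan all split points and return (index, t) of the largest t-like contrast."""
    n = len(diff)
    best_k, best_t = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        left, right = diff[:k], diff[k:]
        pooled_var = (
            statistics.variance(left) * (len(left) - 1)
            + statistics.variance(right) * (len(right) - 1)
        ) / (n - 2)
        t = abs(statistics.fmean(right) - statistics.fmean(left)) / math.sqrt(
            pooled_var * (1 / len(left) + 1 / len(right))
        )
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t


rng = random.Random(1711)
N_YEARS = 100
BREAK_YEAR = 60   # hypothetical relocation of the candidate station
BREAK_SIZE = 0.8  # deg C

# Regional climate shared by both stations: a weak trend plus variability.
regional = [0.01 * y + rng.gauss(0.0, 0.5) for y in range(N_YEARS)]
reference = [r + rng.gauss(0.0, 0.15) for r in regional]
candidate = [
    r + rng.gauss(0.0, 0.15) + (BREAK_SIZE if y >= BREAK_YEAR else 0.0)
    for y, r in enumerate(regional)
]

# The regional signal cancels in the difference series; the break remains.
diff = [c - r for c, r in zip(candidate, reference)]
break_index, t_value = most_likely_break(diff)
print(f"most likely break at year index {break_index}, t-like statistic {t_value:.1f}")
```

The same cancellation is the reason why a local signal such as urbanisation at the candidate station would be treated like any other inhomogeneity: it appears in the difference series and is adjusted away.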


We thus argue that estimates for the remaining errors of homogenized data, a better understanding of the nature of daily inhomogeneities and better tools to correct them will be the main challenges for the coming years. What do you see as research priorities? Comments left below are most valuable, but sending them to Victor Venema is also highly appreciated and will be taken into account in a follow-up document.

More posts on homogenisation

Homogenisation of monthly and annual data from surface stations
A short description of the causes of inhomogeneities in climate data (non-climatic variability) and how to remove it using the relative homogenisation approach.
New article: Benchmarking homogenisation algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.
A short introduction to the time of observation bias and its correction
The time of observation bias is an important cause of inhomogeneities in temperature data.
Statistical homogenisation for dummies
A primer on statistical homogenisation with many pictures.
Investigation of methods for hydroclimatic data homogenization
An example of the daily misinformation spread by the blog Watts Up With That? In this case about homogenisation.

* Had he not died prematurely, David Hume would be 300 years old today. Hume is known for his philosophical empiricism and scepticism and is thus the perfect patron for empirical climate change research. Furthermore, it is fitting that David Hume was the son of Joseph Home; he changed his name because the English had difficulty pronouncing it. This proposal can also be seen as the son of HOME.


  1. Two people wrote to me because they see the homogenization of precipitation as a priority. Precipitation is clearly an important climatic element and the results from the HOME benchmark paper showed that homogenization algorithms still have difficulties with precipitation. Only the best algorithms improved the data for most statistics, and even they did not improve the data much.

    Homogenization of precipitation was high on my list of priorities as well and only narrowly missed being included in the above proposal. In Bonn we are working on the homogenization of precipitation (Elke Rustemeier) and I was therefore afraid that I might be biased towards this theme. Furthermore, it is not clear to me where to search for improvements. Does anyone have any new ideas? Or is it just a matter of investing sufficient time? We did not pay much attention to precipitation in previous validation studies.

  2. Someone wrote to me to say that we forgot to state an obvious research theme, namely a better understanding of the causes of the artificial changes. This would allow for more precise detection and correction models for homogenisation and for the generation of better, more realistic validation datasets.

