Tuesday, 18 September 2012

Future research in homogenisation of climate data – EMS2012 in Poland

By Enric Aguilar and Victor Venema

The future of research and training in homogenisation of climate data was discussed by 21 experts at the EMS2012 meeting of the European Meteorological Society in Lodz. Homogenisation of monthly temperature data has improved considerably in recent years, as the results of the COST-HOME project show. In contrast, the homogenisation of daily and sub-daily data is still in its infancy, even though such data are frequently used to analyse changes in extreme weather. Inhomogeneities are expected to be stronger in the tails of the distribution than in the mean. To make analyses of extremes more reliable, more work on daily homogenisation is urgently needed. This does not mean that homogenisation at the monthly scale is already optimal; much can still be improved.

Parallel measurements

Parallel measurements with multiple measurement set-ups were seen as an important way to study the nature of inhomogeneities in daily and sub-daily data. A large international database of such measurements would be valuable, and the regional climate centres (RCCs) could host it. Numerous groups are working on this topic, but more collaboration is needed. More experiments would also be valuable.
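To illustrate what parallel measurements can reveal for daily data, the sketch below compares an old and a new set-up measured side by side. It reports the mean difference and the differences at a few percentiles, because inhomogeneities are expected to be stronger in the tails than in the mean. This is a minimal Python sketch under stated assumptions: the function names, the chosen percentiles and the example values are hypothetical, not taken from any existing database or method.

```python
"""Compare two parallel daily temperature series (old vs new set-up).

Hypothetical sketch: function names, percentiles and example data are
illustrative assumptions, not real parallel measurements.
"""
from statistics import mean


def quantile(values, p):
    """Empirical quantile (0 <= p <= 1) with linear interpolation."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def compare_parallel(old, new, probs=(0.1, 0.5, 0.9)):
    """Return (mean difference, {p: quantile difference}), both new minus old."""
    if len(old) != len(new) or not old:
        raise ValueError("parallel series must be non-empty and of equal length")
    mean_diff = mean(n - o for o, n in zip(old, new))
    quantile_diff = {p: quantile(new, p) - quantile(old, p) for p in probs}
    return mean_diff, quantile_diff


if __name__ == "__main__":
    # Hypothetical example: the new set-up reads 5% higher, so the
    # difference grows towards the warm tail of the distribution.
    old = [float(t) for t in range(0, 21, 2)]
    new = [1.05 * t for t in old]
    mean_diff, quantile_diff = compare_parallel(old, new)
    print(f"mean difference: {mean_diff:.2f}")
    for p, d in quantile_diff.items():
        print(f"difference at percentile {p * 100:.0f}: {d:.2f}")
```

In this toy case the mean difference of 0.50 hides that the difference is about 0.1 at the 10th percentile and about 0.9 at the 90th, the kind of distribution-dependent inhomogeneity that an analysis of means alone would miss.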

When gathering parallel measurements, metadata are very important. INSPIRE (an EU directive) defines a standard metadata format that could be used.

Producing an open database of parallel measurements may be difficult, as European national meteorological and hydrological services are often forced to sell their data for profit. (Ironically, in the Land of the Free (markets), climate data are freely available; the public already paid for them with their tax money, after all.) Political pressure to free climate data is needed. Finland is setting a good example and will free its data in 2013.

Benchmarking

The HOME benchmark for testing monthly homogenisation methods was very useful, but its dataset was rather small because manual methods also participated. There is interest in a new validation dataset for automatic homogenisation. It should contain more networks (n>100) to reduce the non-Gaussian error in the validation metrics, and more stations per network (n>100) to test more accurately how homogenisation methods select their reference stations (regionalisation; see below).

A future benchmark should be more diverse with respect to network density, the lengths of the station series (a realistic network structure) and the statistical properties of the inhomogeneities. Including biases in the inhomogeneities is also important, for example the warm bias of raw temperature measurements in the early instrumental period.

The HOME benchmark showed that homogenising the temperature data of a typical European network improves data quality considerably. It would be worthwhile to also test more difficult cases, such as sparser networks in Africa or networks that become sparser further back in time. Especially for such difficult cases, it is paramount to also estimate the systematic biases that remain in the data after homogenisation.
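Validation metrics compare homogenised series with the true, homogeneous series, which are known in an artificial benchmark dataset. The minimal Python sketch below computes two such scores under stated assumptions: a centred root mean square error (the RMSE after removing the mean difference, so a constant offset is not penalised) and the error in the linear trend. Averaging the trend errors over all stations of a network gives a simple estimate of the systematic bias that remains after homogenisation. The function names and the toy data are hypothetical; this is not the scoring code of HOME, ISTI or any other benchmark.

```python
"""Toy benchmark scores: homogenised series vs the known true series.

Hypothetical sketch: names and example data are assumptions, not the
scoring code of any existing benchmark.
"""
from statistics import mean


def linear_trend(y):
    """Ordinary least-squares slope of y against its time index (per step)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = mean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def centred_rmse(estimate, truth):
    """RMSE after removing the mean difference, so constant offsets are not penalised."""
    d = [e - t for e, t in zip(estimate, truth)]
    d_mean = mean(d)
    return mean((x - d_mean) ** 2 for x in d) ** 0.5


def network_scores(homogenised, truth):
    """Mean centred RMSE and mean trend error over all stations of one network.

    A mean trend error far from zero indicates a systematic bias that
    remains after homogenisation.
    """
    crmse = mean(centred_rmse(h, t) for h, t in zip(homogenised, truth))
    trend_error = mean(linear_trend(h) - linear_trend(t)
                       for h, t in zip(homogenised, truth))
    return crmse, trend_error


if __name__ == "__main__":
    truth = [[10.0, 10.2, 9.9, 10.4], [8.0, 8.1, 8.3, 8.2]]
    # Hypothetical homogenised output that leaves a spurious drift of 0.1 per step.
    homogenised = [[v + 0.1 * i for i, v in enumerate(s)] for s in truth]
    print(network_scores(homogenised, truth))
```

Both toy stations share the same spurious drift, so the network-mean trend error is 0.1 per step: a systematic bias rather than random scatter. With only a few networks such network means are themselves noisy, which is one reason a new benchmark should contain many networks.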

The International Surface Temperature Initiative (ISTI) is organising a benchmarking exercise for monthly methods; the validation data will be available this autumn. In the next cycle, in about three years, the ISTI will also benchmark homogenisation methods for daily data. Few algorithms can homogenise daily data automatically. MASH can do so by correcting the means. AnClim can in principle also do so by applying HOM and thus correct the whole temperature distribution, provided the data meet the strict conditions for applying HOM, which will not be the case for all station pairs in such a global dataset.
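The difference between correcting only the means and correcting the whole distribution can be illustrated with a toy example. The Python sketch below is a minimal illustration under stated assumptions and is not an implementation of MASH, HOM or AnClim: it assumes a single known break and a homogeneous reference station, and applies to the segment before the break either one additive mean correction or a simple quantile-dependent correction. Function names and data are hypothetical.

```python
"""Toy comparison of mean-based and quantile-based daily corrections.

Hypothetical sketch: one known break, a homogeneous reference and a
simple quantile-matching rule; not the MASH, HOM or AnClim algorithms.
"""
from statistics import mean


def quantile(values, p):
    """Empirical quantile (0 <= p <= 1) with linear interpolation."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def mean_correction(early_cand, early_ref, late_cand, late_ref):
    """Shift all early values by one constant: the change in the mean difference."""
    shift = ((mean(late_cand) - mean(late_ref))
             - (mean(early_cand) - mean(early_ref)))
    return [x + shift for x in early_cand]


def quantile_correction(early_cand, early_ref, late_cand, late_ref):
    """Shift each early value by the change in the candidate-minus-reference
    difference at that value's own quantile."""
    n = len(early_cand)
    # Rank-based non-exceedance probability of each early value.
    order = sorted(range(n), key=lambda i: early_cand[i])
    probs = [0.5] * n
    if n > 1:
        for rank, i in enumerate(order):
            probs[i] = rank / (n - 1)
    corrected = []
    for x, p in zip(early_cand, probs):
        before = quantile(early_cand, p) - quantile(early_ref, p)
        after = quantile(late_cand, p) - quantile(late_ref, p)
        corrected.append(x + after - before)
    return corrected


if __name__ == "__main__":
    ref = [float(t) for t in range(11)]  # hypothetical: unchanged climate at the reference
    early = list(ref)                    # old set-up agrees with the reference
    late = [1.1 * t for t in ref]        # new set-up reads higher on warm days
    print(mean_correction(early, ref, late, ref))
    print(quantile_correction(early, ref, late, ref))
```

The mean correction adds 0.5 to every early value, over-correcting the cold days and under-correcting the warm ones, whereas the quantile-dependent correction maps the early values onto the late distribution (0.0 up to 11.0). Real distribution-based methods smooth the correction across quantiles and need long, well-correlated series, which is one reason for the strict conditions mentioned above.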

Regionalisation, selection of references

Little research has been done yet on regionalisation, that is, on how to select the best reference stations. Good reference stations should share the long-term climate signal of the candidate station. This is not necessarily the same as having a high correlation, which is what most automatic methods use to select reference stations. In mountainous regions, mountain stations are often best not compared with valley stations; likewise, coastal stations are better not compared with inland or even mountain stations. Exposure may also matter. It would be valuable to codify the expertise of climatologists in selecting coherent climate regions.
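Correlation-based reference selection, as used by most automatic methods, can be sketched in a few lines. The minimal Python example below ranks neighbours by the correlation of their first-difference series with the candidate (a common choice, assumed here, because a single break affects only one first difference) and optionally restricts the choice to stations that an expert has assigned to the same climate region. That region filter is a hypothetical way of codifying climatological expertise, not an existing method; station names, region labels and data are made up.

```python
"""Toy reference-station selection by first-difference correlation.

Hypothetical sketch: the optional expert-defined climate regions are an
assumed way to encode climatological knowledge, not an existing method.
"""
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def first_differences(series):
    """Step-to-step changes; a single break affects only one of them."""
    return [b - a for a, b in zip(series, series[1:])]


def select_references(candidate, neighbours, k=3, regions=None, region=None):
    """Return names of the k neighbours best correlated with the candidate.

    If `regions` (name -> label) and `region` are both given, only
    neighbours in the candidate's region are considered.
    """
    dc = first_differences(candidate)
    scored = []
    for name, series in neighbours.items():
        if regions is not None and region is not None and regions.get(name) != region:
            continue
        scored.append((pearson(dc, first_differences(series)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    candidate = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]  # hypothetical mountain station
    neighbours = {
        "valley": [c + 10.0 for c in candidate],          # highly correlated
        "summit": [2.0 * c for c in candidate],           # also highly correlated
        "coast": [0.0, -2.0, -1.0, -4.0, -3.0, -5.0],     # anti-correlated
    }
    regions = {"valley": "valley", "summit": "mountain", "coast": "coast"}
    print(select_references(candidate, neighbours, k=2))
    print(select_references(candidate, neighbours, k=2, regions=regions, region="mountain"))
```

By correlation alone, the valley station is selected alongside the summit station; with the expert region filter only the summit station remains. Whether such a filter helps in practice is exactly the kind of question regionalisation research would have to answer.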

Precipitation and other elements

A number of participants were interested in working on precipitation. The results of the HOME benchmark for precipitation were not as good as for temperature. Precipitation is more difficult because the cross-correlations between stations are typically lower. However, this is no excuse for those algorithms that actually made the data worse. The main bottleneck may well be the correction methods rather than the detection. The homogenisation of precipitation is also simply less well validated, and past validation studies focused mainly on detection rather than on correction. For non-European stations, and naturally for daily data, the assumption that precipitation follows a lognormal distribution may also be a problem.
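Why a lognormal assumption is convenient for monthly totals but awkward for daily data can be shown in a few lines. The minimal Python sketch below assumes, hypothetically, an inhomogeneity that reduces all precipitation amounts by 10%. On the log scale this multiplicative change becomes a constant additive shift, which is what makes a log transformation attractive; daily series, however, contain dry days with zero precipitation, for which the logarithm is undefined. The amounts and function names are made up.

```python
"""Toy illustration of log-transforming precipitation (lognormal assumption).

Hypothetical sketch: the 10% reduction and all amounts are made up.
"""
import math


def log_transform(amounts):
    """Natural log of precipitation amounts; fails for dry (zero) values."""
    if any(a <= 0 for a in amounts):
        raise ValueError("log transform undefined for zero or negative amounts")
    return [math.log(a) for a in amounts]


def log_shifts(original, changed):
    """Element-wise difference of the two log-transformed series."""
    return [b - a for a, b in zip(log_transform(original), log_transform(changed))]


if __name__ == "__main__":
    monthly = [52.0, 30.5, 88.1, 12.4]      # hypothetical monthly totals (mm)
    reduced = [0.9 * m for m in monthly]    # hypothetical 10% reduction
    print(log_shifts(monthly, reduced))     # every shift is about log(0.9), i.e. -0.105

    daily = [0.0, 0.0, 4.2, 12.7, 0.0, 0.3]  # hypothetical daily amounts (mm)
    try:
        log_transform(daily)
    except ValueError as err:
        print("daily data:", err)
```

Workarounds such as treating dry days separately exist, but they add assumptions of their own, which is one reason daily precipitation homogenisation remains difficult.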

Furthermore, some participants were interested in the homogenisation of wind and humidity.

Urban heat island

There is a lot of work on the urban heat island (UHI) effect, but there is not much work in Europe on changes in the magnitude of the UHI effect, which is what would be important for homogenisation. Building capacity and networking these groups could be valuable. There is an ongoing EU project on the urban heat island.
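Why changes in the UHI effect, rather than its magnitude alone, matter for homogenisation can be shown with a difference series against a rural reference. In the minimal Python sketch below (hypothetical data and function names, assuming a nearby rural station free of urban influence), a constant urban offset gives no trend in the urban-minus-rural difference, whereas a growing heat island appears as a spurious trend.

```python
"""Toy check for a changing urban heat island in a difference series.

Hypothetical sketch: station data are made up, and the rural series is
assumed to be free of urban influence.
"""
from statistics import mean


def ols_slope(y):
    """Ordinary least-squares slope of y against its time index (per step)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = mean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def difference_trend(urban, rural):
    """Trend of the urban-minus-rural difference series."""
    return ols_slope([u - r for u, r in zip(urban, rural)])


if __name__ == "__main__":
    rural = [10.0, 10.4, 9.8, 10.1, 10.3, 9.9]
    constant_uhi = [r + 1.5 for r in rural]                    # fixed offset
    growing_uhi = [r + 0.05 * i for i, r in enumerate(rural)]  # growing effect
    print(difference_trend(constant_uhi, rural))  # about 0: no trend bias
    print(difference_trend(growing_uhi, rural))   # about 0.05 per step
```

A constant heat island shifts the level of the urban series but leaves its trend intact; only a change in the heat island biases the trend, which is why its evolution over time is the quantity homogenisation needs.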

Training

Training is important and labour intensive, so collaboration would be valuable. Training should draw on multiple resources: manuals aimed at users (in addition to manuals for developers), worked examples (such as those available for RHtest) and training courses. The upcoming ECSN data management workshops, homogenisation workshops and summer schools in Brno may offer opportunities to hold training courses.

More posts on homogenisation

Statistical homogenisation for dummies
A primer on statistical homogenisation with many pictures.
Homogenisation of monthly and annual data from surface stations
A short description of the causes of inhomogeneities in climate data (non-climatic variability) and how to remove it using the relative homogenisation approach.
New article: Benchmarking homogenisation algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.
HUME: Homogenisation, Uncertainty Measures and Extreme weather
Proposal for future research in homogenisation of climate network data.
A short introduction to the time of observation bias and its correction
The time of observation bias is an important cause of inhomogeneities in temperature data.
