Monday, 16 January 2012

Homogenisation of monthly and annual data from surface stations

To study climate change and variability long instrumental climate records are essential, but are best not used directly. These datasets are essential since they are the basis for assessing century-scale trends or for studying the natural (long-term) variability of climate, amongst others. The value of these datasets, however, strongly depends on the homogeneity of the underlying time series. A homogeneous climate record is one where variations are caused only by variations in weather and climate. In our recent article we wrote: “Long instrumental records are rarely if ever homogeneous”. A non-scientist would simply write: homogeneous long instrumental records do not exist. In practice there are always inhomogeneities due to relocations, changes in the surrounding, instrumentation, shelters, etc. If a climatologist only writes: “the data is thought to be of high quality” and then removes half of the data and does not mention the homogenisation method used, it is wise to assume that the data is not homogeneous.

Results from the homogenisation of instrumental western climate records indicate that detected inhomogeneities in mean temperature series occur at a frequency of roughly 15 to 20 years. It should be kept in mind that most measurements have not been specifically made for climatic purposes, but rather to meet the needs of weather forecasting, agriculture and hydrology (Williams et al., 2012). Moreover the typical size of the breaks is often of the same order as the climatic change signal during the 20th century (Auer et al., 2007; Menne et al., 2009; Brunetti et al., 2006; Caussinus and Mestre; 2004, Della-Marta et al., 2004). Inhomogeneities are thus a significant source of uncertainty for the estimation of secular trends and decadal-scale variability.

If all inhomogeneities would be purely random perturbations of the climate records, collectively their effect on the mean global climate signal would be negligible. However, certain changes are typical for certain periods and occurred in many stations, these are the most important causes discussed below as they can collectively lead to artificial biases in climate trends across large regions (Menne et al., 2010; Brunetti et al., 2006; Begert et al., 2005).

In this post I will introduce a number of typical causes for inhomogeneities and methods to remove them from the data.

Tuesday, 10 January 2012

New article: Benchmarking homogenisation algorithms for monthly data

The main paper of the COST Action HOME on homogenisation of climate data has been published today in Climate of the Past. This post describes shortly the problem of inhomogeneities in climate data and how such data problems are corrected by homogenisation. The main part explains the topic of the paper, a new blind validation study of homogenisation algorithms for monthly temperature and precipitation data. All the most used and best algorithms participated.


To study climatic variability the original observations are indispensable, but not directly usable. Next to real climate signals they may also contain non-climatic changes. Corrections to the data are needed to remove these non-climatic influences, this is called homogenisation. The best known non-climatic change is the urban heat island effect. The temperature in cities can be warmer than on the surrounding country side, especially at night. Thus as cities grow, one may expect that temperatures measured in cities become higher. On the other hand, many stations have been relocated from cities to nearby, typically cooler, airports. Other non-climatic changes can be caused by changes in measurement methods. Meteorological instruments are typically installed in a screen to protect them from direct sun and wetting. In the 19th century it was common to use a metal screen on a North facing wall. However, the building may warm the screen leading to higher temperature measurements. When this problem was realised the so-called Stevenson screen was introduced, typically installed in gardens, away from buildings. This is still the most typical weather screen with its typical double-louvre door and walls. Nowadays automatic weather stations, which reduce labor costs, are becoming more common; they protect the thermometer by a number of white plastic cones. This necessitated changes from manually recorded liquid and glass thermometers to automated electrical resistance thermometers, which reduces the recorded temperature values.

One way to study the influence of changes in measurement techniques is by making simultaneous measurements with historical and current instruments, procedures or screens. This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouri screen, in use in Spain and many European countries in the late 19th century and early 20th century. In the middle, Stevenson screen equipped with automatic sensors. Leftmost, Stevenson screen equipped with conventional meteorological instruments.
Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.

A further example for a change in the measurement method is that the precipitation amounts observed in the early instrumental period (about before 1900) are biased and are 10% lower than nowadays because the measurements were often made on a roof. At the time, instruments were installed on rooftops to ensure that the instrument is never shielded from the rain, but it was found later that due to the turbulent flow of the wind on roofs, some rain droplets and especially snow flakes did not fall into the opening. Consequently measurements are nowadays performed closer to the ground.

Sunday, 8 January 2012

What distinguishes a benchmark?

Benchmarking is a community effort

Science has many terms for studying the validity or performance of scientific methods: testing, validation, intercomparison, verification, evaluation, and benchmarking. Every term has a different, sometimes subtly different, meaning. Initially I had wanted to compare all these terms with each other, but that would have become a very long post, especially as the meaning for every term is different in business, engineering, computation and science. Therefore, this post will only propose a definition for benchmarking in science and what distinguishes it from other approaches, casually called other validation studies from now on.

In my view benchmarking has three distinguishing features.
1. The methods are tested blind.
2. The problem is realistic.
3. Benchmarking is a community effort.
The term benchmark has become fashionable lately. It is also used, however, for validation studies that do not display these three features. This is not wrong, as there is no generally accepted definition of benchmarking. In fact in an important article on benchmarking by Sim et al. (2003) defines "a benchmark as a test or set of tests used to compare the performance of alternative tools or techniques." which would include any validation study. Then they limit the topic of their article, however, to interesting benchmarks, which are "created and used by a technical research community." However, if benchmarking is used for any type of validation study, there would not be any added value to the word. Thus I hope this post can be a starting point for a generally accepted and a more restrictive definition.