Friday, 29 March 2013

Special issue on homogenisation of climate series

The open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás" has just published a special issue on the homogenization of climate records. The issue contains eight research papers and is an offspring of the COST Action HOME: Advances in homogenization methods of climate series: an integrated approach (COST-ES0601).

To be able to discuss all eight papers, this post contains less background information than usual and is aimed at readers already knowledgeable about the homogenization of climate networks.

Contents

Mónika Lakatos and Tamás Szentimrey: Editorial.
The editorial explains the background of this special issue: the importance of homogenization and the COST Action HOME. Mónika and Tamás, thank you very much for your efforts in organising this special issue. I think every reader will agree that it has become a valuable journal issue.

Monthly data

Ralf Lindau and Victor Venema: On the multiple breakpoint problem and the number of significant breaks in homogenization of climate records.
My article with Ralf Lindau was already discussed in a previous post on the multiple breakpoint problem.
José A. Guijarro: Climatological series shift test comparison on running windows.
Longer time series typically contain more than one inhomogeneity, but statistical tests are mostly designed to detect a single break. One way to resolve this conflict is to apply these tests on short moving windows. José compares six statistical detection methods (t-test, Standard Normal Homogeneity Test (SNHT), two-phase regression (TPR), Wilcoxon-Mann-Whitney test, Durbin-Watson test and SRMD: squared relative mean difference), applied on running windows with half-lengths between 1 and 5 years (12 to 60 monthly values on either side of the potential break). The smart trick of the article is that all methods are calibrated to a false alarm rate of 1% for better comparison. In this way, he can show that the t-test, SNHT and SRMD perform best for this problem and are almost identical. To get good detection rates, the window needs to span at least 3 years on either side of the break (2×3 years in total). As this harbours the risk of having two breaks in one window, José has decided to change his homogenization method CLIMATOL to use the semi-hierarchical scheme of SNHT instead of windows. The methods are tested on data with just one break; it would have been interesting to also simulate the more realistic case with multiple independent breaks.
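As an illustration of the calibration trick, here is a minimal sketch in Python (my own code, not José's; it assumes the classical SNHT statistic and monthly values): the statistic of a window is compared against a critical value obtained by Monte Carlo simulation on break-free white noise, which tunes the test to the desired 1% false alarm rate.

    import numpy as np

    def snht_statistic(x):
        """Maximum SNHT statistic of a window over all candidate break positions."""
        z = (x - x.mean()) / x.std()  # SNHT works on standardized values
        n = len(z)
        k = np.arange(1, n)           # candidate break after the k-th value
        cum = np.cumsum(z)
        mean_left = cum[:-1] / k
        mean_right = (cum[-1] - cum[:-1]) / (n - k)
        return (k * mean_left**2 + (n - k) * mean_right**2).max()

    def calibrated_threshold(window_len, alpha=0.01, n_sim=10000, seed=0):
        """Critical value giving a false alarm rate alpha on break-free noise."""
        rng = np.random.default_rng(seed)
        stats = [snht_statistic(rng.standard_normal(window_len))
                 for _ in range(n_sim)]
        return float(np.quantile(stats, 1.0 - alpha))

For instance, calibrated_threshold(72) gives the critical value for the 2×3-year monthly window mentioned above; the same Monte Carlo recipe works for any of the six statistics, which is what makes the comparison fair.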
Olivier Mestre, Peter Domonkos, Franck Picard, Ingeborg Auer, Stéphane Robin, Emilie Lebarbier, Reinhard Böhm, Enric Aguilar, José Guijarro, Gregor Vertachnik, Matija Klančar, Brigitte Dubuisson, and Petr Stepanek: HOMER: a homogenization software – methods and applications.
HOMER is a new homogenization method, developed by combining the best methods tested on the HOME benchmark. Theoretically, it should thus be the best method currently available. Still, interactions between the parts of an algorithm can sometimes lead to unexpected results. It would be great if someone would test HOMER on the HOME benchmark dataset, so that we can compare its performance with the other algorithms.

Sunday, 24 March 2013

New article on the multiple breakpoint problem in homogenization

An interesting paper by Ralf Lindau and me on the multiple breakpoint problem has just appeared in a special issue on homogenization of the open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás".

Multiple breakpoint problem

Long instrumental time series contain non-climatological changes, called inhomogeneities, for example due to relocations or changes in the instrumentation. To study real changes in the climate more accurately, these inhomogeneities need to be detected and removed in a data processing step called homogenization; in statistics this problem is also known as segmentation.

Statisticians have worked a lot on the detection of a single breakpoint in data. Unfortunately, long climate time series typically contain more than just one breakpoint. There are two ad hoc methods to deal with this.

The most used method is the hierarchical one: first detect the largest break, then redo the detection on the two subsections, and so on, until no more breaks are found or the segments become too short. A variant is the semi-hierarchical method, in which previously detected breaks are retested and removed if they are no longer significant. For example, SNHT uses a semi-hierarchical scheme, and consequently so does the pairwise homogenization algorithm of NOAA, which uses SNHT for detection.
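As a minimal sketch of the hierarchical scheme (my own illustration in Python with an SNHT-style statistic; the retesting step of the semi-hierarchical variant is omitted): detect the most significant break, split the series there and recurse on both halves.

    import numpy as np

    def detect_hierarchical(x, threshold, min_seg=24, offset=0, breaks=None):
        """Binary segmentation: find the strongest break, split, recurse."""
        if breaks is None:
            breaks = []
        n = len(x)
        if n < 2 * min_seg:                # segment too short to split further
            return breaks
        z = (x - x.mean()) / x.std()
        cum = np.cumsum(z)
        k = np.arange(min_seg, n - min_seg + 1)  # candidate break positions
        t = (k * (cum[k - 1] / k)**2
             + (n - k) * ((cum[-1] - cum[k - 1]) / (n - k))**2)
        best = int(np.argmax(t))
        if t[best] > threshold:            # threshold calibrated on break-free noise
            b = int(k[best])
            breaks.append(offset + b)      # break between x[b-1] and x[b]
            detect_hierarchical(x[:b], threshold, min_seg, offset, breaks)
            detect_hierarchical(x[b:], threshold, min_seg, offset + b, breaks)
        return sorted(breaks)

The weakness is visible in the first step: a single-break model is fitted to a series that may contain several breaks, so breaks of opposite sign can partially mask each other.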

The second ad hoc method is to detect the breaks on a moving window. This window should be long enough for good detection rates, but not too long, because that increases the chance of two breaks falling within one window. In the special issue there is an article by José A. Guijarro on this method, which is used in his homogenization method CLIMATOL.
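A bare-bones version of such a window scan could look like this (my own sketch, not the CLIMATOL implementation): a two-sample t statistic tests for a mean shift at the centre of each window, and only local maxima above a calibrated threshold are reported, so that overlapping windows do not flag the same break twice.

    import numpy as np

    def moving_window_breaks(x, half_width, threshold):
        """Test for a mean shift at the centre of each sliding window."""
        n = len(x)
        stat = np.zeros(n)
        for i in range(half_width, n - half_width):
            left = x[i - half_width:i]
            right = x[i:i + half_width]
            s = np.sqrt((left.var(ddof=1) + right.var(ddof=1)) / 2)  # pooled std
            stat[i] = abs(left.mean() - right.mean()) / (s * np.sqrt(2.0 / half_width))
        return [i for i in range(half_width, n - half_width)
                if stat[i] > threshold
                and stat[i] == stat[i - half_width:i + half_width].max()]

The choice of half_width is exactly the dilemma above: for monthly data, 36 values (3 years) on either side was about the minimum that still gave good detection rates in José's comparison.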

While these two ad hoc methods work reasonably well, detecting all breaks simultaneously is more powerful. This can be performed as an exhaustive search over all possible combinations (used by the homogenization method MASH), but with on average one break per 15 to 20 years, the number of breaks and thus of combinations can get very large. Modern homogenization methods therefore use an optimization method called dynamic programming (used by the homogenization methods PRODIGE, ACMANT and HOMER).
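The core of the dynamic programming solution fits in a few lines (a minimal sketch, mine rather than code from any of the mentioned packages): for a fixed number of breaks, find the segmentation that minimizes the sum of the squared deviations from the segment means, reusing the optimal solutions for shorter prefixes instead of trying all combinations.

    import numpy as np

    def segment_cost(x):
        """cost[i, j]: sum of squared deviations of x[i:j] from its mean."""
        n = len(x)
        c1 = np.concatenate(([0.0], np.cumsum(x)))
        c2 = np.concatenate(([0.0], np.cumsum(x**2)))
        cost = np.full((n + 1, n + 1), np.inf)
        for i in range(n):
            for j in range(i + 1, n + 1):
                s, s2, m = c1[j] - c1[i], c2[j] - c2[i], j - i
                cost[i, j] = s2 - s * s / m
        return cost

    def best_segmentation(x, n_breaks):
        """Optimal break positions by dynamic programming.

        D[k, j] is the minimal cost of fitting k segments to x[:j]; each
        row reuses the previous one, so the search costs O(n_breaks * n**2)
        operations instead of the exponential cost of an exhaustive search.
        """
        n = len(x)
        cost = segment_cost(x)
        D = np.full((n_breaks + 2, n + 1), np.inf)
        D[1] = cost[0]                     # one segment covering x[:j]
        arg = np.zeros((n_breaks + 2, n + 1), dtype=int)
        for k in range(2, n_breaks + 2):
            for j in range(k, n + 1):
                cand = D[k - 1, :j] + cost[:j, j]
                arg[k, j] = int(np.argmin(cand))
                D[k, j] = cand[arg[k, j]]
        breaks, j = [], n                  # backtrack the split points
        for k in range(n_breaks + 1, 1, -1):
            j = int(arg[k, j])
            breaks.append(j)
        return sorted(breaks)

For a series with clear shifts after values 50 and 100, best_segmentation(x, 2) should recover [50, 100]. The hard part, and the topic of our paper, is deciding how many breaks are statistically justified; for that a penalty term is added to the cost before segmentations with different numbers of breaks are compared.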

All the mentioned homogenization methods have been compared with each other on a realistic benchmark dataset by the COST Action HOME. In the corresponding article (Venema et al., 2012) you can find references to all the mentioned methods. The results of this benchmarking showed that the multiple-breakpoint methods were clearly the best. However, this is not only due to their elegant solution of the multiple breakpoint problem; these methods also had other advantages.