Friday 29 March 2013

Special issue on homogenisation of climate series

The open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás" has just published a special issue on homogenization of climate records. This special issue contains eight research papers. It is an offspring of the COST Action HOME: Advances in homogenization methods of climate series: an integrated approach (COST-ES0601).

Because it discusses eight papers, this post contains less background information than usual and is aimed at people who are already knowledgeable about the homogenization of climate networks.


Mónika Lakatos and Tamás Szentimrey: Editorial.
The editorial explains the background of this special issue: the importance of homogenisation and the COST Action HOME. Mónika and Tamás, thank you very much for your efforts in organising this special issue. I think every reader will agree that it has become a valuable journal issue.

Monthly data

Ralf Lindau and Victor Venema: On the multiple breakpoint problem and the number of significant breaks in homogenization of climate records.
My article with Ralf Lindau has already been discussed in a previous post on the multiple breakpoint problem.
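For readers less familiar with the multiple breakpoint problem: methods that search for several breaks at once must choose both the number of breaks and their positions. For a fixed number of breaks, one common approach is optimal segmentation, which picks the positions that minimise the sum of squared deviations from the segment means. Below is a minimal dynamic-programming sketch in Python. It is a generic textbook version for illustration, not the code used in the article, and the function name `optimal_segmentation` is hypothetical. It does not address how many breaks are significant, which is the question the article studies.

```python
import numpy as np

def optimal_segmentation(x, n_breaks):
    """Break positions (start index of each new segment) that minimise the
    total within-segment sum of squared deviations from the segment means."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Cumulative sums let us evaluate the cost of any segment x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def cost(i, j):
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    n_seg = n_breaks + 1
    # best[k, j]: minimal cost of splitting x[:j] into k + 1 segments;
    # start[k, j]: index where the last of those segments begins.
    best = np.full((n_seg, n + 1), np.inf)
    start = np.zeros((n_seg, n + 1), dtype=int)
    for j in range(1, n + 1):
        best[0, j] = cost(0, j)
    for k in range(1, n_seg):
        for j in range(k + 1, n + 1):
            for i in range(k, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j], start[k, j] = c, i
    # Backtrack from the full series to recover the break positions.
    breaks, j = [], n
    for k in range(n_seg - 1, 0, -1):
        j = int(start[k, j])
        breaks.append(j)
    return sorted(breaks), float(best[n_seg - 1, n])

# Hypothetical noise-free series with two steps.
x = [0.0] * 30 + [2.0] * 30 + [-1.0] * 40
print(optimal_segmentation(x, 2))   # ([30, 60], 0.0)
```

The cost grows as O(k·n²), which is fine for annual or monthly series of a few hundred values.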
José A. Guijarro: Climatological series shift test comparison on running windows.
Longer time series typically contain more than one inhomogeneity, but most statistical tests are designed to detect a single break. One way to resolve this conflict is to apply these tests on short moving windows. José compares six statistical detection methods (t-test, Standard Normal Homogeneity Test (SNHT), two-phase regression (TPR), Wilcoxon-Mann-Whitney test, Durbin-Watson test and the squared relative mean difference (SRMD)), applied on running windows with lengths between 1 and 5 years (12 to 60 monthly values on either side of the potential break). The smart trick of the article is that all methods are calibrated to the same false alarm rate of 1%, which makes the comparison fair. In this way, he can show that the t-test, SNHT and SRMD perform best for this problem and are almost identical. To get good detection rates, the window needs at least 3 years on either side of the break (2×3 years). Because such a long window risks containing two breaks, José has decided to switch his homogenization method CLIMATOL from windows to the semi-hierarchical scheme of SNHT. The methods are tested on data with just one break; it would have been interesting to also simulate the more realistic case with multiple independent breaks.
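To make the calibration idea concrete, here is a minimal Python sketch of a running-window t-test whose critical value is set by Monte Carlo simulation, so that homogeneous series raise a false alarm in 1% of cases. It assumes the input is already a deseasonalised difference series (candidate minus reference) that is white noise when homogeneous. The function names, the white-noise null model and the simulation set-up are assumptions for illustration, not José's actual procedure.

```python
import numpy as np

def window_t_max(x, w):
    """Largest |t| of a pooled two-sample t-test comparing the w values
    before and the w values after every candidate break position."""
    x = np.asarray(x, dtype=float)
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x ** 2)))
    k = np.arange(w, len(x) - w + 1)          # candidate break positions
    mean_a = (c1[k] - c1[k - w]) / w          # window before the break
    mean_b = (c1[k + w] - c1[k]) / w          # window after the break
    var_a = (c2[k] - c2[k - w] - w * mean_a ** 2) / (w - 1)
    var_b = (c2[k + w] - c2[k] - w * mean_b ** 2) / (w - 1)
    # Equal window sizes: standard error = sqrt((var_a + var_b) / w).
    t = (mean_b - mean_a) / np.sqrt((var_a + var_b) / w)
    return float(np.max(np.abs(t)))

def calibrate_threshold(n, w, alpha=0.01, n_sim=2000, seed=0):
    """Monte Carlo critical value: homogeneous white-noise series of length n
    exceed it with probability alpha (the false alarm rate)."""
    rng = np.random.default_rng(seed)
    stats = [window_t_max(rng.standard_normal(n), w) for _ in range(n_sim)]
    return float(np.quantile(stats, 1.0 - alpha))

# Hypothetical example: 20 years of monthly differences, 3-year windows.
rng = np.random.default_rng(1)
n, w = 240, 36
threshold = calibrate_threshold(n, w)
x = rng.standard_normal(n)
x[120:] += 2.0                                # one break of two standard deviations
print(window_t_max(x, w) > threshold)
```

Because the maximum is taken over all window positions, the calibrated critical value is higher than that of a single t-test. Calibrating every method in the same way is what makes their detection rates comparable.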
Olivier Mestre, Peter Domonkos, Franck Picard, Ingeborg Auer, Stéphane Robin, Emilie Lebarbier, Reinhard Böhm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, and Petr Stepanek: HOMER: a homogenization software – methods and applications.
HOMER is a new homogenization package that was developed by combining the best methods tested on the HOME benchmark. In theory, it should therefore be the best method currently available. Still, interactions between the parts of an algorithm can sometimes lead to unexpected results. It would be great if someone tested HOMER on the HOME benchmark dataset, so that its performance could be compared with that of the other algorithms.

Luís Freitas, Mário Gonzalez Pereira, Liliana Caramelo, Manuel Mendes, and Luís Filipe Nunes: Homogeneity of monthly air temperature in Portugal with HOMER and MASH.
Our Portuguese colleagues compare the new homogenization package HOMER with MASH, the homogenization method developed by the Hungarian Meteorological Service. They do so by homogenizing monthly temperature data for the north of Portugal. The main comparison statistic is the Spearman correlation coefficient, computed between one station and each of the others, and between a station and a composite reference of neighbouring stations. In general, HOMER increases the correlations more than MASH. This could indicate that HOMER performs better, but it could also be a sign of overcorrection; this type of study cannot make that distinction. One hint of overcorrection is that, for some stations, the minimum temperature is less well correlated with its direct neighbours after homogenization with HOMER. This could be because the correction method of HOMER assumes that the entire network has the same regional climate signal, an assumption that may not be optimal for this large region with coastal and mountainous stations. I do not understand the results for the station Dunas de Mira (DM): here HOMER increases the correlations with all stations a lot, while MASH hardly changes them (Figure 2). However, the correlations with the direct neighbours are high and similar for both methods (Figure 3).
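The assumption of a common regional climate signal can be made concrete with a two-factor (ANOVA-type) least-squares model, in which every observation is the sum of a climate signal shared by all stations at that time and a station level that is constant between breaks. The sketch below is a minimal Python illustration of that assumption, not HOMER's code. It assumes the break positions are already known, that there are no missing values and that not all stations break at the same time; the function name `anova_corrections` is hypothetical.

```python
import numpy as np

def anova_corrections(data, breaks):
    """Two-factor least-squares correction.

    data: array (n_stations, n_times) without missing values.
    breaks: one list per station with the indices where a new segment starts.
    Model: data[s, t] = climate[t] + level[s, segment] + noise.
    Returns the data adjusted to the level of each station's last segment.
    """
    data = np.asarray(data, dtype=float)
    n_st, n_t = data.shape
    seg = np.zeros((n_st, n_t), dtype=int)
    col = {}                                  # (station, segment) -> column
    for s in range(n_st):
        edges = [0] + sorted(breaks[s]) + [n_t]
        for g in range(len(edges) - 1):
            seg[s, edges[g]:edges[g + 1]] = g
            col[(s, g)] = n_t + len(col)
    # Design matrix: columns 0..n_t-1 hold the common climate signal,
    # the remaining columns the station levels between breaks.
    A = np.zeros((n_st * n_t, n_t + len(col)))
    for s in range(n_st):
        for t in range(n_t):
            A[s * n_t + t, t] = 1.0
            A[s * n_t + t, col[(s, seg[s, t])]] = 1.0
    # A is rank deficient (a constant can move between climate and levels);
    # lstsq gives the minimum-norm solution, and the corrections below only
    # use level differences within a station, which are identifiable.
    coef = np.linalg.lstsq(A, data.ravel(), rcond=None)[0]
    out = data.copy()
    for s in range(n_st):
        last = coef[col[(s, len(breaks[s]))]]
        for t in range(n_t):
            out[s, t] += last - coef[col[(s, seg[s, t])]]
    return out

# Hypothetical network of three stations sharing one climate signal.
rng = np.random.default_rng(3)
climate = rng.normal(0.0, 1.0, 40)
data = np.vstack([climate + 0.2, climate - 0.3, climate + 1.0])
data[1, :20] += 1.5                           # station 1: break at index 20
fixed = anova_corrections(data, [[], [20], []])
print(np.allclose(fixed[1] - climate, -0.3))  # True
```

If coastal and mountain stations in reality follow different climate signals, the model can only absorb that mismatch into the estimated segment levels, which is one way the corrections could be distorted.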
Peter Domonkos: Measuring performances of homogenisation methods.
Naturally, I loved reading Peter's review on benchmarking homogenization algorithms, not least because I rather like the topic after coordinating the HOME benchmarking and being a member of the benchmarking group of the International Surface Temperature Initiative. The main message is briefly summarised in the abstract: "The principles of reliable efficiency evaluations are: (i) Efficiency tests need the use of simulated test datasets with similar properties to real observational datasets; (ii) The use of root mean square error (RMSE) and the accuracy of trend estimation must be preferred instead of the skill in detecting change-points; (iii) The evaluation of the detection of inhomogeneities must be clearly distinguished from the evaluation of whole homogenization procedures; (iv) Evaluation of homogenization methods including subjective steps needs blind tests." I do hold the opinion that the HOME benchmarking paper showed that the HOME benchmark more likely had too many, rather than too few, platform-type break-point pairs, whereas this article claims it had too few; but that is a long, long quarrel between the two of us.
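Principle (ii) is easy to apply once the true homogeneous series is known, as it is in a benchmark with simulated data. Below is a minimal Python sketch of the two error measures for one station. The centring step is an assumption of this sketch, motivated by the fact that relative homogenisation cannot determine the absolute level of a series; the function names are hypothetical.

```python
import numpy as np

def centred_rmse(homogenised, truth):
    """RMSE between two series after subtracting each series' own mean."""
    h = np.asarray(homogenised, dtype=float)
    t = np.asarray(truth, dtype=float)
    d = (h - h.mean()) - (t - t.mean())
    return float(np.sqrt(np.mean(d ** 2)))

def trend_error(homogenised, truth, times):
    """Difference between the least-squares linear trends of the two series."""
    slope_h = np.polyfit(times, homogenised, 1)[0]
    slope_t = np.polyfit(times, truth, 1)[0]
    return float(slope_h - slope_t)

# Hypothetical benchmark station: 100 years with a trend of 0.01 per year.
rng = np.random.default_rng(7)
years = np.arange(100)
truth = 0.01 * years + rng.normal(0.0, 0.3, 100)
homogenised = truth + 5.0                     # an absolute offset is not penalised
homogenised[:50] -= 0.4                       # but a residual break is
print(centred_rmse(homogenised, truth), trend_error(homogenised, truth, years))
```

Both measures evaluate the whole homogenisation procedure rather than the detection step, in line with principle (iii).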

Daily data

Tamás Szentimrey: Theoretical questions of daily data homogenization.
Tamás wrote a thought-provoking article on the correction of daily data. It contains some valuable points that would actually be worth an entire post of their own. The paper would, however, have been stronger had his strong words been limited to what he actually proved mathematically. He discusses the daily correction method for parallel measurements of Trewin and Trevitt (1996), which is used when you have overlapping data with the old and the new measurement set-up standing next to each other. Tamás shows that this method is biased due to the decorrelation between the two parallel measurements. For parallel measurements the correlations are normally very high, so this bias term will be very small. The inhomogeneity in the distribution is likely larger than this bias, so the correction still improves the data; this can be tested by bootstrapping (Nemec et al., 2012). Unfortunately, the paper lacks a proof that decorrelation is also a problem for the correction method HOM (Della-Marta and Wanner, 2006). HOM uses a neighbouring station, which is more decorrelated than a parallel measurement, but the method is also smarter than just computing one regression. HOM and its variants are used a lot nowadays to correct daily temperature data, so a better understanding, which could lead to better methods, would be very valuable.
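Why decorrelation matters can be illustrated with a toy simulation. For illustration only, it assumes that a correction is derived by regressing the new measurements on the old ones; this is a simplification, not the exact Trewin and Trevitt (1996) procedure. Because the regression slope equals r·σ_new/σ_old, the predicted series has only r² times the variance of the new series, whereas mapping quantiles preserves the distribution. With the very high correlations typical of parallel measurements, r² is close to one and the bias is small.

```python
import numpy as np

rng = np.random.default_rng(42)
n, r = 20000, 0.95
# Hypothetical parallel daily measurements: correlated standard normals,
# with the new set-up reading 0.5 degrees warmer.
old = rng.standard_normal(n)
new = 0.5 + r * old + np.sqrt(1 - r ** 2) * rng.standard_normal(n)

# (a) Regression: predict the new set-up from the old one.
slope, intercept = np.polyfit(old, new, 1)
pred_regression = intercept + slope * old

# (b) Quantile mapping: give each old value the quantile of the
#     new series at the same rank (probability).
ranks = np.argsort(np.argsort(old))
probs = (ranks + 0.5) / n
pred_quantile = np.quantile(new, probs)

print(np.var(new), np.var(pred_regression), np.var(pred_quantile))
```

The regression-based variance comes out near r² ≈ 0.90 of the variance of the new series, while the quantile-mapped series keeps almost all of it.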
Petr Štěpánek, Pavel Zahradníček and Aleš Farda: Experiences with data quality control and homogenization of daily records of various meteorological elements in the Czech Republic in the period 1961–2010.
Our colleagues from Central Europe describe how they generated their high-quality daily dataset for the last 50 years. The dataset (62 million values) contains temperature (minimum, maximum, temperature at 7, 14 and 21 hours, and the daily average), water vapour pressure (again at the three fixed hours and the average), wind speed (again at the three fixed hours and the average), maximum daily wind gust, daily precipitation totals and daily sunshine duration. The article describes the quality control, the homogenization with AnClim, the filling of missing data and the gridding. It shows the number of detected breaks and their sizes for the different climate variables and times of year. The article also gives a short description of their daily correction method, which extends HOM in interesting ways: it adjusts the percentiles, smooths the corrections and corrects the distribution for every month using data from the adjacent months as well. The automation of the measurements had a very strong influence on the homogeneity of the data and on the number of outliers.
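The ingredients of such a percentile-based correction can be sketched as follows. This minimal Python sketch is not the AnClim implementation. It assumes that `expected` holds the pre-break candidate values as predicted from a reference station (for example with a relation fitted in the homogeneous period after the break), and the percentile grid, the smoothing width and all function names are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical percentile grid; the article's actual choice is not given here.
PROBS = np.linspace(0.05, 0.95, 19)

def months_window(month, target):
    """Mask selecting the target month (1-12) and its two neighbours, cyclically."""
    wanted = [(target - 2) % 12 + 1, target, target % 12 + 1]
    return np.isin(month, wanted)

def percentile_corrections(observed, expected, smooth=3):
    """Correction per percentile (expected minus observed quantile), smoothed
    with a running mean of odd width `smooth` across neighbouring percentiles."""
    corr = np.quantile(expected, PROBS) - np.quantile(observed, PROBS)
    padded = np.pad(corr, smooth // 2, mode="edge")
    return np.convolve(padded, np.ones(smooth) / smooth, mode="valid")

def apply_corrections(values, observed, corr):
    """Shift each value by the correction at its percentile within `observed`,
    interpolating linearly between grid points (constant beyond the ends)."""
    p = np.searchsorted(np.sort(observed), values) / len(observed)
    return values + np.interp(p, PROBS, corr)

# Hypothetical example: winter (Dec-Feb) days before a break, where the
# observed values read 1 degree too cold relative to the expected values.
rng = np.random.default_rng(0)
month = rng.integers(1, 13, 3000)
in_window = months_window(month, 1)
expected = rng.normal(0.0, 3.0, in_window.sum())
observed = expected - 1.0 + rng.normal(0.0, 0.5, in_window.sum())
corr = percentile_corrections(observed, expected)
adjusted = apply_corrections(observed, observed, corr)
print(round(float(adjusted.mean() - expected.mean()), 2))
```

Pooling the adjacent months gives each monthly distribution more data, and smoothing across percentiles damps the sampling noise of the tail quantiles.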
Mónika Lakatos, Tamás Szentimrey, Zita Bihari, and Sándor Szalai: Creation of a homogenized climate database for the Carpathian region by applying the MASH procedure and the preliminary analysis of the data.
This article also presents the quality control, homogenization and gridding of multiple daily variables (mean, minimum and maximum temperature, precipitation totals, wind direction and speed, sunshine duration, cloud cover, global radiation, relative humidity, surface vapour pressure and surface air pressure). The MASH package was used for quality control and automatic homogenization, and MISH for gridding; both were developed by the Hungarian Meteorological Service. What is special about this project is that the Carpathian region spans many countries, so data of stations near the borders needed to be exchanged. The gridded dataset has a resolution of 10 km and was produced using kriging (optimal estimation). The article shows the gridded maps of two indices computed from the daily data. It is a pity that such projects are so complicated to organise because of the data policies of the weather services. I am hopeful, though, that things are improving and that the situation will be better in a few years. Finland, for instance, opened its climatological datasets at the beginning of this year.
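The kriging step can be illustrated with ordinary kriging at a single grid point: the estimate is a weighted sum of station values, with weights that minimise the expected squared error under a covariance model and sum to one. This minimal Python sketch is generic ordinary kriging, not the MISH implementation; the exponential covariance function, its sill and correlation length, and the function name are hypothetical assumptions.

```python
import numpy as np

def ordinary_kriging(xy, z, target, sill=1.0, corr_len=50.0):
    """Ordinary kriging estimate at one target point, assuming an exponential
    covariance C(h) = sill * exp(-h / corr_len), with distances h in km."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)

    def cov(h):
        return sill * np.exp(-h / corr_len)

    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=-1)
    # Kriging system with a Lagrange multiplier forcing the weights to sum to 1:
    # [C 1; 1^T 0] [w; mu] = [c0; 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ z), weights

# Hypothetical stations (x, y in km) and values, estimated at one grid point.
stations = [(0, 0), (30, 0), (0, 40), (25, 35)]
values = [1.0, 2.0, 3.0, 4.0]
estimate, w = ordinary_kriging(stations, values, (10, 10))
print(estimate, w.sum())
```

Without a nugget term, the estimate at a station location reproduces that station's value exactly.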

More posts on homogenisation

Statistical homogenisation for dummies
A primer on statistical homogenisation with many pictures.
Homogenisation of monthly and annual data from surface stations
A short description of the causes of inhomogeneities in climate data (non-climatic variability) and how to remove it using the relative homogenisation approach.
New article: Benchmarking homogenisation algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.
HUME: Homogenisation, Uncertainty Measures and Extreme weather
Proposal for future research in homogenisation of climate network data.
A short introduction to the time of observation bias and its correction
The time of observation bias is an important cause of inhomogeneities in temperature data.
Future research in homogenisation of climate data – EMS 2012 in Poland
Some ideas for future research as discussed during a side meeting at EMS2012.


Della-Marta, P. M. and H. Wanner. A method of homogenizing the extremes and mean of daily temperature measurements. J. Climate, 19, pp. 4179–4197, doi: 10.1175/JCLI3855.1, 2006.

Nemec, J., Ch. Gruber, B. Chimani, I. Auer. Trends in extreme temperature indices in Austria based on a new homogenised dataset. Int. J. Climatol., doi: 10.1002/joc.3532, 2012.

Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M.J. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma. Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, pp. 89–115, doi: 10.5194/cp-8-89-2012, 2012.
