Showing posts with label HOME.

Friday, June 27, 2014

Self-review of problems with the HOME validation study for homogenization methods

In my last post, I argued that post-publication review is no substitute for pre-publication review, but it could be a nice addition.

This post is a post-publication self-review: a review of our paper on the validation of statistical homogenization methods, also called benchmarking when it is a community effort. Since writing this benchmarking article we have come to understand the problem better and have found some weaknesses. I have presented these problems at conferences, but for those who did not hear them, they are explained below after a short introduction. We have a new paper in open review that explains how we want to do better in the next benchmarking study.

Benchmarking homogenization methods

In our benchmarking paper we generated a dataset that mimicked real temperature or precipitation data. To these data we added non-climatic changes (inhomogeneities). We then asked climatologists to homogenize the data, that is, to remove the inhomogeneities we had inserted. How good the homogenization algorithms are can be judged by comparing the homogenized data to the original homogeneous data.
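The evaluation boils down to an error metric between the homogenized output and the known truth. A minimal sketch (the toy data and the bare RMSE are illustrative; the benchmark also used other measures, such as trend errors):

```python
import numpy as np

def rmse(homogenized, truth):
    """Root-mean-square error between a homogenized series and the true series."""
    homogenized = np.asarray(homogenized, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((homogenized - truth) ** 2)))

# Toy example: a true series, a "raw" series with one inserted break,
# and (for comparison) a perfectly homogenized series.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 0.5, size=120)   # 120 months of temperature anomalies
raw = truth.copy()
raw[60:] += 0.8                          # inserted inhomogeneity of 0.8 degC
print(rmse(raw, truth))                  # error of the raw, inhomogeneous data
print(rmse(truth, truth))                # 0.0 for perfect homogenization
```

With such a metric, an algorithm improves the data whenever the RMSE of its output is below the RMSE of the raw input.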

This is straightforward science, but the realism of the dataset was the best to date, and because this project was part of a large research program (the COST Action HOME) we had a large number of contributions. Mathematical understanding of the algorithms is also important, but homogenization algorithms are complicated and it is easy to make errors in the implementation, so such numerical validations are valuable as well. Both approaches complement each other.


Group photo at a meeting of the COST Action HOME with most of the European homogenization community present. These are those people working in ivory towers, eating caviar from silver plates, drinking 1985 Romanee-Conti Grand Cru from crystal glasses and living in mansions. Enjoying the good life on the public teat, while conspiring against humanity.

The main conclusions were that homogenization improves the homogeneity of temperature data. Precipitation is more difficult and only the best algorithms were able to improve it. We found that modern methods improved the quality of temperature data about twice as much as traditional methods. It is thus important that people switch to one of these modern methods. My impression from the recent Homogenisation seminar and the upcoming European Meteorological Society (EMS) meeting is that this seems to be happening.

1. Missing homogenization methods

An impressive number of methods participated in HOME. Many manual methods were also applied; these are validated less often because doing so is more work. All the state-of-the-art methods participated, as did most of the widely used ones. However, we forgot to test a two- or multi-phase regression method, which is popular in North America.

Also not validated is HOMER, the algorithm that was designed afterwards using the best parts of the tested algorithms. We are working on this. Many people have started using HOMER. Its validation should thus be a high priority for the community.

2. Size breaks (random walk or noise)

Next to the benchmark data with the inserted inhomogeneities, we also asked people to homogenize some real datasets. This turned out to be very important because it allowed us to validate how realistic the benchmark data is, information we need to make future studies more realistic. In this validation we found that the benchmark inhomogeneities were larger than those in the real data. Expressed as the standard deviation of the break size distribution, the benchmark breaks were typically 0.8°C, while the real breaks were only about 0.6°C.

This was already reported in the paper, but we now understand why. In the benchmark, the inhomogeneities were implemented by drawing a random number for every homogeneous period and perturbing the original data by this amount. In other words, we added noise to the homogeneous data. However, the homogenizers who requested breaks with a size of about 0.8°C were thinking of the difference from one homogeneous period to the next. The size of such a jump is influenced by two random numbers. Because variances are additive, this means that the jumps implemented as noise were a factor of the square root of two (about 1.4) too large.
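The factor of the square root of two can be checked numerically. The sketch below mirrors the numbers above: per-period perturbations with a standard deviation of 0.6°C produce period-to-period jumps of about 0.85°C:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 0.6          # standard deviation of the per-period perturbations (degC)
n = 200_000          # many homogeneous periods for a stable estimate

# Noise implementation: every homogeneous period gets an independent offset.
offsets = rng.normal(0.0, sigma, n)

# The break a homogenizer sees is the jump between consecutive periods.
jumps = np.diff(offsets)

print(offsets.std())   # ~0.6, the size of the perturbations
print(jumps.std())     # ~0.85, i.e. sqrt(2) times 0.6
```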

The validation showed that, except for the size, implementing the inhomogeneities as noise was a good approximation. The alternative would be to draw a random number and use it to perturb the data relative to the previously perturbed period; in that case the inhomogeneities are implemented as a random walk. Nobody thought of reporting it, but it seems that most validation studies have implemented their inhomogeneities as random walks. This makes the influence of the inhomogeneities on the trend much larger. Because of the larger initial error, it is probably easier to achieve relative improvements, but the absolute errors after homogenization may well have been too large in previous studies.

You can see the difference between a noise perturbation and a random walk by comparing the signs (up or down) of consecutive breaks. For example, in the case of noise and a large upward jump, the next change is likely to make the perturbation smaller again. In the case of a random walk, the size and sign of the previous break are irrelevant: each sign has a probability of one half.

In other words, in the case of a random walk there are just as many up-down and down-up pairs as up-up and down-down pairs; every combination has a probability of one in four. In the case of noise perturbations, up-down and down-up pairs (platform-like break pairs) are more likely than up-up and down-down pairs. The latter behaviour is what we found in the real datasets. There is a small deviation suggesting a small random-walk contribution, but that may also be because the inhomogeneities cause a trend bias.
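A small simulation makes the sign statistics concrete. For Gaussian perturbations the expected fraction of opposite-sign pairs is 2/3 in the noise case (consecutive jumps share one offset and are anticorrelated) and 1/2 in the random-walk case; the code below only illustrates that argument:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Noise-type inhomogeneities: independent offset for every homogeneous period.
noise_jumps = np.diff(rng.normal(size=n))
# Random-walk-type inhomogeneities: every break is drawn relative to the
# previous level, so the jumps themselves are independent.
walk_jumps = rng.normal(size=n)

def frac_opposite(jumps):
    """Fraction of consecutive break pairs with opposite signs (platform-like)."""
    s = np.sign(jumps)
    return float(np.mean(s[:-1] * s[1:] < 0))

print(frac_opposite(noise_jumps))   # ~2/3: platform-like pairs dominate
print(frac_opposite(walk_jumps))    # ~1/2: the previous break is irrelevant
```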

3. Signal to noise ratio varies regionally

The HOME benchmark reproduced a typical situation in Europe (the USA is similar). However, the station density in much of the world is lower. Inhomogeneities are detected and corrected by comparing a candidate station to neighbouring ones. When the station density is lower, this difference signal is noisier, which makes homogenization more difficult. One would thus expect the performance of homogenization methods to be lower in other regions, though the break frequency and break size may also differ there.

Thus, to estimate how large the influence of the remaining inhomogeneities on the global mean temperature can be, we need to study the performance of homogenization algorithms in a wider range of situations. Also for the intercomparison of homogenization methods (the more limited aim of HOME), the signal (break size) to noise ratio is important. Domonkos (2013) showed that the ranking of various algorithms depends on the signal to noise ratio. Ralf Lindau and I have just submitted a manuscript showing that for low signal to noise ratios, the multiple breakpoint method PRODIGE is not much better at detecting breaks than a method that would "detect" random breaks, while it works fine for higher signal to noise ratios. Other methods may also be affected, though possibly not to the same degree. More on that later.
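The effect of the signal to noise ratio can be illustrated with a toy experiment. The least-squares detector below is a generic single-break finder applied to a difference series, not PRODIGE:

```python
import numpy as np

rng = np.random.default_rng(7)

def detect_break(x):
    """Return the split point minimizing the residual sum of squares (one break)."""
    best_k, best_rss = None, np.inf
    for k in range(5, len(x) - 5):          # enforce a minimum segment length
        left, right = x[:k], x[k:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k

def hit_rate(snr, n=100, trials=400, tol=5):
    """Fraction of trials in which the break is located within +/- tol positions."""
    hits = 0
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, n)         # difference series, noise std = 1
        x[n // 2:] += snr                   # one break, size = snr in noise units
        hits += abs(detect_break(x) - n // 2) <= tol
    return hits / trials

low, high = hit_rate(0.5), hit_rate(3.0)
print(low, high)
```

At a low signal to noise ratio the detected positions are not far above what random guessing would give; at a high ratio the break is found almost every time.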

4. Regional trends (absolute homogenization)

The initially simulated data did not have a trend, so we explicitly added a trend to all stations to give the data a regional climate change signal. This trend could be either upward or downward, to check whether homogenization methods have problems with downward trends, which are not typical of daily operations. They do not.

Had we inserted a simple linear trend in the HOME benchmark data, the operators of the manual homogenization methods could in theory have used this information to improve their performance: if the homogenized trend is not linear, there are apparently still inhomogeneities in the data. We wanted to keep the operators in the dark. Consequently, we inserted a rather complicated, variable nonlinear trend in the dataset.

As already noted in the paper, this may have handicapped the participating absolute homogenization method. Homogenization methods used in climate are normally relative ones: they compare a station to its neighbours, which share the same regional climate signal, so that signal cancels and plays no role. Absolute methods do not use information from the neighbours; they have to make assumptions about the variability of the real regional climate signal. Absolute methods have problems with gradual inhomogeneities, are less sensitive, and are therefore not used much.

If absolute methods are participating in future studies, the trend should be modelled more realistically. When benchmarking only automatic homogenization methods (no operator) an easier trend should be no problem.

5. Length of the series

The station networks simulated in HOME were all one century long; some of the station series were shorter because we also simulated the build-up of the network during the first 25 years. We recently found that the criterion for the optimal number of break inhomogeneities used by one of the best homogenization methods (PRODIGE) does not have the right dependence on the number of data points (Lindau and Venema, 2013). For climate datasets that are about a century long the criterion is quite good, but for much longer or shorter datasets there are deviations. This illustrates that the length of the datasets also matters and that, for benchmarking, the data availability should be the same as in real datasets.

Another reason why the benchmark data availability should be the same as in the real dataset is that this makes the comparison of the inhomogeneities found in the real data and in the benchmark more straightforward. This comparison is important to make future validation studies more accurate.

6. Non-climatic trend bias

The inhomogeneities we inserted in HOME were on average zero. For individual stations this still results in clear non-climatic trend errors, because one only averages over a small number of inhomogeneities. For the full networks the number of inhomogeneities is larger and the non-climatic trend error thus very small. It was consequently very hard for the homogenization methods to improve on these small errors. In real raw datasets a larger non-climatic error is expected. Globally the non-climatic trend will be relatively small, but within one network, where the stations experienced similar (technological and organisational) changes, it can be appreciable. Thus we should model such a non-climatic trend bias explicitly in future.
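How zero-mean inhomogeneities average out in a network can be illustrated with a toy simulation; the break frequency and size below are merely plausible round numbers, not the HOME settings:

```python
import numpy as np

rng = np.random.default_rng(3)

def station_trend_error(n_breaks=5, sigma=0.6, years=100):
    """Non-climatic trend error (degC per century) of one station.

    Noise-type inhomogeneities: each homogeneous segment gets an
    independent zero-mean offset.
    """
    pos = np.sort(rng.choice(np.arange(10, years - 10), size=n_breaks, replace=False))
    offsets = rng.normal(0.0, sigma, n_breaks + 1)
    series = np.zeros(years)
    bounds = np.concatenate(([0], pos, [years]))
    for i in range(n_breaks + 1):
        series[bounds[i]:bounds[i + 1]] = offsets[i]
    return np.polyfit(np.arange(years), series, 1)[0] * years

station_errors = np.array([station_trend_error() for _ in range(2000)])
# Group the 2000 stations into 100 networks of 20 and average within each.
network_errors = station_errors.reshape(100, 20).mean(axis=1)

print(station_errors.std())   # clear trend error for single stations
print(network_errors.std())   # much smaller for the network mean
```

A systematic bias shared by all stations would, by contrast, not average out, which is why such a trend bias needs to be modelled explicitly.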

International Surface Temperature Initiative

The last five problems will be solved in the International Surface Temperature Initiative (ISTI) benchmark. Whether a two-phase homogenization method will participate is beyond our control. We expect fewer participants than in HOME because, for such a huge global dataset, the homogenization methods will need to run automatically and unsupervised.

The standard break sizes will be made smaller. We will make ten benchmarking "worlds" with different kinds of inserted inhomogeneities and will also vary the size and number of the inhomogeneities. Because the ISTI benchmarks will mirror the real data holdings of the ISTI, the station density and the length of the data will be the same. The regional climate signal will be derived from a global circulation model, so absolute methods can also participate. Finally, we will introduce a clear non-climatic trend bias in several of the benchmark "worlds".

The paper on the ISTI benchmark is open for discussions at the journal Geoscientific Instrumentation, Methods and Data Systems. Please find the abstract below.

Abstract.
The International Surface Temperature Initiative (ISTI) is striving towards substantively improving our ability to robustly understand historical land surface air temperature change at all scales. A key recently completed first step has been collating all available records into a comprehensive open access, traceable and version-controlled databank. The crucial next step is to maximise the value of the collated data through a robust international framework of benchmarking and assessment for product intercomparison and uncertainty estimation. We focus on uncertainties arising from the presence of inhomogeneities in monthly surface temperature data and the varied methodological choices made by various groups in building homogeneous temperature products. The central facet of the benchmarking process is the creation of global scale synthetic analogs to the real-world database where both the "true" series and inhomogeneities are known (a luxury the real world data do not afford us). Hence algorithmic strengths and weaknesses can be meaningfully quantified and conditional inferences made about the real-world climate system. Here we discuss the necessary framework for developing an international homogenisation benchmarking system on the global scale for monthly mean temperatures. The value of this framework is critically dependent upon the number of groups taking part and so we strongly advocate involvement in the benchmarking exercise from as many data analyst groups as possible to make the best use of this substantial effort.


Related reading

Nick Stokes made a beautiful visualization of the raw temperature data in the ISTI database. Homogenized data, in which non-climatic trends have been removed, is unfortunately not yet available; it will be released together with the results of the benchmark.

New article: Benchmarking homogenisation algorithms for monthly data. The post describing the HOME benchmarking article.

New article on the multiple breakpoint problem in homogenization. Most work in statistics is about data with just one break inhomogeneity (change point). In climate there are typically more breaks. Methods designed for multiple breakpoints are more accurate.

Part 1 of a series on Five statistically interesting problems in homogenization.


References

Domonkos, P., 2013: Efficiencies of Inhomogeneity-Detection Algorithms: Comparison of Different Detection Methods and Efficiency Measures. Journal of Climatology, Art. ID 390945, doi: 10.1155/2013/390945.

Lindau and Venema, 2013: On the multiple breakpoint problem and the number of significant breaks in homogenization of climate records. Idojaras, Quarterly Journal of the Hungarian Meteorological Service, 117, No. 1, pp. 1-34. See also my post: New article on the multiple breakpoint problem in homogenization.

Lindau and Venema, to be submitted, 2014: The joint influence of break and noise variance on the break detection capability in time series homogenization.

Willett, K., Williams, C., Jolliffe, I., Lund, R., Alexander, L., Brönniman, S., Vincent, L. A., Easterbrook, S., Venema, V., Berry, D., Warren, R., Lopardo, G., Auchmann, R., Aguilar, E., Menne, M., Gallagher, C., Hausfather, Z., Thorarinsdottir, T., and Thorne, P. W.: Concepts for benchmarking of homogenisation algorithm performance on the global scale, Geosci. Instrum. Method. Data Syst. Discuss., 4, 235-270, doi: 10.5194/gid-4-235-2014, 2014.

Tuesday, November 12, 2013

Has COST HOME (2007-2011) passed without true impact on practical homogenisation?

Guest post by Peter Domonkos, one of the leading figures in the homogenization of climate data and developer of the homogenization method ACMANT, which is probably the most accurate method currently available.

A recent investigation done in the Centre of Climate Change of University Rovira i Virgili (Spain) showed that HOME-recommended monthly homogenisation methods are used in practice only rarely: in just 8.4% of the homogenisation exercises in studies published or accepted for publication in six leading climatic journals in the first half of 2013.

The six journals examined are the Bulletin of the American Meteorological Society, Climate of the Past, Climatic Change, International Journal of Climatology, Journal of Climate and Theoretical and Applied Climatology. 74 studies were found in which one or more statistical homogenisation methods were applied to monthly temperature or precipitation datasets; the total number of homogenisation exercises in them is 119. A large variety of homogenisation methods was applied: 34 different methods were used, even without distinguishing between different methods labelled with the same name (as is the case with the procedures of SNHT and RHTest). HOME-recommended methods were applied in only 10 cases (8.4%), and the use of objective or semi-objective multiple-break methods was even rarer, only 3.4%.

In the international blind test experiments of HOME, the participating multiple-break methods produced the highest efficiency in terms of the residual RMSE and trend bias of the homogenised time series. (Note that only methods that directly detect and correct multiple-break structures are considered multiple-break methods.) The success of multiple-break methods was predictable, since their mathematical structures are more appropriate for treating the multiple-break problem than the hierarchic organisation of single-break detection and correction.

Wednesday, September 4, 2013

Proceedings of the Seventh Seminar for Homogenization and Quality Control in Climatological Databases published

The Proceedings of the Seventh Seminar for Homogenization and Quality Control in Climatological Databases, jointly organized with the COST ES0601 (HOME) Action Management Committee Meeting (Budapest, Hungary, 24-27 October 2011), have now been published. These proceedings were edited by Mónika Lakatos, Tamás Szentimrey and Enikő Vincze.

It is published as a WMO report in the series on the World Climate Data and Monitoring Programme. (Some figures may not display correctly in your browser; they looked fine in the stand-alone Acrobat Reader and printed correctly.)

Friday, March 29, 2013

Special issue on homogenisation of climate series

The open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás" has just published a special issue on homogenization of climate records. This special issue contains eight research papers. It is an offspring of the COST Action HOME: Advances in homogenization methods of climate series: an integrated approach (COST-ES0601).

To be able to discuss eight papers, this post does not contain as much background information as usual and is aimed at people already knowledgeable about homogenization of climate networks.

Contents

Mónika Lakatos and Tamás Szentimrey: Editorial.
The editorial explains the background of this special issue: the importance of homogenisation and the COST Action HOME. Mónika and Tamás thank you very much for your efforts to organise this special issue. I think every reader will agree that it has become a valuable journal issue.

Monthly data

Ralf Lindau and Victor Venema: On the multiple breakpoint problem and the number of significant breaks in homogenization of climate records.
My article with Ralf Lindau is already discussed in a previous post on the multiple breakpoint problem.
José A. Guijarro: Climatological series shift test comparison on running windows.
Longer time series typically contain more than one inhomogeneity, but statistical tests are mostly designed to detect one break. One way to resolve this conflict is by applying these tests on short moving windows. José compares six statistical detection methods (t-test, Standard Normal Homogeneity Test (SNHT), two-phase regression (TPR), Wilcoxon-Mann-Whitney test, Durbin-Watson test and SRMD: squared relative mean difference), which are applied on running windows with a length between 1 and 5 years (12 to 60 values (months) on either side of the potential break). The smart trick of the article is that all methods are calibrated to a false alarm rate of 1% for better comparison. In this way, he can show that the t-test, SNHT and SRMD are best for this problem and almost identical. To get good detection rates, the window needs to be at least 2 × 3 years (3 years on either side). As this harbours the risk of having two breaks in one window, José has decided to change his homogenization method CLIMATOL to using the semi-hierarchical scheme of SNHT instead of using windows. The methods are tested on data with just one break; it would have been interesting to also simulate the more realistic case with multiple independent breaks.
Olivier Mestre, Peter Domonkos, Franck Picard, Ingeborg Auer, Stéphane Robin, Emilie Lebarbier, Reinhard Böhm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, and Petr Stepanek: HOMER: a homogenization software – methods and applications.
HOMER is a new homogenization method and is developed using the best methods tested on the HOME benchmark. Thus theoretically, this should be the best method currently available. Still, sometimes interactions between parts of an algorithm can lead to unexpected results. It would be great if someone would test HOMER on the HOME benchmark dataset, so that we can compare its performance with the other algorithms.

Sunday, March 24, 2013

New article on the multiple breakpoint problem in homogenization

An interesting paper by Ralf Lindau and me on the multiple breakpoint problem has just appeared in a Special issue on homogenization of the open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás".

Multiple break point problem

Long instrumental time series contain non-climatic changes, called inhomogeneities, for example due to relocations or changes in the instrumentation. To study real changes in the climate more accurately, these inhomogeneities need to be detected and removed in a data processing step called homogenization, also called segmentation in statistics.

Statisticians have worked a lot on the detection of a single break point in data. Unfortunately, long climate time series typically contain more than just one break point. There are two ad hoc methods to deal with this.

The most used method is the hierarchical one: first detect the largest break, then redo the detection on the two subsections, and so on until no more breaks are found or the segments become too short. A variant is the semi-hierarchical method, in which previously detected breaks are retested and removed if no longer significant. For example, SNHT uses a semi-hierarchical scheme, and so does the pairwise homogenization algorithm of NOAA, which uses SNHT for detection.
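A minimal sketch of the hierarchical idea, using a least-squares gain criterion with an illustrative stopping threshold (real methods such as SNHT use significance tests instead):

```python
import numpy as np

def best_split(x):
    """Best single break position by residual sum of squares, and its RSS gain."""
    total = ((x - x.mean()) ** 2).sum()
    best_k, best_gain = None, 0.0
    for k in range(5, len(x) - 5):                  # minimum segment length
        l, r = x[:k], x[k:]
        rss = ((l - l.mean()) ** 2).sum() + ((r - r.mean()) ** 2).sum()
        if total - rss > best_gain:
            best_k, best_gain = k, total - rss
    return best_k, best_gain

def hierarchical_breaks(x, offset=0, min_gain=20.0):
    """Detect the largest break first, then redo the detection on both subsections."""
    k, gain = best_split(x)
    if k is None or gain < min_gain:
        return []
    return (hierarchical_breaks(x[:k], offset, min_gain)
            + [offset + k]
            + hierarchical_breaks(x[k:], offset + k, min_gain))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.4, 300)   # difference series of a station pair
x[100:] += 1.2                  # first inserted break
x[200:] -= 1.5                  # second inserted break
print(hierarchical_breaks(x))   # two breaks, near positions 100 and 200
```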

The second ad hoc method is to detect the breaks on a moving window. This window should be long enough for sensitivity, but should not be too long because that increases the chance of two breaks in the window. In the Special issue there is an article by José A. Guijarro on this method, which is used for his homogenization method CLIMATOL.
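The moving-window idea can be sketched with a simple two-sample t statistic; the 3-year window and the unthresholded argmax below are illustrative, not CLIMATOL's actual procedure:

```python
import numpy as np

def window_t_stat(x, k, half=36):
    """Two-sample t statistic: `half` monthly values before vs. after position k."""
    a, b = x[k - half:k], x[k:k + half]
    pooled = (a.var(ddof=1) + b.var(ddof=1)) / 2    # pooled variance (equal n)
    return (b.mean() - a.mean()) / np.sqrt(2 * pooled / half)

rng = np.random.default_rng(5)
n, half = 240, 36                 # 20 years of monthly values, 3-year windows
x = rng.normal(0.0, 0.5, n)       # difference series
x[120:] += 0.8                    # break in the middle

t_abs = np.array([abs(window_t_stat(x, k, half)) for k in range(half, n - half + 1)])
best = half + int(t_abs.argmax())
print(best)                        # detected position, near the true break at 120
```

In practice the statistic would be compared to a threshold calibrated to a chosen false alarm rate, as in the Guijarro study discussed above.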

While these two ad hoc methods work reasonably, detecting all breaks simultaneously is more powerful. This can be performed as an exhaustive search of all possible combinations (used by the homogenization method MASH). With on average one break per 15 to 20 years, the number of breaks and thus combinations can get very large. Modern homogenization methods consequently use an optimization method called dynamic programming (used by the homogenization methods PRODIGE, ACMANT and HOMER).
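The simultaneous approach can be sketched as a penalized least-squares segmentation solved by dynamic programming; the fixed per-break penalty below is illustrative, as PRODIGE, ACMANT and HOMER use their own model-selection criteria:

```python
import numpy as np

def dp_breaks(x, penalty):
    """Optimal multiple-break segmentation by dynamic programming.

    Minimizes the total residual sum of squares plus `penalty` per break,
    considering all segmentations at once rather than hierarchically.
    """
    n = len(x)
    # Prefix sums give any segment's RSS in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def rss(i, j):
        return (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)

    cost = np.full(n + 1, np.inf)   # cost[j]: best penalized cost for x[:j]
    cost[0] = -penalty              # the first segment carries no break penalty
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = cost[i] + penalty + rss(i, j)
            if c < cost[j]:
                cost[j], last[j] = c, i
    breaks, j = [], n               # backtrack the optimal break positions
    while j > 0:
        i = last[j]
        if i > 0:
            breaks.append(i)
        j = i
    return sorted(breaks)

rng = np.random.default_rng(4)
x = rng.normal(0.0, 0.4, 150)
x[50:] += 1.2
x[100:] -= 1.5
print(dp_breaks(x, penalty=15.0))   # break positions near 50 and 100
```

The double loop considers every possible last segment, so all segmentations are compared implicitly in O(n²) time instead of enumerating the exponentially many break combinations.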

All the mentioned homogenization methods have been compared with each other on a realistic benchmark dataset by the COST Action HOME. In the corresponding article (Venema et al., 2012) you can find references to all the mentioned methods. The results of this benchmarking showed that multiple breakpoint methods were clearly the best. However, this is not only because of the elegant solution to the multiple breakpoint problem, these methods also had other advantages.

Friday, February 17, 2012

HUME: Homogenisation, Uncertainty Measures and Extreme weather

Proposal for future research in homogenisation

To keep this post short, a background in homogenisation is assumed and not every argument is fully rigorous.

Aim

This document wants to start a discussion on the research priorities in homogenisation of historical climate data from surface networks. It will argue that, with the increased scientific work on changes in extreme weather, the homogenisation community should work more on daily data and especially on quantifying the uncertainties remaining in homogenized data. Comments on these ideas are welcome, as are further thoughts. Hopefully we can reach a consensus on research priorities for the coming years. A common position will strengthen our voice with research funding agencies.

State-of-the-art

From homogenisation of monthly and yearly data, we have learned that the size of breaks is typically of the order of the climatic changes observed in the 20th century and that the period between two detected breaks is around 15 to 20 years. These inhomogeneities are thus a significant source of error and need to be removed. The benchmark of the COST Action HOME has shown that these breaks can be removed reliably and that homogenisation improves the usefulness of temperature and precipitation data for studying decadal variability and secular trends. Not all problems are optimally solved yet; for instance, the solutions for the inhomogeneous-reference problem are still quite ad hoc. The HOME benchmark found mixed results for precipitation, and the handling of missing data can probably be improved. Furthermore, homogenisation of other climate elements and of data from different, for example dry, regions should be studied. In general, however, annual and monthly homogenisation can be seen as a mature field.

The homogenisation of daily data is still in its infancy. Daily datasets are essential for studying extremes of weather and climate. Here the focus is not on the mean values, but on what happens in the tails of the distributions. Looking at the physical causes of inhomogeneities, one would expect many of them to especially affect the tails of the distributions. Likewise, the IPCC AR4 report warns that changes in extremes are often more sensitive to inhomogeneous climate monitoring practices than changes in the mean.

Tuesday, January 10, 2012

New article: Benchmarking homogenisation algorithms for monthly data

The main paper of the COST Action HOME on homogenisation of climate data has been published today in Climate of the Past. This post briefly describes the problem of inhomogeneities in climate data and how such data problems are corrected by homogenisation. The main part explains the topic of the paper: a new blind validation study of homogenisation algorithms for monthly temperature and precipitation data. All the most used and best algorithms participated.

Inhomogeneities

To study climatic variability the original observations are indispensable, but not directly usable. Next to real climate signals they may also contain non-climatic changes. Corrections to the data are needed to remove these non-climatic influences; this is called homogenisation. The best known non-climatic change is the urban heat island effect. The temperature in cities can be warmer than in the surrounding countryside, especially at night. Thus, as cities grow, one may expect that temperatures measured in cities become higher. On the other hand, many stations have been relocated from cities to nearby, typically cooler, airports.

Other non-climatic changes can be caused by changes in measurement methods. Meteorological instruments are typically installed in a screen to protect them from direct sun and wetting. In the 19th century it was common to use a metal screen on a north-facing wall. However, the building may warm the screen, leading to higher temperature measurements. When this problem was realised, the so-called Stevenson screen was introduced, typically installed in gardens, away from buildings. This is still the most typical weather screen, with its characteristic double-louvre door and walls. Nowadays automatic weather stations, which reduce labour costs, are becoming more common; they protect the thermometer with a number of white plastic cones. This necessitated a change from manually read liquid-in-glass thermometers to automated electrical resistance thermometers, which reduces the recorded temperature values.



One way to study the influence of changes in measurement techniques is by making simultaneous measurements with historical and current instruments, procedures or screens. This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouris screen, in use in Spain and many European countries in the late 19th and early 20th century. In the middle, a Stevenson screen equipped with automatic sensors; on the left, a Stevenson screen equipped with conventional meteorological instruments.
Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.


A further example of a change in measurement methods is that the precipitation amounts observed in the early instrumental period (roughly before 1900) are biased: they are 10% lower than nowadays because the measurements were often made on a roof. At the time, instruments were installed on rooftops to ensure that the instrument was never shielded from the rain, but it was later found that, due to the turbulent flow of the wind on roofs, some rain droplets and especially snowflakes did not fall into the opening. Consequently, measurements are nowadays performed closer to the ground.