
Friday, 22 January 2021

New paper: Spanish and German climatologists on how to remove errors from observed climate trends

This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouri (French) screen, in use in Spain and many European countries in the late 19th and early 20th century. Leftmost is a Stevenson screen equipped with conventional meteorological instruments, the set-up used globally for most of the 20th century. In the middle is a Stevenson screen equipped with automatic sensors. The Montsouri screen is better ventilated, but because some solar radiation can reach the thermometer it registers somewhat higher temperatures than a Stevenson screen. Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.

The instrumental climate record is human cultural heritage, the product of the diligent work of many generations of people all over the world. But changes in the way temperature was measured and in the surroundings of weather stations can produce spurious trends. An international team, with participation of the University Rovira i Virgili (Spain), the State Meteorological Agency (AEMET, Spain) and the University of Bonn (Germany), has made a great effort to provide reliable tests for the methods used to computationally eliminate such spurious trends. These so-called “homogenization methods” are a key step in turning the enormous effort of the observers into accurate climate change data products. The results have been published in the prestigious Journal of Climate of the American Meteorological Society. The research was funded by the Spanish Ministry of Economy and Competitiveness.

Climate observations often go back more than a century, to times before we had electricity or cars. Such long time spans make it virtually impossible to keep the measurement conditions the same across time. The best-known problem is the growth of cities around urban weather stations. Cities tend to be warmer, for example due to reduced evaporation by plants or because high buildings block cooling. This can be seen comparing urban stations with surrounding rural stations. It is less talked about, but there are similar problems due to the spread of irrigation.

The most common reason for jumps in the observed data is the relocation of weather stations. Volunteer observers tend to make observations near their homes; when they retire and a new volunteer takes over the task, this can produce temperature jumps. Even for professional observations, keeping the location the same over centuries can be a challenge, either due to urban growth making sites unsuitable or due to organizational changes leading to new premises. Bonn climatologist Dr. Victor Venema, one of the authors: “A quite typical organizational change is that weather offices that used to be in cities were transferred to newly built airports needing observations and predictions. The weather station in Bonn used to be on a field in the village of Poppelsdorf, which is now a quarter of Bonn, and after several relocations the station is currently at the airport Cologne-Bonn.”

For global trends, the most important changes are technological changes of the same kinds and with similar effects all over the world. Now we are, for instance, in a period with widespread automation of the observational networks.

Appropriate computer programs for the automatic homogenization of climatic time series are the result of several years of development work. They work by comparing nearby stations with each other and looking for changes that only happen in one of them, as opposed to climatic changes that influence all stations.
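
As a toy illustration of this principle (not one of the methods tested in the study), one can simulate two neighbouring stations sharing a common climate signal and compute their difference series; a relocation jump at one station then stands out clearly, while the shared warming trend cancels:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1900, 2000)

# Shared regional climate signal (a warming trend) plus independent
# measurement/weather noise at each station.
climate = 0.01 * (years - years[0])
candidate = climate + rng.normal(0.0, 0.2, years.size)
neighbour = climate + rng.normal(0.0, 0.2, years.size)

# Spurious jump of 0.5 degrees at the candidate station in 1950,
# e.g. due to a relocation.
candidate[years >= 1950] += 0.5

# The difference series removes the common climate signal,
# leaving only noise plus the break.
diff = candidate - neighbour
print(diff[years < 1950].mean())   # near 0.0
print(diff[years >= 1950].mean())  # near 0.5
```

In the difference series the shared climate signal drops out, so the spurious 1950 jump is clearly visible above the noise.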

To scrutinize these homogenization methods the research team created a dataset that closely mimics observed climate datasets including the mentioned spurious changes. In this way, the spurious changes are known and one can study how well they are removed by homogenization. Compared to previous studies, the testing datasets showed much more diversity; real station networks also show a lot of diversity due to differences in their management. The researchers especially took care to produce networks with widely varying station densities; in a dense network it is easier to see a small spurious change in a station. The test dataset was larger than ever containing 1900 station networks, which allowed the scientists to accurately determine the differences between the top automatic homogenization methods that have been developed by research groups from Europe and the Americas. Because of the large size of the testing dataset, only automatic homogenization methods could be tested.

The international author group found that it is much more difficult to improve the accuracy of network-mean climate signals than to improve the accuracy of individual station time series.

The Spanish homogenization methods excelled. The method developed at the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain, by Hungarian climatologist Dr. Peter Domonkos was found to be the best at homogenizing both individual station series and regional network mean series. The method of the State Meteorological Agency (AEMET), Unit of Islas Baleares, Palma, Spain, developed by Dr. José A. Guijarro was a close second.

When it comes to removing systematic trend errors from many networks, and especially from networks where similar spurious changes happen in many stations at similar dates, the homogenization method of the US National Oceanic and Atmospheric Administration (NOAA) performed best. This method was designed to homogenize station datasets at the global scale, where the main concern is the reliable estimation of global trends.

The open screen formerly used at the station in Uccle, Belgium, with two modern closed Stevenson screens with double-louvred walls in the background.

Quotes from participating researchers

Dr. Peter Domonkos, who earlier was a weather observer and now writes a book about time series homogenization: “This study has shown the value of large testing datasets and demonstrates another reason why automatic homogenization methods are important: they can be tested much better, which aids their development.”

Prof. Dr. Manola Brunet, director of the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain, Visiting Fellow at the Climatic Research Unit, University of East Anglia, Norwich, UK, and Vice-President of the World Meteorological Services Technical Commission, said: “The study showed how important dense station networks are to make homogenization methods powerful and thus to compute accurate observed trends. Unfortunately, a lot of climate data still needs to be digitized to contribute to even better homogenization and quality control.”

Dr. Javier Sigró from the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain: “Homogenization is often a first step that allows us to go into the archives and find out what happened to the observations that produced the spurious jumps. Better homogenization methods mean that we can do this in a much more targeted way.”

Dr. José A. Guijarro: “Not only may the results of the project help users to choose the method best suited to their needs; it also helped developers to improve their software by showing its strengths and weaknesses, and will allow further improvements in the future.”

Dr. Victor Venema: “In a previous similar study we found that homogenization methods that were designed to handle difficult cases, where a station has multiple spurious jumps, were clearly better. Interestingly, this study did not find this. It may be that it is more a matter of methods being carefully fine-tuned and tested.”

Dr. Peter Domonkos: “The accuracy of homogenization methods will likely improve further; however, we should never forget that spatially dense and high-quality climate observations are the most important pillar of our knowledge about climate change and climate variability.”

Press releases

Spanish weather service, AEMET: Un equipo internacional de climatólogos estudia cómo minimizar errores en las tendencias climáticas observadas

URV university in Tarragona, Catalan: Un equip internacional de climatòlegs estudia com es poden minimitzar errades en les tendències climàtiques observades

URV university, Spanish: Un equipo internacional de climatólogos estudia cómo se pueden minimizar errores en las tendencias climáticas observadas

URV university, English: An international team of climatologists is studying how to minimise errors in observed climate trends

Articles

Tarragona 21: Climatòlegs de la URV estudien com es poden minimitzar errades en les tendències climàtiques observades

Genius Science, French: Une équipe de climatologues étudie comment minimiser les erreurs dans la tendance climatique observée

Phys.org: A team of climatologists is studying how to minimize errors in observed climate trend

 

Monday, 12 October 2020

The deleted chapter of the WMO Guidance on the homogenisation of climate station data

The Task Team on Homogenization (TT-HOM) of the Open Panel of CCl Experts on Climate Monitoring and Assessment (OPACE-2) of the Commission on Climatology (CCl) of the World Meteorological Organization (WMO) has published their Guidance on the homogenisation of climate station data.

The guidance report was a bit longish, so at the end we decided that the last chapter on "Future research & collaboration needs" was best deleted. As chair of the task team, and as someone who likes to dream in a comfy chair about what others could do, I wrote most of this chapter, and thus we decided to simply make it a post for this blog. Enjoy.

Introduction

This guidance is based on our current best understanding of inhomogeneities and homogenisation. However, writing it also makes clear there is a need for a better understanding of the problems.

A better mathematical understanding of statistical homogenisation is important because that is what most of our work is based on. A stronger mathematical basis is a prerequisite for future methodological improvements.

A stronger focus on a (physical) understanding of inhomogeneities would complement and strengthen the statistical work. This kind of work is often performed at the station or network level, but also needed at larger spatial scales. Much of this work is performed using parallel measurements, but they are typically not internationally shared.

In an observational science the strength of the outcomes depends on a consilience of evidence. Thus having evidence on inhomogeneities from both statistical homogenisation and physical studies strengthens the science.

This chapter will discuss the needs for future research on homogenisation grouped in five kinds of problems. In the first section we will discuss research on improving our physical understanding and physics-based corrections. The next section is about break detection, especially about two fundamental problems in statistical homogenisation: the inhomogeneous-reference problem and the multiple-breakpoint problem.

The section after that is about computing uncertainties in trends and long-term variability estimates from homogenised data due to remaining inhomogeneities. Then we discuss how it may be possible to improve correction methods by treating correction as a statistical model selection problem. The last section discusses whether inhomogeneities are stochastic or deterministic and how that may affect homogenisation, and especially correction methods, for the variability around the long-term mean.

For all the research ideas mentioned below, it is understood that in future we should study more meteorological variables than temperature. In addition, more studies on inhomogeneities across variables could be helpful to understand the causes of inhomogeneities and increase the signal to noise ratio. Homogenisation by national offices has advantages because here all climate elements from one station are stored together. This helps in understanding and identifying breaks. It would help homogenisation science and climate analysis to have a global database for all climate elements, like iCOADS for marine data. A Copernicus project has started working on this for land station data, which is an encouraging development.

Physical understanding

It is good scientific practice to perform parallel measurements in order to manage unavoidable changes, and to compare the results of statistical homogenisation to the expectations given the cause of the inhomogeneity according to the metadata. This information should also be analysed on continental and global scales to get a better understanding of when historical transitions took place and to guide homogenisation of large-scale (global) datasets. This requires more international sharing of parallel data and standards on the reporting of the size of breaks confirmed by metadata.

The Dutch weather service KNMI published a protocol for managing possible future changes to the network: who decides what needs to be done in which situation, what kind of studies should be made, where the studies should be published, and that the parallel data should be stored in their central database as experimental data. A translation of this report will soon be published by the WMO (Brandsma et al., 2019) and will hopefully inspire other weather services to formalise their network change management.

Next to statistical homogenisation, making and studying parallel measurements, and other physical estimates, can provide a second line of evidence on the magnitude of inhomogeneities. Having multiple lines of evidence provides robustness to observational sciences. Parallel data is especially important for the large historical transitions that are most likely to produce biases in network-wide to global climate datasets. It can validate the results of statistical homogenisation and be used to estimate possibly needed additional adjustments. The Parallel Observations Science Team of the International Surface Temperature Initiative (ISTI-POST) is working on building such a global dataset with parallel measurements.

Parallel data is especially suited to improve our physical understanding of the causes of inhomogeneities by studying how the magnitude of the inhomogeneity depends on the weather and on instrumental design characteristics. This understanding is important for more accurate corrections of the distribution, for realistic benchmarking datasets to test our homogenisation methods, and to determine which additional parallel experiments are especially useful.

Detailed physical models of the measurement, for example of the flow through the screens, radiative transfer and heat flows, can also help gain a better understanding of the measurement and its error sources. This aids in understanding historical instruments and in designing better future instruments. Physical models will also be paramount for understanding the impact of the surroundings on the measurement (nearby obstacles and surfaces influencing error sources and air flow) and of changes in the measurand, such as urbanisation, deforestation or the introduction of irrigation. Land-use changes, especially urbanisation, should be studied together with the relocations they may provoke.

Break detection

Longer climate series typically contain more than one break. This so-called multiple-breakpoint problem is currently an important research topic. A complication of relative homogenisation is that also the reference stations can have inhomogeneities. This so-called inhomogeneous-reference problem is not optimally solved yet. It is also not clear what temporal resolution is best for detection and what the optimal way is to handle the seasonal cycle in the statistical properties of climate data and of many inhomogeneities.

For temperature time series about one break per 15 to 20 years is typical and multiple breaks are thus common. Unfortunately, most statistical detection methods have been developed for one break and for the null hypothesis of white (sometimes red) noise. In the case of multiple breaks, the statistical test should take into account not only the noise variance but also the break variance from breaks at other positions. For low signal to noise ratios, the additional break variance can lead to spurious detections and inaccuracies in the break position (Lindau and Venema, 2018a).

To apply single-breakpoint tests to series with multiple breaks, one ad-hoc solution is to first split the series at the most significant break (as, for example, the standard normal homogeneity test, SNHT, does) and then investigate the subseries. Such a greedy algorithm does not always find the optimal solution. Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, which reduces the power of detection considerably. This method is not used much nowadays.
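
A minimal sketch of such greedy splitting (the SNHT statistic plus recursive splitting; the threshold is illustrative, not a real critical value, and real implementations differ in many details):

```python
import numpy as np

def snht_statistic(x):
    """SNHT test statistic T(k) for every possible break position k,
    computed on the standardised series."""
    z = (x - x.mean()) / x.std()
    n = len(z)
    t = np.empty(n - 1)
    for k in range(1, n):
        t[k - 1] = k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
    return t

def greedy_split(x, threshold=10.0, offset=0, breaks=None):
    """Split at the most significant break, then recurse on the two
    subseries (greedy: not guaranteed to find the optimal solution)."""
    if breaks is None:
        breaks = []
    if len(x) < 10:            # subseries too short to test
        return breaks
    t = snht_statistic(x)
    k = int(np.argmax(t)) + 1
    if t[k - 1] < threshold:   # illustrative significance threshold
        return breaks
    breaks.append(offset + k)
    greedy_split(x[:k], threshold, offset, breaks)
    greedy_split(x[k:], threshold, offset + k, breaks)
    return sorted(breaks)

# Synthetic difference series with breaks at positions 40 and 80.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.3, 120)
x[40:] += 1.0
x[80:] -= 0.8
found = greedy_split(x)
print(found)
```

With a high signal to noise ratio, as here, the greedy approach recovers both breaks; with low ratios it can split at the wrong position first and never recover.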

Multiple-breakpoint methods can find an optimal solution and are nowadays numerically feasible. This can be done in a hypothesis testing (MASH) or in a statistical model selection framework. For a given number of breaks these methods find the break combination that minimises the internal variance, that is, the variance of the homogeneous subperiods (equivalently, the break combination that maximises the variance explained by the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) and ACMANT (based on PRODIGE; Domonkos, 2011b). In a similar line of research, Lu et al. (2010) solved the multiple-breakpoint problem using a minimum description length (MDL) based information criterion as penalty function.
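
The segmentation idea behind such methods can be sketched with dynamic programming; the BIC-like penalty used below is a placeholder for illustration, not the PRODIGE or MDL penalty:

```python
import numpy as np

def best_segmentation(x, max_breaks=5):
    """For each number of breaks, find by dynamic programming the break
    combination minimising the internal variance of the homogeneous
    subperiods; then select the number of breaks with a BIC-like
    penalty (a placeholder, not the PRODIGE or MDL penalty)."""
    n = len(x)
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def sse(i, j):
        # Sum of squared deviations of x[i:j] around its own mean.
        s = c1[j] - c1[i]
        return c2[j] - c2[i] - s * s / (j - i)

    # dp[k, j]: minimal internal variance of x[:j] using k breaks.
    dp = np.full((max_breaks + 1, n + 1), np.inf)
    back = np.zeros((max_breaks + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        dp[0, j] = sse(0, j)
    for k in range(1, max_breaks + 1):
        for j in range(k + 1, n + 1):
            for i in range(k, j):
                cost = dp[k - 1, i] + sse(i, j)
                if cost < dp[k, j]:
                    dp[k, j], back[k, j] = cost, i

    def penalised(k):   # penalty grows with the number of breaks
        return n * np.log(dp[k, n] / n) + 3.0 * (k + 1) * np.log(n)

    best_k = min(range(max_breaks + 1), key=penalised)
    breaks, j = [], n
    for k in range(best_k, 0, -1):
        j = back[k, j]
        breaks.append(j)
    return sorted(breaks)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.3, 90)
x[30:] += 1.0   # break at position 30
x[60:] -= 1.2   # break at position 60
found = best_segmentation(x)
print(found)
```

Unlike the greedy approach, the dynamic program considers all break combinations jointly, at a cost here of O(max_breaks · n²) operations.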

The penalty function of PRODIGE was, however, found to be suboptimal (Lindau and Venema, 2013): the penalty should be a function of the number of breaks, not fixed per break, and the relation with the length of the series should be reversed. It is not clear yet how sensitively homogenisation methods respond to this, but increasing the penalty per break in the case of a low signal to noise ratio, so as to reduce the number of detected breaks, does not make the estimated break signal more accurate (Lindau and Venema, 2018a).

Not only the candidate station, also the reference stations will have inhomogeneities, which complicates homogenisation. Such inhomogeneities can be climatologically especially important when they are due to network-wide technological transitions. An example of such a transition is the current replacement of temperature observations using Stevenson screens by automatic weather stations. Such transitions are important periods as they may cause biases in the network and global average trends and they produce many breaks over a short period.

A related problem is that sometimes all stations in a network have a break at the same date, for example, when a weather service changes the time of observation. Nationally such breaks are corrected using metadata. If this change is unknown in global datasets one can still detect and correct such inhomogeneities statistically by comparison with other nearby networks. That would require an algorithm that additionally knows which stations belong to which network and prioritizes correcting breaks found between stations in different networks. Such algorithms do not exist yet and information on which station belongs to which network for which period is typically not internationally shared.

The influence of inhomogeneities in the reference can be reduced by computing composite references over many stations, removing reference stations with breaks and by performing homogenisation iteratively.
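
A small simulation illustrates why composite references help: averaging many neighbours reduces both their noise and the imprint of any single reference station's own break (here one neighbour is given a hypothetical break of its own):

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_neigh = 100, 10
climate = np.linspace(0.0, 1.0, n_years)    # shared regional signal

candidate = climate + rng.normal(0.0, 0.3, n_years)
candidate[50:] += 0.4                       # the break we want to detect

neighbours = climate + rng.normal(0.0, 0.3, (n_neigh, n_years))
neighbours[0, 70:] += 0.5                   # one reference has its own break

# The composite reference averages out the noise of individual references
# and dilutes their inhomogeneities (here to a tenth of the amplitude).
composite = neighbours.mean(axis=0)

diff_single = candidate - neighbours[0]     # reference break leaks in fully
diff_composite = candidate - composite      # much cleaner difference series
print(diff_single.std(), diff_composite.std())
```

The difference series against the composite is less noisy and contains the reference station's break only at a tenth of its amplitude, at the cost that small spurious signals from many references can still accumulate.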

A direct approach to solving this problem would be to simultaneously homogenise multiple stations, also called joint detection. A step in this direction is pairwise homogenisation, where breaks are detected in the pairs. This requires an additional attribution step, which attributes the breaks to a specific station. Currently this is done by hand (for PRODIGE; Caussinus and Mestre, 2004; Rustemeier et al., 2017) or with ad-hoc rules (by the pairwise homogenisation algorithm of NOAA; Menne and Williams, 2009).

In the homogenisation method HOMER (Mestre et al., 2013) a first attempt is made to homogenise all pairs simultaneously using a joint detection method from bio-statistics. Feedback from the first users suggests that this method should not be used automatically. It should be studied how well this method works and where the problems come from.

Multiple-breakpoint methods are more accurate than single-breakpoint methods. This expected higher accuracy is founded on theory (Hawkins, 1972). In addition, in the HOME benchmarking study it was numerically found that modern homogenisation methods, which take the multiple-breakpoint and the inhomogeneous-reference problems into account, are about a factor of two more accurate than traditional methods (Venema et al., 2012).

However, the current version of CLIMATOL applies single-breakpoint detection tests (first SNHT detection on a window, then splitting) and achieves results comparable to modern multiple-breakpoint methods with respect to break detection and the homogeneity of the data (Killick, 2016). This suggests that either the multiple-breakpoint detection principle is not as important as previously thought, which warrants deeper study, or the accuracy of CLIMATOL is partly due to an unknown unknown.

The signal to noise ratio is paramount for the reliable detection of breaks. It would thus be valuable to develop statistical methods that explain part of the variance of a difference time series and remove this to see breaks more clearly. Data from (regional) reanalysis could be useful predictors for this.

First methods have been published to detect breaks for daily data (Toreti et al., 2012; Rienzner and Gandolfi, 2013). It has not been studied yet what the optimal resolution for breaks detection is (daily, monthly, annual), nor what the optimal way is to handle the seasonal cycle in the climate data and exploit the seasonal cycle of inhomogeneities. In the daily temperature benchmarking study of Killick (2016) most non-specialised detection methods performed better than the daily detection method MAC-D (Rienzner and Gandolfi, 2013).

The selection of appropriate reference stations is a necessary step for accurate detection and correction. Many different methods and metrics are used for the station selection, but studies on the optimal method are missing. The knowledge of local climatologists which stations have a similar regional climate needs to be made objective so that it can be applied automatically (at larger scales).

For detection a high signal to noise ratio is most important, while for correction it is paramount that all stations are in the same climatic region. Typically the same networks are used for both detection and correction, but it should be investigated whether a smaller network for correction would be beneficial. Also in general, we need more research on understanding the performance of (monthly and daily) correction methods.

Computing uncertainties

Also after homogenisation, uncertainties remain in the data due to various problems:

  • Not all breaks in the candidate station have been or can be detected.

  • False alarms are an unavoidable trade-off for detecting many real breaks.

  • Uncertainty in the estimation of correction parameters due to limited data.

  • Uncertainties in the corrections due to limited information on the break positions.

From validation and benchmarking studies we have a reasonable idea about the remaining uncertainties that one can expect in the homogenised data, at least with respect to changes in the long-term mean temperature. For many other variables and changes in the distribution of (sub-)daily temperature data individual developers have validated their methods, but systematic validation and comparison studies are still missing.

Furthermore, such studies only provide a general uncertainty level, whereas more detailed information for every single station/region and period would be valuable. The uncertainties will strongly depend on the signal to noise ratio, on the statistical properties of the inhomogeneities of the raw data, and on the quality and cross-correlations of the reference stations, all of which vary strongly per station, region and period.

Communicating such a complicated error structure, which is mainly temporal but also partially spatial, is a problem in itself. Furthermore, not only the uncertainty in the means should be considered; especially for daily data, uncertainties in the complete probability density function need to be estimated and communicated. This could be done with an ensemble of possible realisations, similar to Brohan et al. (2006).
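
The ensemble idea can be sketched as follows, assuming (purely for illustration) a single detected break with uncertain adjustment size and position:

```python
import numpy as np

rng = np.random.default_rng(4)
n_years, n_members = 100, 200
homogenised = np.linspace(13.0, 14.0, n_years)   # an adjusted series

# Hypothetical uncertainties: the adjustment size is uncertain by
# 0.08 (1 sigma) and the break position (year index 60) by +/- 3 years.
ensemble = np.tile(homogenised, (n_members, 1))
for member in ensemble:
    size = rng.normal(0.0, 0.08)          # perturb the adjustment size
    pos = 60 + int(rng.integers(-3, 4))   # perturb the break position
    member[:pos] += size                  # re-apply before the break

# The ensemble spread encodes the temporal error structure:
# large before the break, zero well after it.
spread = ensemble.std(axis=0)
print(spread[:50].mean(), spread[80:].mean())
```

A user can then propagate the uncertainty through any analysis simply by repeating it for every ensemble member, which is the appeal of this form of communication.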

An analytic understanding of the uncertainties is important, but is often limited to idealised cases. Thus numerical validation studies, such as the past HOME and the upcoming ISTI studies, are also important for an assessment of homogenisation algorithms under realistic conditions.

Creating validation datasets also helps to see the limits of our understanding of the statistical properties of the break signal. This is especially the case for variables other than temperature and for daily and (sub-)daily data. Information is needed on the real break frequencies and size distributions, but also on their auto-correlations and cross-correlations, as well as, as explained in the next section, on the stochastic nature of breaks in the variability around the mean.

Validation studies focussed on difficult cases would be valuable for a better understanding. For example, sparse networks, isolated island networks, large spatial trend gradients and strong decadal variability in the difference series of nearby stations (for example, due to El Nino in complex mountainous regions).

The advantage of simulated data is that one can create a large number of quite realistic complete networks. For daily data it will remain hard for years to come to generate a realistic validation dataset. Thus even if the use of parallel measurements is mostly limited to one break per test, it does provide the highest degree of realism for this one break.

Deterministic or stochastic corrections?

Annual and monthly data are normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenisation. Daily data, on the other hand, are used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.

The physics of the problem suggests that many inhomogeneities are caused by stochastic processes. An example affecting many instruments is differences in the response time of instruments, which can lead to differences determined by turbulence. A fast thermometer will on average read higher maximum temperatures than a slow one, but this difference will be variable and sometimes much higher than the average. In the case of errors due to insolation, the radiation error will be modulated by clouds. An insufficiently shielded thermometer will need larger corrections on warm days, which will typically be sunnier, but some warm days will be cloudy and not need much correction, while other warm days are sunny and calm over a dry hot surface. The adjustment of daily data for studies on changes in the variability is thus a distribution problem and not only a regression bias-correction problem. For data assimilation (numerical weather prediction), accurate bias correction (with regression methods) is probably the main concern.

Seen as a variability problem, the correction of daily data is similar to statistical downscaling in many ways. Both methodologies aim to produce bias-corrected data with the right variability, taking into account the local climate and large-scale circulation. One lesson from statistical downscaling is that increasing the variance of a time series deterministically by multiplication with a fraction, called inflation, is the wrong approach and that the variance that could not be explained by regression using predictors should be added stochastically as noise instead (Von Storch, 1999). Maraun (2013) demonstrated that the inflation problem also exists for the deterministic Quantile Matching method, which is also used in daily homogenisation. Current statistical correction methods deterministically change the daily temperature distribution and do not stochastically add noise.
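
The inflation problem can be demonstrated with a simple regression example (a sketch of the argument, not of any specific homogenisation method; the regression slope is assumed known for simplicity): inflating the explained part restores the variance but makes the series perfectly correlated with the predictor, whereas adding the unexplained variance as noise reproduces both the variance and the correct correlation structure:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000
predictor = rng.normal(0.0, 1.0, n)
# The target shares only part of its variance with the predictor.
target = 0.6 * predictor + rng.normal(0.0, 0.8, n)
fit = 0.6 * predictor                 # regression explains only part

# Inflation: rescale the fit to the target variance. The variance is
# right, but the series is now a deterministic function of the predictor.
inflated = fit * target.std() / fit.std()

# Stochastic alternative: add the unexplained variance as noise.
noise_sd = np.sqrt(target.var() - fit.var())
stochastic = fit + rng.normal(0.0, noise_sd, n)

print(np.corrcoef(target, predictor)[0, 1])      # about 0.6
print(np.corrcoef(inflated, predictor)[0, 1])    # exactly 1
print(np.corrcoef(stochastic, predictor)[0, 1])  # about 0.6 again
```

The inflated series ties every extreme deterministically to the predictor, which is exactly the misrepresentation of local variability that Von Storch (1999) warned against.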

Transferring ideas from downscaling to daily homogenisation is likely fruitful to develop such stochastic variability correction methods. For example, predictor selection methods from downscaling could be useful. Both fields require powerful and robust (time invariant) predictors. Multi-site statistical downscaling techniques aim at reproducing the auto- and cross-correlations between stations (Maraun et al., 2010), which may be interesting for homogenisation as well.

The daily temperature benchmarking study of Rachel Killick (2016) suggests that current daily correction methods are not able to improve the distribution much. There is a pressing need for more research on this topic. However, these methods likely also performed less well because they were used together with detection methods with a much lower hit rate than the comparison methods.

Whether the deterministic correction methods lead to severe errors in homogenisation still needs to be studied, but stochastic methods that implement the corrections by adding noise would at least theoretically fit the problem better. Such stochastic corrections are not trivial and should have the right variability on all temporal and spatial scales.

It should be studied whether it may be better to only detect the dates of break inhomogeneities and perform the analysis on the homogeneous subperiods (HSPs), removing the need for corrections. The disadvantage of this approach is that most of the trend variance is in the differences in the means of the HSPs and only a small part is in the trends within the HSPs. In the case of trend analysis, this would be similar to the work of the Berkeley Earth Surface Temperature group on the mean temperature signal. Periods with gradual inhomogeneities, e.g., due to urbanisation, would have to be detected and excluded from such an analysis.

An outstanding problem is that current variability correction methods have only been developed for break inhomogeneities, methods for gradual ones are still missing. In homogenisation of the mean of annual and monthly data, gradual inhomogeneities are successfully removed by implementing multiple small breaks in the same direction. However, as daily data is used to study changes in the distribution, this may not be appropriate for daily data as it could produce larger deviations near the breaks. Furthermore, changing the variance in data with a trend can be problematic (Von Storch, 1999).

At the moment most daily correction methods correct the breaks one after another. In monthly homogenisation it is found that correcting all breaks simultaneously (Caussinus and Mestre, 2004) is more accurate (Domonkos et al., 2013). It is thus likely worthwhile to develop multiple breakpoint correction methods for daily data as well.

Finally, current daily correction methods rely on previously detected breaks and assume that the homogeneous subperiods (HSPs) are indeed homogeneous (i.e., that each segment between breakpoints is homogeneous). However, these HSPs are currently based on the detection of breaks in the mean only. Breaks in higher moments may thus still be present in the "homogeneous" subperiods and affect the corrections. If only for this reason, we should also work on the detection of breaks in the distribution.

Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely. From just one degree of freedom for annual corrections of the means, to 12 degrees of freedom for monthly correction of the means, to 40 for decile corrections applied to every season, to a large number of DOF for quantile or percentile matching.
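As a rough illustration of a middle option in that range, a decile correction with 10 degrees of freedom can be sketched as follows. This is a hypothetical simplification: real methods such as HOM or percentile matching estimate the target distribution from neighbouring reference stations, whereas here a known "target" sample is assumed for clarity.

```python
import numpy as np

def decile_corrections(inhom_sample, target_sample):
    """Corrections with 10 degrees of freedom: one adjustment per decile.
    Hypothetical sketch; real methods estimate the target distribution
    from well-correlated reference stations, not from a known sample."""
    qs = np.linspace(0.05, 0.95, 10)                   # decile midpoints
    return np.quantile(target_sample, qs) - np.quantile(inhom_sample, qs)

def apply_decile_corrections(values, inhom_sample, corrections):
    """Shift each value by the correction of the decile it falls in."""
    edges = np.quantile(inhom_sample, np.linspace(0, 1, 11))
    idx = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, 9)
    return values + corrections[idx]

rng = np.random.default_rng(1)
inhom = rng.normal(0.0, 2.0, 5000)     # inhomogeneous period: biased, too variable
target = rng.normal(0.5, 1.0, 5000)    # estimate of the homogeneous distribution
corr = decile_corrections(inhom, target)
adjusted = apply_decile_corrections(inhom, inhom, corr)
```

Because each decile gets its own adjustment, the corrections move not only the mean but also squeeze the too-wide distribution toward the target one.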

A study using PRODIGE on the HOME benchmark suggested that for typical European networks monthly adjustments are best for temperature; annual corrections are probably less accurate because they fail to account for changes in the seasonal cycle due to inhomogeneities. For precipitation, annual corrections were most accurate; monthly corrections were likely less accurate because the data was too noisy to estimate the 12 monthly correction constants (degrees of freedom).

What is the best correction method depends on the characteristics of the inhomogeneity. For a calibration problem just the annual mean could be sufficient, for a serious exposure problem (e.g., insolation of the instrument) a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted. The best correction method also depends on the reference. Whether the variables of a certain correction model can be reliably estimated depends on how well-correlated the neighbouring reference stations are.

An entire regional network is typically homogenised with the same correction method, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These will vary from station to station, from break to break and from period to period. Work on correction methods that objectively select the optimal correction method, e.g., using an information criterion, would be valuable.
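Such an objective selection could, for instance, compare candidate correction models with an information criterion. Below is a hedged sketch, assuming a difference series (candidate minus reference) and using the Bayesian information criterion to choose between an annual correction (1 degree of freedom) and monthly corrections (12 degrees of freedom); the function is hypothetical, not part of any existing homogenisation package.

```python
import numpy as np

def select_correction_model(diff, months):
    """Choose between an annual correction (1 DOF) and monthly corrections
    (12 DOF) for a difference series using the Bayesian information
    criterion. A hypothetical sketch of objective model selection."""
    n = len(diff)
    rss_annual = ((diff - diff.mean()) ** 2).sum()
    resid = diff.astype(float)                 # copy; subtract monthly means
    for m in range(12):
        sel = months == m
        resid[sel] -= diff[sel].mean()
    rss_monthly = (resid ** 2).sum()
    bic = lambda rss, k: n * np.log(rss / n) + k * np.log(n)
    return "monthly" if bic(rss_monthly, 12) < bic(rss_annual, 1) else "annual"

rng = np.random.default_rng(3)
months = np.arange(240) % 12                           # 20 years of monthly data
flat = rng.normal(1.0, 0.5, 240)                       # pure calibration offset
seasonal = flat + np.sin(2 * np.pi * months / 12)      # exposure problem
```

For the flat difference series the BIC penalty favours the simple annual model; for the one with a seasonal cycle the 12-parameter model wins despite the penalty.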

In case of (sub-)daily data, the options to select from become even larger. Daily data can be corrected just for inhomogeneities in the mean (e.g., Vincent et al., 2002, where daily temperatures are corrected by incorporating a linear interpolation scheme that preserves the previously defined monthly corrections) or also for the variability around the mean. In between are methods that adjust for the distribution including the seasonal cycle, which dominates the variability and is thus effectively similar to mean adjustments with a seasonal cycle. Correction methods of intermediate complexity with more than one, but less than 10 degrees of freedom would fill a gap and allow for more flexibility in selecting the optimal correction model.
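The interpolation idea behind Vincent et al. (2002) can be sketched roughly as follows. Note that this is a simplified version that only interpolates between mid-month points; the published method additionally constrains the daily values so that the monthly means of the corrections are preserved exactly.

```python
import numpy as np

def daily_from_monthly(monthly_adj, days_per_month):
    """Spread 12 monthly corrections over the days of the year by linear
    interpolation between mid-month points. Simplified sketch of the idea
    behind Vincent et al. (2002); the published method also constrains the
    daily values to preserve the monthly means of the corrections."""
    days_per_month = np.asarray(days_per_month, dtype=float)
    mids = np.cumsum(days_per_month) - days_per_month / 2.0  # mid-month day numbers
    days = np.arange(1, int(days_per_month.sum()) + 1)
    return np.interp(days, mids, monthly_adj)

days_per_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
daily = daily_from_monthly(np.full(12, 0.3), days_per_month)  # constant test case
```

With a constant monthly correction the interpolated daily corrections are constant too; with a seasonal cycle in the monthly corrections the daily values vary smoothly instead of jumping at month boundaries.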

When applying these methods (Della-Marta and Wanner, 2006; Wang et al., 2010; Mestre et al., 2011; Trewin, 2013), the number of quantile bins (categories) needs to be selected, as well as whether to use physical weather-dependent predictors and the functional form in which they are used (Auchmann and Brönnimann, 2012). Objective methods for making these selections would be valuable.

Related information

WMO Guidelines on Homogenization (English, French, Spanish) 

WMO guidance report: Challenges in the Transition from Conventional to Automatic Meteorological Observing Networks for Long-term Climate Records


Wednesday, 17 February 2016

The global warming conspiracy would be huge

The concept of global warming was created by and for the Chinese in order to make US manufacturing non-competitive.
Republican front runner Donald Trump on Twitter

Snowing in Texas and Louisiana, record setting freezing temperatures throughout the country and beyond. Global warming is an expensive hoax!
Republican front runner Donald Trump on Twitter

How do you know the climate didn't actually cool?
Eric Worrall, the main contributor to WUWT

Why use discredited surface data which everyone knows is fraudulent?
"Scottish" "Sceptic"

I am working on a study to compare nationally homogenized temperature data with the temperatures in the large international collections (GHCN, CRUTEM, etc.). Looking for such national datasets, I found many graphs in the scientific literature showing national temperature increases, which I want to share with you.

Mitigation skeptics like to talk about "The Team", as if a small group of people would be "in charge". That makes their conspiracy theories a little less absurd, although even small conspiracies typically do not last for decades. The national temperature series show that hundreds of national weather services and numerous universities would also need to be in the conspiracy of science against mankind. To me that sounds unrealistic.

The mitigation skeptics have a rough time and nowadays more often claim that they do not dispute the greenhouse effect or the warming of the Earth at all, but only bla, bla, bla. Which is why I thought I would show that this post is not fighting strawmen by citing some of the main bloggers and political leaders of the mitigation skeptical movement at the top of this post.

Anthony Watts, the weather presenter hosting Watts Up With That (WUWT), typically claims that only half of the warming is real, although he recently softened his stance for the USA and now only claims that a third is not real. If half of the warming in the global collections were not real, many scientists would have noticed that the global data does not fit their local observations.


Plot idea: 97% of the world's scientists contrive an environmental crisis, but are exposed by a plucky band of billionaires & oil companies.
Scott Westerfeld


And do not forget all the other scientists studying other parts of the climate system, the upper air, ground temperatures, sea surface temperature, ocean heat content, precipitation, glaciers, ice sheets, lake temperatures, sea ice, lake and river freezing, snow, birds, plants, insects, agriculture. One really wonders with Eric Worrall how on Earth science knows the climate didn't actually cool.

Another reason to write this post is to ask for help. For this comparison study, I have datasets or first contacts for the countries below. If you know of more homogenized datasets, please, please let me know. Even if it is "only" a reference. Also if you have a dataset from one of the countries below: multiple datasets from one country are very much welcome.

Countries: Albania, Argentina, Armenia, Australia, Austria, Belgium, Benin, Bolivia, Bulgaria, Canada, Chile, China, Congo Brazzaville, Croatia, Czech Republic, Denmark, Ecuador, Estonia, Finland, France, Germany, Greece, Hungary, Iran, Israel, Italy, Latvia, Libya, Macedonia, Morocco, Netherlands, New Zealand, Norway, Peru, Philippines, Portugal, Romania, Russia, Serbia, Slovakia, Slovenia, South Africa, Spain, Sweden, Switzerland, Tanzania, Uganda, Ukraine, United Kingdom, United States of America.
Regions: Catalonia, Carpathian basin, Central England Temperature, Greater Alpine Region.

Alpine region


The temperature for the Greater Alpine Region from the HISTALP project (Ingeborg Auer and colleagues, 2007). The lower panel shows the temperature for four low-altitude regions. The top panel shows their average (black) and the signal for the high-altitude stations (grey). All series are smoothed over 10 years.

Armenia


The increase in the annual temperatures and the decrease in annual precipitation in Armenia. From Levon Vardanyan and colleagues (2013), see also Artur Gevorgyan and colleagues (2016).

Australia


The temperature signal over Australia for the day-time maximum temperature (red), the mean temperature (green) and the night-time minimum temperature. Figure from Fawcett and colleagues (2012).

Canada


From Lucie Vincent of Environment Canada and colleagues (2012).

The Czech Republic


Changes in mean annual and seasonal temperature time series for the Czech Lands in the period 1800–2010. The part of series calculated from only two stations is expressed by a dashed line. Figure by Petr Stepanek of the Global Change Research Institute CAS, Brno, Czech Republic.

China


The temperature change in China over the last 106 years, the annual mean temperature and the seasonal temperatures from QingXiang Li and colleagues (2010).

England


The famous Central England Temperature series of the Hadley Centre.

Finland


The annual average temperature in Finland. National averages are noisier than global averages; to show the trend more clearly, the graph adds the decadal average temperature. From Mikkonen and colleagues (2015).

Gambia


The mean, maximum and minimum temperature from Yundum Meteorological Station in Gambia, from a journal I normally do not read: "Primate Biology". From Hillyer and colleagues (2015).

India


The temperature signal since 1900 in India according to Kothawale et al. (2010) of the Indian Institute of Tropical Meteorology (IITM), Pune.

Italy


The temperature series of Italy since 1800 according to Michele Brunetti and colleagues (2006).

Middle America and Northern South America


These graphs show the change in the number of warm days (maximum temperature) and warm nights (minimum temperature) and the number of cold days and cold nights, computed from daily data from several countries in Middle America and the north of South America. Figures from Enric Aguilar and colleagues (2005).

The Netherlands


Annual mean temperatures of the actual observations at De Bilt (red), the De Bilt homogenised series (dark blue), the previous version of the Central Netherlands Temperature series (CNT2,7; light blue) and the current Central Netherlands Temperature series (CNT4,6; pink). Gerard van der Schrier and colleagues (2009) from the Dutch weather service, KNMI. De Bilt is a city in the middle of The Netherlands where the KNMI main office is. The Central Netherlands series is for a larger region in the middle of The Netherlands.

New Zealand


The famous New Zealand 7-stations series.

Philippines


Observed annual mean temperature anomalies in the Philippines during the period 1951–2010, computed by Thelma A. Cinco and colleagues (2014).

Russia


Temperature averaged over Russia from the annual climate report of ROSHYDROMET (2014). The top panel shows the annual averages, the four lower panels the seasons (winter, spring, summer and autumn). No homogenization.

The variability in winter is very high. According to mitigation sceptic Anthony Watts this is due to Russian Steam Pipes:
I do know this: neither I nor NOAA has a good handle on the siting characteristics of Russian weather stations. I do know one thing though, the central heating schemes for many Russian cities puts a lot of waste heat into the air from un-insulated steam pipes.
Then it would be surprising that such large regions are affected in the same way and that the steam pipe years are also hot in the analysis of global weather prediction models and satellite temperature datasets.

Spain


Temperature trends computed by José Antonio Guijarro (2015) of the Spanish State Meteorological Agency (AEMET) for 12 river catchments within Spain. Homogenization with CLIMATOL.

The Spanish temperature dataset of the URV University in Tarragona. The panels on the left show the minimum temperature, the panels on the right the maximum temperature. The top panels show raw data before homogenization, the lower panels the homogenized data. The maximum temperature before 1910 had to be corrected strongly because of the use of a French screen before this time.

United States of America


The minimum and maximum temperature of the lower 48 states of the United States of America computed by NOAA. You can see it is an original American-made graph because it is in Fahrenheit.

Switzerland


The temperature signal in Switzerland computed by Michael Begert and colleagues of MeteoSchweiz. The top panel shows the original station time series; the lower panel shows them after removal of non-climatic changes.




Related reading

Climatologists have manipulated data to REDUCE global warming

Charges of conspiracy, collusion and connivance. What to do when confronted by conspiracy theories?

If you're thinking of creating a massive conspiracy, you may be better scaling back your plans, according to an Oxford University researcher.




References

Aguilar, E., et al., 2005: Changes in precipitation and temperature extremes in Central America and northern South America, 1961–2003. Journal Geophysical Research, 110, D23107, doi: 10.1029/2005JD006119.

Auer, I., Böhm, R., Jurkovic, A., Lipa, W., Orlik, A., Potzmann, R., Schöner, W., Ungersböck, M., Matulla, C., Briffa, K., Jones, P., Efthymiadis, D., Brunetti, M., Nanni, T., Maugeri, M., Mercalli, L., Mestre, O., Moisselin, J.-M., Begert, M., Müller-Westermeier, G., Kveton, V., Bochnicek, O., Stastny, P., Lapin, M., Szalai, S., Szentimrey, T., Cegnar, T., Dolinar, M., Gajic-Capka, M., Zaninovic, K., Majstorovic, Z. and Nieplova, E., 2007: HISTALP—historical instrumental climatological surface time series of the Greater Alpine Region. International Journal of Climatology, 27, pp. 17–46. doi: 10.1002/joc.1377.

Begert, M., Schlegel, T. and Kirchhofer, W., 2005: Homogeneous temperature and precipitation series of Switzerland from 1864 to 2000. International Journal of Climatology, 25, pp. 65–80. doi: 10.1002/joc.1118.

Brunetti, M., Maugeri, M., Monti, F. and Nanni, T., 2006: Temperature and precipitation variability in Italy in the last two centuries from homogenised instrumental time series. International Journal of Climatology, 26, pp. 345–381, doi: 10.1002/joc.1251.

Cinco, Thelma A., Rosalina G. de Guzman, Flaviana D. Hilario, David M. Wilson, 2014: Long-term trends and extremes in observed daily precipitation and near surface air temperature in the Philippines for the period 1951–2010. Atmospheric Research, 145–146, pp. 12–26, doi: 10.1016/j.atmosres.2014.03.025.

Fawcett, R.J.B., B.C. Trewin, K. Braganza, R.J Smalley, B. Jovanovic and D.A. Jones, 2012: On the sensitivity of Australian temperature trends and variability to analysis methods and observation networks. CAWCR Technical Report No. 050.

Gevorgyan, A., H. Melkonyan, T. Aleksanyan, A. Iritsyan and Y. Khalatyan, 2016: An assessment of observed and projected temperature changes in Armenia. Arabian Journal of Geosciences, 9, pp. 1-9, DOI 10.1007/s12517-015-2167-y.

Guijarro, J.A., 2015: Temperature trends. AEMET Report.

Hillyer, A.P., R. Armstrong, and A.H. Korstjens, 2015: Dry season drinking from terrestrial man-made watering holes in arboreal wild Temminck’s red colobus, The Gambia. Primate Biol., 2, pp. 21–24, doi: 10.5194/pb-2-21-2015.

Jain, Sharad K. and Vijay Kumar, 2012: Trend analysis of rainfall and temperature data for India. Current Science, 102.

Kothawale, D.R., A.A. Munot, K. Krishna Kumar, 2010: Surface air temperature variability over India during 1901–2007, and its association with ENSO. Climate Research, 42, pp. 89-104.

Li Q.X., Dong W.J., Li W., et al., 2010: Assessment of the uncertainties in temperature change in China during the last century. Chinese Science Bulletin, 55, pp. 1974−1982, doi: 10.1007/s11434-010-3209-1

Mikkonen, S., M. Laine, H.M. Mäkelä, H. Gregow, H. Tuomenvirta, M. Lahtinen, A. Laaksonen, 2015: Trends in the average temperature in Finland, 1847–2013. Stochastic Environmental Research and Risk Assessment, 29, Issue 6, pp 1521-1529, doi: 10.1007/s00477-014-0992-2.

Schrier, van der, G., A. van Ulden, and G.J. van Oldenborgh, 2011: The construction of a Central Netherlands temperature. Climate of the Past, 7, 527–542, doi: 10.5194/cp-7-527-2011

Ulden, van, Aad, Geert Jan van Oldenborgh, and Gerard van der Schrier, 2009: The Construction of a Central Netherlands Temperature. Scientific report, WR2009-03. See also Van der Schrier et al. (2011).

Vardanyan, L., H. Melkonyan, A. Hovsepyan, 2013: Current status and perspectives for development of climate services in Armenia. Report, ISBN 978-9939-69-050-6.

Vincent, L.A., X.L. Wang, E.J. Milewska, H. Wan, F. Yang, and V. Swail, 2012: A second generation of homogenized Canadian monthly surface air temperature for climate trend analysis. J. Geophys. Res., 117, D18110, doi: 10.1029/2012JD017859.

Wednesday, 8 October 2014

A framework for benchmarking of homogenisation algorithm performance on the global scale - Paper now published

By Kate Willett reposted from the Surface Temperatures blog of the International Surface Temperature Initiative (ISTI).

The ISTI benchmarking working group have just had their first benchmarking paper accepted at Geoscientific Instrumentation, Methods and Data Systems:

Willett, K., Williams, C., Jolliffe, I. T., Lund, R., Alexander, L. V., Brönnimann, S., Vincent, L. A., Easterbrook, S., Venema, V. K. C., Berry, D., Warren, R. E., Lopardo, G., Auchmann, R., Aguilar, E., Menne, M. J., Gallagher, C., Hausfather, Z., Thorarinsdottir, T., and Thorne, P. W.: A framework for benchmarking of homogenisation algorithm performance on the global scale, Geosci. Instrum. Method. Data Syst., 3, 187-200, doi:10.5194/gi-3-187-2014, 2014.

Benchmarking, in this context, is the assessment of homogenisation algorithm performance against a set of realistic synthetic worlds of station data where the locations and size/shape of inhomogeneities are known a priori. Crucially, these inhomogeneities are not known to those performing the homogenisation, only those performing the assessment. Assessment of both the ability of algorithms to find changepoints and accurately return the synthetic data to its clean form (prior to addition of inhomogeneity) has three main purposes:

1) quantification of uncertainty remaining in the data due to inhomogeneity
2) inter-comparison of climate data products in terms of fitness for a specified purpose
3) providing a tool for further improvement in homogenisation algorithms

Here we describe what we believe would be a good approach to a comprehensive homogenisation algorithm benchmarking system. This includes an overarching cycle of: benchmark development; release of formal benchmarks; assessment of homogenised benchmarks; and an overview of where we can improve for the next time around (Figure 1).

Figure 1. Overview of the ISTI comprehensive benchmarking system for assessing the performance of homogenisation algorithms. (Fig. 3 of Willett et al., 2014)

There are four components to creating this benchmarking system.

Creation of realistic clean synthetic station data
Firstly, we must be able to synthetically recreate the 30000+ ISTI stations such that they have the same variability, autocorrelation and inter-station cross-correlations as the real data but are free from systematic error. In other words, they must contain a realistic seasonal cycle and features of natural variability (e.g., ENSO, volcanic eruptions). There must be realistic month-to-month persistence at each station and geographically across nearby stations.
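A minimal sketch of how such persistence and cross-correlation can be built in, using a shared AR(1) regional signal plus station-specific AR(1) noise; the parameter values and function name are purely illustrative, not those used by the ISTI benchmarks.

```python
import numpy as np

def synthetic_network(n_stations, n_months, phi=0.6, share=0.7, seed=0):
    """Clean synthetic monthly anomalies: an AR(1) regional signal shared
    by all stations plus station-specific AR(1) noise, giving month-to-month
    persistence and inter-station cross-correlation. Parameter values are
    illustrative only."""
    rng = np.random.default_rng(seed)

    def ar1(n):
        # first-order autoregressive series: x[t] = phi * x[t-1] + e[t]
        x = np.zeros(n)
        e = rng.normal(0.0, 1.0, n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + e[t]
        return x

    regional = ar1(n_months)
    return np.array([share * regional + (1 - share) * ar1(n_months)
                     for _ in range(n_stations)])

network = synthetic_network(5, 600)   # 5 stations, 50 years of monthly data
```

Each simulated station then shows both temporal persistence (lag-1 autocorrelation) and strong correlation with its neighbours, the two properties homogenization algorithms rely on.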

Creation of realistic error models to add to the clean station data
The added inhomogeneities should cover all known types of inhomogeneity in terms of their frequency, magnitude and seasonal behaviour. For example, inhomogeneities could be any or a combination of the following:

- geographically or temporally clustered due to events which affect entire networks or regions (e.g. change in observation time);
- close to end points of time series;
- gradual or sudden;
- variance-altering;
- combined with the presence of a long-term background trend;
- small or large;
- frequent;
- seasonally or diurnally varying.
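A toy error model for the simplest of these cases, sudden breaks of random size and date, might look as follows; the break frequency and magnitude are invented for illustration, and a realistic error model would also cover the gradual, variance-altering and seasonally varying cases listed above.

```python
import numpy as np

def add_breaks(clean, n_breaks=3, seed=0):
    """Turn a clean series into an 'inhomogeneous world' by inserting
    sudden shifts at random dates. Frequencies and magnitudes here are
    illustrative assumptions, not the ISTI error-model settings."""
    rng = np.random.default_rng(seed)
    positions = np.sort(rng.choice(np.arange(24, len(clean) - 24),
                                   size=n_breaks, replace=False))
    sizes = rng.normal(0.0, 0.8, n_breaks)      # break sizes in degrees
    dirty = clean.copy()
    for pos, size in zip(positions, sizes):
        dirty[pos:] += size                     # sudden shift from the break onward
    return dirty, positions, sizes

clean = np.zeros(600)                           # perfectly homogeneous toy series
dirty, positions, sizes = add_breaks(clean)
```

Crucially, `positions` and `sizes` are stored by the benchmark creators and withheld from the groups doing the homogenisation.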

Design of an assessment system
Assessment of the homogenised benchmarks should be designed with the three purposes of benchmarking in mind. Both the ability to correctly locate changepoints and the ability to adjust the data back to its homogeneous state are important. The assessment can be split into four different levels:

- Level 1: The ability of the algorithm to restore an inhomogeneous world to its clean world state in terms of climatology, variance and trends.

- Level 2: The ability of the algorithm to accurately locate changepoints and detect their size/shape.

- Level 3: The strengths and weaknesses of an algorithm against specific types of inhomogeneity and observing system issues.

- Level 4: A comparison of the benchmarks with the real world in terms of detected inhomogeneity both to measure algorithm performance in the real world and to enable future improvement to the benchmarks.
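Simple stand-ins for Level 1 and Level 2 metrics could look like the sketch below; the function names and the ±12-step matching window are assumptions for illustration, not the official ISTI assessment definitions.

```python
import numpy as np

def level1_rmse(homogenised, clean):
    """Level 1: distance of the homogenised series from the clean-world
    series (errors in climatology, variance and trend all contribute)."""
    h, c = np.asarray(homogenised), np.asarray(clean)
    return float(np.sqrt(np.mean((h - c) ** 2)))

def level2_hit_rate(detected, true_breaks, window=12):
    """Level 2: fraction of true changepoints with a detection within
    +/- `window` time steps. The window is an illustrative assumption."""
    hits = sum(any(abs(d - t) <= window for d in detected) for t in true_breaks)
    return hits / len(true_breaks)
```

A perfectly restored series gives an RMSE of zero, and a detection list that misses a true break, or places it too far away, lowers the hit rate accordingly.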

The benchmark cycle
This should all take place within a well laid out framework to encourage people to take part and make the results as useful as possible. Timing is important. Too long a cycle will mean that the benchmarks become outdated. Too short a cycle will reduce the number of groups able to participate.

Producing the clean synthetic station data on the global scale is a complicated task that has now taken several years, but we are close to completing a version 1. We have collected a list of known region-wide inhomogeneities and a comprehensive understanding of the many different types of inhomogeneities that can affect station data. We have also considered a number of assessment options and decided to focus on levels 1 and 2 for assessment within the benchmark cycle. Our benchmarking working group is aiming for release of the first benchmarks by January 2015.

Tuesday, 26 November 2013

Are break inhomogeneities a random walk or a noise?

Tomorrow is the next conference call of the benchmarking and assessment working group (BAWG) of the International Surface Temperature Initiative (ISTI; Thorne et al., 2011). The BAWG will create a dataset to benchmark (validate) homogenization algorithms. It will mimic the real mean temperature data of the ISTI, but will include known inhomogeneities, so that we can assess how well the homogenization algorithms remove them. We are almost finished discussing how the benchmark dataset should be developed, but still need to fix some details, such as the question: are break inhomogeneities a random walk or a noise?
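The distinction matters because the two models predict very different accumulated station offsets. A small simulation makes this concrete, under the simplifying assumption that the individual jump sizes are identically distributed in both models:

```python
import numpy as np

rng = np.random.default_rng(42)
n_stations, n_breaks = 2000, 10
jumps = rng.normal(0.0, 0.5, (n_stations, n_breaks))   # illustrative jump sizes

# random-walk model: every break shifts the series relative to its current
# level, so a station's final offset is the sum of all its jumps
walk_offset = jumps.sum(axis=1)

# noise model: every homogeneous subperiod has an independent offset around
# the true climate (e.g. each relocation draws a fresh siting error), so
# the final offset is just the last jump
noise_offset = jumps[:, -1]

# under the random walk the spread of the non-climatic offsets grows like
# the square root of the number of breaks; under the noise model it does not
ratio = walk_offset.std() / noise_offset.std()
```

With ten breaks the random-walk offsets spread out roughly three times wider than the noise-model offsets, so the choice directly affects how much spurious trend the benchmark contains.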

Previous studies

The benchmark dataset of the ISTI will be global and is also intended to be used to estimate uncertainties in the climate signal due to remaining inhomogeneities. These are the two main improvements over previous validation studies.

Williams, Menne, and Thorne (2012) validated the pairwise homogenization algorithm of NOAA on a dataset mimicking the US Historical Climate Network. The paper focusses on how well large-scale biases can be removed.

The COST Action HOME has performed a benchmarking of several small networks (5 to 19 stations) realistically mimicking European climate networks (Venema et al., 2012). Its main aim was to intercompare homogenization algorithms; the small networks allowed HOME to also test manual homogenization methods.

These two studies were blind, in other words the scientists homogenizing the data did not know where the inhomogeneities were. An interesting coincidence is that the people who generated the blind benchmarking data were outsiders at the time: Peter Thorne for NOAA and me for HOME. This probably explains why we both made an error, which we should not repeat in the ISTI.

Sunday, 17 November 2013

On the reactions to the doubling of the recent temperature trend by Curry, Watts and Lucia

The recent Cowtan and Way study, "Coverage bias in the HadCRUT4 temperature record", in the QJRMS showed that the temperature trend over the last 15 years is more than twice as strong as previously thought. [UPDATE: The paper can be read here; it is now Open Access.]

This created quite a splash in the blogosphere; see my last post. This is probably no wonder. The strange idea that global warming has stopped is one of the main memes of the climate ostriches and, in the USA, even of the mainstream media. A recent media analysis showed that half of the reporting on the recent publication of the IPCC report pertained to this meme.

This reporting is in stark contrast to the IPCC having almost forgotten to write about it, as it has little climatological significance. Also after the Cowtan and Way (2013) paper, the global temperature trend between 1880 and now is still about 0.8 degrees per century.

The global warming of the entire climate system is continuing without pause in the warming of the oceans, and the oceans are the main absorber of energy in the climate system; the atmospheric temperature increase accounts for only about 2 percent of the total. Because the last 15 years also account for just a short part of the anthropogenic warming period, one can estimate that the discussion is about less than one thousandth of the warming.

Reactions

The study was positively received by amongst others the Klimalounge (in German), RealClimate, Skeptical Science, Carbon Brief, QuakeRattled, WottsUpWithThatBlog, OurChangingClimate, Moyhu (Nick Stokes) and Planet 3.0. It is also discussed in the press: Sueddeutsche Zeitung, TAZ, Spiegel Online (three leading newspapers in Germany, in German), The Independent (4 articles), Mother Jones, Hürriyet (a large newspaper in Turkey) and Science Daily.

Lucia at The Blackboard wrote in her first post Cowtan and Way: Have they killed the pause? and stated: "Right now, I’m mostly liking the paper. The issues I note above are questions, but they do do quite a bit of checking". And Lucia wrote in her second post: "The paper is solid."

Furthermore, Steve Mosher writes: "I know robert [Way] does first rate work because we’ve been comparing notes and methods and code for well over a year. At one point we spent about 3 months looking at labrador data from enviroment canada and BEST. ... Of course, folks should double and triple check, but he’s pretty damn solid."

The main serious critical voice seems to be Judith Curry at Climate Etc. Her comments have been taken up by numerous climate ostrich blogs. This post discusses Curry's comments, which were also taken up by Lucia; it also includes some erroneous additions by Anthony Watts and discusses one additional point raised by Lucia.
  1. Interpolation
  2. UAH satellite analyses
  3. Reanalyses
  4. No contribution
  5. Model validation
  6. A hiatus in the satellite datasets (Black Board)

Wednesday, 13 November 2013

Temperature trend over last 15 years is twice as large as previously thought

UPDATED: Now with my response to Judith Curry's comments and an interesting comment by Peter Thorne.

Yesterday a study appeared in the Quarterly Journal of the Royal Meteorological Society that suggests that the temperature trend over the last 15 years is about twice as large as previously thought. This study [UPDATE: now Open Access] is by Kevin Cowtan and Robert G. Way and is called "Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends".

The reason for the bias is that in the HadCRUT dataset, there is a gap in the Arctic and the study shows that it is likely that there was strong warming in this missing data region (h/t Stefan Rahmstorf at Klimalounge in German; the comments and answers by Rahmstorf there are also interesting and refreshingly civilized; might be worth reading the "translation"). In the HadCRUT4 dataset the temperature trend over the period 1997-2012 is only 0.05°C per decade. After filling the gap in the Arctic, the trend is 0.12 °C per decade.
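The mechanism of such a coverage bias is easy to illustrate with a toy zonal-mean calculation; the anomaly values below are invented, and only the area weighting and the masked region follow the logic of the study.

```python
import numpy as np

# 36 latitude bands from -87.5 to 87.5; area weight ~ cos(latitude)
lats = np.linspace(-87.5, 87.5, 36)
weights = np.cos(np.deg2rad(lats))

# invented anomaly field: 0.5 degrees everywhere, amplified to 2.0 north of 65 N
anomaly = np.where(lats > 65, 2.0, 0.5)

# global mean with full coverage
full_mean = np.average(anomaly, weights=weights)

# global mean when the Arctic is missing, as in the HadCRUT4 coverage gap:
# averaging only the covered area is biased cold relative to the true mean
covered = lats <= 65
gap_mean = np.average(anomaly[covered], weights=weights[covered])
```

If the missing region warms faster than the rest of the globe, any average over the covered area alone must underestimate the true global trend, which is exactly the bias Cowtan and Way quantify.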

The study starts with the observation that over the period 1997 to 2012 "GISTEMP, UAH and NCEP/NCAR [which have (nearly) complete global coverage and no large gap at the Arctic, VV] all show faster warming in the Arctic than over the planet as a whole, and GISTEMP and NCEP/NCAR also show faster warming in the Antarctic. Both of these regions are largely missing in the HadCRUT4 data. If the other datasets are right, this should lead to a cool bias due to coverage in the HadCRUT4 temperature series.".

Datasets

All datasets have their own strengths and weaknesses. The nice thing about this paper is how they combine the datasets, using their strengths and mitigating their weaknesses.

Surface data. Direct (in-situ) measurements of temperature (used in HadCRUT and GISTEMP) are very important. Because they lend themselves well to homogenization, station data is temporally consistent and its trends are thus most reliable. Problems are that most observations were not performed with climate change in mind, and that there are the spatial gaps that are so important for this study.

Satellite data. Satellites perform indirect measurements of the temperature (UAH and RSS). Their main strengths are the global coverage and spatial detail. A problem for satellite datasets is that the computation of physical parameters (retrievals) requires simplifying assumptions and that other (partially unknown) factors can influence the result. The temperature retrieval needs information on the surface, which is especially important in the Arctic. The satellite temperature dataset of RSS therefore omits the Arctic. UAH is also expected to have biases in the Arctic, but does provide data.

Friday, 19 July 2013

Statistically interesting problems: correction methods in homogenization

This is the last post in a series on five statistically interesting problems in the homogenization of climate network data. This post will discuss two problems around the correction methods used in homogenization. Especially the correction of daily data is becoming an increasingly important problem because more and more climatologists work with daily climate data. The main added value of daily data is that you can study climatic changes in the probability distribution, which necessitates studying the non-climatic factors (inhomogeneities) as well. This is thus a pressing, but also a difficult task.

The five main statistical problems are:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad-hoc solutions based on single-breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 4. Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely. From just one degree of freedom for annual corrections of the means, to 12 degrees of freedom for monthly correction of the means, to 120 for decile corrections (for the higher order moment method (HOM) for daily data, Della-Marta & Wanner, 2006) applied to every month, to a large number of DOF for quantile or percentile matching.

What is the best correction method depends on the characteristics of the inhomogeneity. For a calibration problem just the annual mean would be sufficient, for a serious exposure problem (e.g. insolation of the instrument) a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted.

The best correction method also depends on the reference. Whether the variables of a certain correction model can be reliably estimated depends on how well-correlated the neighboring reference stations are.

Currently climatologists choose their correction method mainly subjectively. For precipitation, annual corrections are typically applied; for temperature, monthly corrections are typical. The HOME benchmarking study showed these are good choices. For example, an experimental contribution correcting precipitation on a monthly scale had a larger error than the same method applied on the annual scale, because the data did not allow for an accurate estimation of 12 monthly correction constants.

One correction method is typically applied to the entire regional network, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These vary from station to station and from break to break. Especially in global studies, the number of stations in a region, and thus the signal-to-noise ratio, varies widely, and one fixed choice is likely suboptimal. Studying which correction method is optimal for every break would be much work for manual methods; instead we should work on automatic correction methods that objectively select the optimal correction model, e.g. using an information criterion. As far as I know, no one works on this yet.
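Such an objective selection could, for instance, compare correction models with an information criterion on the difference series. A minimal sketch of the idea (the function names and the Gaussian error model are my assumptions, not an existing implementation):

```python
import numpy as np

def bic(residuals, n_params):
    """Bayesian information criterion under a Gaussian error model."""
    n = len(residuals)
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

def select_correction(diff, months, break_idx):
    """Choose between an annual (1 DOF) and a monthly (12 DOF) correction
    model for the segment after break_idx, using BIC on the difference
    series between candidate and reference."""
    seg = diff[break_idx:]
    seg_months = months[break_idx:]
    # Annual model: one constant for the whole segment
    res_annual = seg - seg.mean()
    # Monthly model: one constant per calendar month
    monthly_means = np.array([seg[seg_months == m].mean() for m in range(12)])
    res_monthly = seg - monthly_means[seg_months]
    return "annual" if bic(res_annual, 1) <= bic(res_monthly, 12) else "monthly"
```

With a flat difference series the penalty term makes the 1-DOF model win; a clear seasonal cycle in the inhomogeneity makes the 12-DOF model win.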

Problem 5. Deterministic or stochastic corrections?

Annual and monthly data is normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenization. Daily data, on the other hand, is used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.
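A deterministic correction of the full distribution is typically some form of quantile matching. A toy sketch of the idea (my own simplification, not any of the published methods):

```python
import numpy as np

def quantile_match(segment, target_sample, probs=np.linspace(0.01, 0.99, 99)):
    """Deterministically map the daily values in `segment` so that their
    quantiles match those of `target_sample` (data measured under the
    homogeneous reference conditions). Values between the estimated
    quantiles are linearly interpolated; values beyond the outermost
    quantiles are clipped to them."""
    src_q = np.quantile(segment, probs)
    tgt_q = np.quantile(target_sample, probs)
    return np.interp(segment, src_q, tgt_q)
```

A stochastic approach would instead draw from an estimated transfer distribution rather than applying one fixed deterministic mapping, which is the more elegant formulation alluded to above.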

Friday, 29 March 2013

Special issue on homogenisation of climate series

The open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás" has just published a special issue on homogenization of climate records. This special issue contains eight research papers. It is an offspring of the COST Action HOME: Advances in homogenization methods of climate series: an integrated approach (COST-ES0601).

To be able to discuss eight papers, this post does not contain as much background information as usual and is aimed at people already knowledgeable about homogenization of climate networks.

Contents

Mónika Lakatos and Tamás Szentimrey: Editorial.
The editorial explains the background of this special issue: the importance of homogenisation and the COST Action HOME. Mónika and Tamás thank you very much for your efforts to organise this special issue. I think every reader will agree that it has become a valuable journal issue.

Monthly data

Ralf Lindau and Victor Venema: On the multiple breakpoint problem and the number of significant breaks in homogenization of climate records.
My article with Ralf Lindau is already discussed in a previous post on the multiple breakpoint problem.
José A. Guijarro: Climatological series shift test comparison on running windows.
Longer time series typically contain more than one inhomogeneity, but statistical tests are mostly designed to detect one break. One way to resolve this conflict is to apply these tests on short moving windows. José compares six statistical detection methods (t-test, Standard Normal Homogeneity Test (SNHT), two-phase regression (TPR), Wilcoxon-Mann-Whitney test, Durbin-Watson test and SRMD: squared relative mean difference), which are applied on running windows with a length between 1 and 5 years (12 to 60 values (months) on either side of the potential break). The smart trick of the article is that all methods are calibrated to a false alarm rate of 1% for better comparison. In this way, he can show that the t-test, SNHT and SRMD are best for this problem and perform almost identically. To get good detection rates, the window needs to be at least 2 × 3 years. As this harbours the risk of having two breaks in one window, José has decided to change his homogenization method CLIMATOL to use the semi-hierarchical scheme of SNHT instead of windows. The methods are tested on data with just one break; it would have been interesting to also simulate the more realistic case with multiple independent breaks.
Olivier Mestre, Peter Domonkos, Franck Picard, Ingeborg Auer, Stéphane Robin, Emilie Lebarbier, Reinhard Böhm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, and Petr Stepanek: HOMER: a homogenization software – methods and applications.
HOMER is a new homogenization method and is developed using the best methods tested on the HOME benchmark. Thus theoretically, this should be the best method currently available. Still, sometimes interactions between parts of an algorithm can lead to unexpected results. It would be great if someone would test HOMER on the HOME benchmark dataset, so that we can compare its performance with the other algorithms.
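For readers who want to experiment, the running-window idea from Guijarro's comparison can be sketched with a simple two-sample t statistic. This is a simplified stand-in of my own, not any of the tested implementations:

```python
import numpy as np

def running_window_tstat(series, half_window=36):
    """Slide over the (difference) series and compare the `half_window`
    values before and after each candidate break point with a two-sample
    t statistic. Returns the position and value of the largest |t|."""
    n = len(series)
    best_pos, best_t = None, 0.0
    for i in range(half_window, n - half_window):
        a = series[i - half_window:i]
        b = series[i:i + half_window]
        pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / half_window)
        t = (b.mean() - a.mean()) / pooled
        if abs(t) > abs(best_t):
            best_pos, best_t = i, t
    return best_pos, best_t
```

The maximum |t| would then be compared against a critical value calibrated to the desired false alarm rate, as in the article.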

Sunday, 24 March 2013

New article on the multiple breakpoint problem in homogenization

An interesting paper by Ralf Lindau and me on the multiple breakpoint problem has just appeared in a Special issue on homogenization of the open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás".

Multiple break point problem

Long instrumental time series contain non-climatological changes, called inhomogeneities, for example due to relocations or changes in the instrumentation. To study real changes in the climate more accurately, these inhomogeneities need to be detected and removed in a data processing step called homogenization; in statistics this problem is known as segmentation.

Statisticians have worked a lot on the detection of a single break point in data. Unfortunately, long climate time series typically contain more than just one break point. There are two ad hoc methods to deal with this.

The most used method is the hierarchical one: first detect the largest break, then redo the detection on the two subsections, and so on until no more breaks are found or the segments become too short. A variant is the semi-hierarchical method, in which previously detected breaks are retested and removed if no longer significant. For example, SNHT uses a semi-hierarchical scheme, and thus so does the pairwise homogenization algorithm of NOAA, which uses SNHT for detection.

The second ad hoc method is to detect the breaks on a moving window. This window should be long enough for sensitivity, but should not be too long because that increases the chance of two breaks in the window. In the Special issue there is an article by José A. Guijarro on this method, which is used for his homogenization method CLIMATOL.

While these two ad hoc methods work reasonably well, detecting all breaks simultaneously is more powerful. This can be performed as an exhaustive search of all possible combinations (used by the homogenization method MASH). With on average one break per 15 to 20 years, the number of breaks and thus of combinations can get very large. Modern homogenization methods consequently use an optimization method called dynamic programming (used by the homogenization methods PRODIGE, ACMANT and HOMER).
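For a fixed number of breaks, dynamic programming finds the least-squares optimal segmentation without enumerating all combinations. A bare-bones sketch of the principle (my own illustration, not the actual PRODIGE, ACMANT or HOMER code):

```python
import numpy as np

def optimal_breaks(series, n_breaks):
    """Least-squares optimal positions of `n_breaks` breaks via dynamic
    programming: O(k * n^2) work instead of trying all combinations."""
    n = len(series)
    csum = np.concatenate([[0.0], np.cumsum(series)])
    csum2 = np.concatenate([[0.0], np.cumsum(series ** 2)])

    def cost(i, j):
        """Sum of squared deviations of series[i:j] from its mean."""
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    # best[k][j]: minimal cost of fitting series[:j] with k breaks
    best = np.full((n_breaks + 1, n + 1), np.inf)
    prev = np.zeros((n_breaks + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        best[0][j] = cost(0, j)
    for k in range(1, n_breaks + 1):
        for j in range(k + 1, n + 1):
            for i in range(k, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], prev[k][j] = c, i
    # Backtrack the break positions (indices where a new segment starts)
    breaks, j = [], n
    for k in range(n_breaks, 0, -1):
        j = prev[k][j]
        breaks.append(j)
    return sorted(breaks)
```

In practice the number of breaks is not known either, so the segmentations for different numbers of breaks are compared with a penalized criterion, which is exactly where the multiple breakpoint problem discussed in our paper comes in.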

All the mentioned homogenization methods have been compared with each other on a realistic benchmark dataset by the COST Action HOME. In the corresponding article (Venema et al., 2012) you can find references to all the mentioned methods. The results of this benchmarking showed that multiple breakpoint methods were clearly the best. However, this is not only because of the elegant solution to the multiple breakpoint problem, these methods also had other advantages.

Monday, 16 January 2012

Homogenisation of monthly and annual data from surface stations

To study climate change and variability, long instrumental climate records are essential, but they are best not used directly. These datasets are essential since they are the basis for assessing century-scale trends and for studying the natural (long-term) variability of climate, among other applications. The value of these datasets, however, strongly depends on the homogeneity of the underlying time series. A homogeneous climate record is one where variations are caused only by variations in weather and climate. In our recent article we wrote: “Long instrumental records are rarely if ever homogeneous”. A non-scientist would simply write: homogeneous long instrumental records do not exist. In practice there are always inhomogeneities due to relocations, changes in the surroundings, instrumentation, shelters, etc. If a climatologist only writes “the data is thought to be of high quality”, then removes half of the data and does not mention the homogenisation method used, it is wise to assume that the data is not homogeneous.

Results from the homogenisation of instrumental western climate records indicate that detected inhomogeneities in mean temperature series occur at a frequency of roughly one per 15 to 20 years. It should be kept in mind that most measurements have not been made specifically for climatic purposes, but rather to meet the needs of weather forecasting, agriculture and hydrology (Williams et al., 2012). Moreover, the typical size of the breaks is often of the same order as the climatic change signal during the 20th century (Auer et al., 2007; Menne et al., 2009; Brunetti et al., 2006; Caussinus and Mestre, 2004; Della-Marta et al., 2004). Inhomogeneities are thus a significant source of uncertainty for the estimation of secular trends and decadal-scale variability.

If all inhomogeneities were purely random perturbations of the climate records, their collective effect on the mean global climate signal would be negligible. However, certain changes are typical for certain periods and occurred at many stations; these are the most important causes discussed below, as they can collectively lead to artificial biases in climate trends across large regions (Menne et al., 2010; Brunetti et al., 2006; Begert et al., 2005).

In this post I will introduce a number of typical causes for inhomogeneities and methods to remove them from the data.