
Monday, 12 October 2020

The deleted chapter of the WMO Guidance on the homogenisation of climate station data

The Task Team on Homogenization (TT-HOM) of the Open Panel of CCl Experts on Climate Monitoring and Assessment (OPACE-2) of the Commission on Climatology (CCl) of the World Meteorological Organization (WMO) has published their Guidance on the homogenisation of climate station data.

The guidance report was a bit longish, so at the end we decided that the last chapter on "Future research & collaboration needs" was best deleted. As chair of the task team, and as someone who likes to dream from a comfy chair about what others could do, I wrote most of this chapter, and thus we decided to simply make it a post for this blog. Enjoy.

Introduction

This guidance is based on our current best understanding of inhomogeneities and homogenisation. However, writing it also made clear that there is a need for a better understanding of the problems.

A better mathematical understanding of statistical homogenisation is important because that is what most of our work is based on. A stronger mathematical basis is a prerequisite for future methodological improvements.

A stronger focus on a (physical) understanding of inhomogeneities would complement and strengthen the statistical work. This kind of work is often performed at the station or network level, but it is also needed at larger spatial scales. Much of this work is performed using parallel measurements, but they are typically not internationally shared.

In an observational science the strength of the outcomes depends on a consilience of evidence. Thus having evidence on inhomogeneities from both statistical homogenisation and physical studies strengthens the science.

This chapter will discuss the need for future research on homogenisation, grouped into five kinds of problems. In the first section we will discuss research on improving our physical understanding and physics-based corrections. The next section is about break detection, especially about two fundamental problems in statistical homogenisation: the inhomogeneous-reference problem and the multiple-breakpoint problem.

The section after that is about computing the uncertainties in trends and long-term variability estimates from homogenised data due to remaining inhomogeneities. The following section asks whether correction methods can be improved by treating correction as a statistical model selection problem. The last section discusses whether inhomogeneities are stochastic or deterministic and how that may affect homogenisation, especially correction methods for the variability around the long-term mean.

For all the research ideas mentioned below, it is understood that in future we should study more meteorological variables than temperature alone. In addition, more studies on inhomogeneities across variables could help to understand the causes of inhomogeneities and to increase the signal to noise ratio. Homogenisation by national weather services has the advantage that all climate elements from one station are stored together, which helps in understanding and identifying breaks. It would help homogenisation science and climate analysis to have a global database for all climate elements, like ICOADS for marine data. A Copernicus project has started working on this for land station data, which is an encouraging development.

Physical understanding

It is a good scientific practice to perform parallel measurements in order to manage unavoidable changes and to compare the results of statistical homogenisation to the expectations given the cause of the inhomogeneity according to the metadata. This information should also be analysed on continental and global scales to get a better understanding of when historical transitions took place and to guide homogenisation of large-scale (global) datasets. This requires more international sharing of parallel data and standards on the reporting of the size of breaks confirmed by metadata.

The Dutch weather service KNMI published a protocol on how to manage possible future changes to the network: who decides what needs to be done in which situation, what kind of studies should be made, where the studies should be published, and that the parallel data should be stored in their central database as experimental data. A translation of this report will soon be published by the WMO (Brandsma et al., 2019) and will hopefully inspire other weather services to formalise their network change management.

Next to statistical homogenisation, making and studying parallel measurements, and other physical estimates, can provide a second line of evidence on the magnitude of inhomogeneities. Having multiple lines of evidence provides robustness to observational sciences. Parallel data is especially important for the large historical transitions that are most likely to produce biases in network-wide to global climate datasets. It can validate the results of statistical homogenisation and be used to estimate possibly needed additional adjustments. The Parallel Observations Science Team of the International Surface Temperature Initiative (ISTI-POST) is working on building such a global dataset with parallel measurements.

Parallel data is especially suited to improving our physical understanding of the causes of inhomogeneities by studying how the magnitude of the inhomogeneity depends on the weather and on instrument design characteristics. This understanding is important for more accurate corrections of the distribution, for realistic benchmarking datasets to test our homogenisation methods, and to determine which additional parallel experiments would be especially useful.

Detailed physical models of the measurement, for example of the flow through the screens, radiative transfer and heat flows, can also help gain a better understanding of the measurement and its error sources. This aids in understanding historical instruments and in designing better future instruments. Physical models will also be paramount for understanding the impact of the surroundings on the measurement, from nearby obstacles and surfaces influencing error sources and air flow, to changes in the measurand itself, such as urbanisation, deforestation or the introduction of irrigation. Land-use changes, especially urbanisation, should be studied together with the relocations they may provoke.

Break detection

Longer climate series typically contain more than one break. This so-called multiple-breakpoint problem is currently an important research topic. A complication of relative homogenisation is that also the reference stations can have inhomogeneities. This so-called inhomogeneous-reference problem is not optimally solved yet. It is also not clear what temporal resolution is best for detection and what the optimal way is to handle the seasonal cycle in the statistical properties of climate data and of many inhomogeneities.

For temperature time series, about one break per 15 to 20 years is typical and multiple breaks are thus common. Unfortunately, most statistical detection methods have been developed for one break and for the null hypothesis of white (sometimes red) noise. In the case of multiple breaks, the statistical test should take into account not only the noise variance, but also the break variance from breaks at other positions. For low signal to noise ratios, the additional break variance can lead to spurious detections and inaccuracies in the break position (Lindau and Venema, 2018a).

To apply single-breakpoint tests to series with multiple breaks, one ad-hoc solution is to first split the series at the most significant break (as done, for example, with the standard normal homogeneity test, SNHT) and then investigate the subseries. Such a greedy algorithm does not always find the optimal solution. Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, which reduces the power of detection considerably. This method is not used much nowadays.
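As an illustration, a minimal sketch of such greedy hierarchical splitting with an SNHT-type statistic could look as follows (a toy Python example; the critical value is only illustrative, since in practice it depends on the series length and significance level, and operational methods work on difference series with a reference):

```python
import numpy as np

def snht_statistic(x):
    """Return the maximum SNHT statistic and its position for a series."""
    z = (x - x.mean()) / x.std(ddof=1)
    n = len(z)
    best_t, best_k = 0.0, None
    for k in range(2, n - 1):              # candidate break before index k
        z1, z2 = z[:k].mean(), z[k:].mean()
        t = k * z1**2 + (n - k) * z2**2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k

def hierarchical_breaks(x, crit=10.0, min_len=10):
    """Greedy splitting at the most significant break; not guaranteed optimal."""
    breaks = []
    def recurse(lo, hi):
        if hi - lo < 2 * min_len:
            return
        t, k = snht_statistic(x[lo:hi])
        if k is not None and t > crit:
            breaks.append(lo + k)
            recurse(lo, lo + k)
            recurse(lo + k, hi)
    recurse(0, len(x))
    return sorted(breaks)
```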

Multiple breakpoint methods can find an optimal solution and are nowadays numerically feasible. This can be done in a hypothesis testing framework (MASH) or in a statistical model selection framework. For a given number of breaks these methods find the break combination that minimizes the internal variance, that is, the variance of the homogeneous subperiods (equivalently, the combination that maximizes the variance explained by the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) and ACMANT (based on PRODIGE; Domonkos, 2011b). In a similar line of research, Lu et al. (2010) solved the multiple breakpoint problem using a minimum description length (MDL) based information criterion as penalty function.
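The following is a minimal sketch of this idea in Python: a penalized least-squares segmentation solved by dynamic programming, with a simple constant penalty per break (real methods such as PRODIGE use more refined penalty terms and work on difference series with a reference):

```python
import numpy as np

def segment_cost(x):
    """Sum of squared deviations from the segment mean for all segments (O(n^2) memory)."""
    n = len(x)
    cum = np.concatenate(([0.0], np.cumsum(x)))
    cum2 = np.concatenate(([0.0], np.cumsum(x**2)))
    cost = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i, n):
            m = j - i + 1
            s, s2 = cum[j + 1] - cum[i], cum2[j + 1] - cum2[i]
            cost[i, j] = s2 - s * s / m    # sum of squared deviations of segment i..j
    return cost

def optimal_breaks(x, penalty):
    """Minimize total internal variance plus a penalty per break (dynamic programming)."""
    n = len(x)
    cost = segment_cost(x)
    best = np.full(n + 1, np.inf)
    best[0] = -penalty                     # the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost[i, j - 1] + penalty
            if c < best[j]:
                best[j], last[j] = c, i
    breaks, j = [], n                      # backtrack to recover the break positions
    while j > 0:
        i = last[j]
        if i > 0:
            breaks.append(i)
        j = i
    return sorted(breaks)
```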

The penalty function of PRODIGE was found to be suboptimal (Lindau and Venema, 2013): the penalty should be a function of the number of breaks, not fixed per break, and the relation with the length of the series should be reversed. It is not yet clear how sensitively homogenisation methods respond to this, but increasing the penalty per break in case of a low signal to noise ratio, in order to reduce the number of breaks, does not make the estimated break signal more accurate (Lindau and Venema, 2018a).

Not only the candidate station but also the reference stations will have inhomogeneities, which complicates homogenisation. Such inhomogeneities can be climatologically especially important when they are due to network-wide technological transitions. An example of such a transition is the current replacement of temperature observations in Stevenson screens by automatic weather stations. Such transitions are important periods as they may cause biases in the network and global average trends and they produce many breaks over a short period.

A related problem is that sometimes all stations in a network have a break at the same date, for example, when a weather service changes the time of observation. Nationally such breaks are corrected using metadata. If this change is unknown in global datasets one can still detect and correct such inhomogeneities statistically by comparison with other nearby networks. That would require an algorithm that additionally knows which stations belong to which network and prioritizes correcting breaks found between stations in different networks. Such algorithms do not exist yet and information on which station belongs to which network for which period is typically not internationally shared.

The influence of inhomogeneities in the reference can be reduced by computing composite references over many stations, removing reference stations with breaks and by performing homogenisation iteratively.
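For illustration, a minimal sketch of a difference series against a correlation-weighted composite reference, a common ingredient of relative homogenisation (the weighting by correlation is just one plausible choice, assumed here for concreteness):

```python
import numpy as np

def difference_series(candidate, neighbours, weights):
    """Candidate minus a weighted composite reference, all as anomalies.

    candidate: array of shape (n_time,); neighbours: (n_stations, n_time);
    weights: e.g. correlations of each neighbour with the candidate.
    """
    candidate = np.asarray(candidate, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    cand_anom = candidate - candidate.mean()
    neigh_anom = neighbours - neighbours.mean(axis=1, keepdims=True)
    composite = (w[:, None] * neigh_anom).sum(axis=0)
    return cand_anom - composite
```

Reference stations with known breaks can be left out of the composite, and the whole procedure can be iterated after a first round of homogenisation, as described above.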

A direct approach to solving this problem would be to simultaneously homogenise multiple stations, also called joint detection. A step in this direction is taken by pairwise homogenisation methods, where breaks are detected in the pairs. This requires an additional attribution step, which attributes the breaks to a specific station. Currently this is done by hand (for PRODIGE; Caussinus and Mestre, 2004; Rustemeier et al., 2017) or with ad-hoc rules (by the pairwise homogenisation algorithm of NOAA; Menne and Williams, 2009).

In the homogenisation method HOMER (Mestre et al., 2013) a first attempt is made to homogenise all pairs simultaneously using a joint detection method from bio-statistics. Feedback from the first users suggests that this method should not be used automatically. It should be studied how well this method works and where the problems come from.

Multiple breakpoint methods are more accurate than single breakpoint methods. This expected higher accuracy is founded on theory (Hawkins, 1972). In addition, the HOME benchmarking study numerically found that modern homogenisation methods, which take the multiple breakpoint and the inhomogeneous reference problems into account, are about a factor of two more accurate than traditional methods (Venema et al., 2012).

However, the current version of CLIMATOL applies single-breakpoint detection tests (first SNHT detection on a window, then splitting) and achieves results comparable to modern multiple-breakpoint methods with respect to break detection and the homogeneity of the data (Killick, 2016). This suggests either that the multiple-breakpoint detection principle is not as important as previously thought and warrants deeper study, or that the accuracy of CLIMATOL is partly due to an unknown unknown.

The signal to noise ratio is paramount for the reliable detection of breaks. It would thus be valuable to develop statistical methods that explain part of the variance of a difference time series and remove this to see breaks more clearly. Data from (regional) reanalysis could be useful predictors for this.
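A minimal sketch of this idea: regress the difference series on reanalysis-based predictors and detect breaks on the residuals. The use of ordinary least squares and the choice of predictors are assumptions made here purely for illustration.

```python
import numpy as np

def residual_difference_series(diff, predictors):
    """Remove the part of a difference series explained by predictors.

    diff: candidate-minus-reference series, shape (n_time,).
    predictors: shape (n_time, n_predictors), e.g. fields from a regional reanalysis.
    """
    d = np.asarray(diff, dtype=float)
    X = np.column_stack([np.ones(len(d)), np.asarray(predictors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, d, rcond=None)
    return d - X @ beta                    # residuals with part of the noise removed
```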

First methods have been published to detect breaks in daily data (Toreti et al., 2012; Rienzner and Gandolfi, 2013). It has not been studied yet what the optimal resolution for break detection is (daily, monthly, annual), nor what the optimal way is to handle the seasonal cycle in the climate data and to exploit the seasonal cycle of inhomogeneities. In the daily temperature benchmarking study of Killick (2016), most non-specialised detection methods performed better than the daily detection method MAC-D (Rienzner and Gandolfi, 2013).

The selection of appropriate reference stations is a necessary step for accurate detection and correction. Many different methods and metrics are used for the station selection, but studies on the optimal method are missing. The knowledge of local climatologists about which stations share a similar regional climate needs to be made objective so that it can be applied automatically (at larger scales).

For detection a high signal to noise ratio is most important, while for correction it is paramount that all stations are in the same climatic region. Typically the same networks are used for both detection and correction, but it should be investigated whether a smaller network for correction would be beneficial. Also in general, we need more research on understanding the performance of (monthly and daily) correction methods.

Computing uncertainties

Also after homogenisation, uncertainties remain in the data due to various problems:

  • Not all breaks in the candidate station have been or can be detected.

  • False alarms are an unavoidable trade-off for detecting many real breaks.

  • Uncertainty in the estimation of correction parameters due to limited data.

  • Uncertainties in the corrections due to limited information on the break positions.

From validation and benchmarking studies we have a reasonable idea about the remaining uncertainties that one can expect in the homogenised data, at least with respect to changes in the long-term mean temperature. For many other variables and changes in the distribution of (sub-)daily temperature data individual developers have validated their methods, but systematic validation and comparison studies are still missing.

Furthermore, such studies only provide a general uncertainty level, whereas more detailed information for every single station/region and period would be valuable. The uncertainties will strongly depend on the signal to noise ratios, on the statistical properties of the inhomogeneities of the raw data and on the quality and cross-correlations of the reference stations. All of which vary strongly per station, region and period.

Communicating such a complicated error structure, which is mainly temporal but also partially spatial, is a problem in itself. Furthermore, not only the uncertainty in the means should be considered: especially for daily data, uncertainties in the complete probability density function need to be estimated and communicated. This could be done with an ensemble of possible realisations, similar to Brohan et al. (2006).

An analytic understanding of the uncertainties is important, but is often limited to idealised cases. Thus numerical validation studies, such as the past HOME and the upcoming ISTI studies, are also important for an assessment of homogenisation algorithms under realistic conditions.

Creating validation datasets also helps to see the limits of our understanding of the statistical properties of the break signal. This is especially the case for variables other than temperature and for daily and (sub-)daily data. Information is needed on the real break frequencies and size distributions, but also on their auto-correlations and cross-correlations, as well as, as explained in the next section, on the stochastic nature of breaks in the variability around the mean.

Validation studies focussed on difficult cases would be valuable for a better understanding. For example, sparse networks, isolated island networks, large spatial trend gradients and strong decadal variability in the difference series of nearby stations (for example, due to El Nino in complex mountainous regions).

The advantage of simulated data is that a large number of quite realistic complete networks can be created. For daily data it will remain hard in the years to come to determine how to generate a realistic validation dataset. Thus even if using parallel measurements is mostly limited to one break per test, it does provide the highest degree of realism for this one break.

Deterministic or stochastic corrections?

Annual and monthly data is normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenisation. Daily data, on the other hand, is used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.

The physics of the problem suggests that many inhomogeneities are caused by stochastic processes. An example affecting many instruments is differences in the response time of instruments, which can lead to differences determined by turbulence. A fast thermometer will on average read higher maximum temperatures than a slow one, but this difference will be variable and sometimes much higher than the average. In the case of errors due to insolation, the radiation error will be modulated by clouds. An insufficiently shielded thermometer will need larger corrections on warm days, which will typically be sunnier, but some warm days will be cloudy and not need much correction, while other warm days are sunny and calm and have a dry, hot surface. The adjustment of daily data for studies on changes in the variability is thus a distribution problem and not only a regression bias-correction problem. For data assimilation (numerical weather prediction), accurate bias correction (with regression methods) is probably the main concern.

Seen as a variability problem, the correction of daily data is similar to statistical downscaling in many ways. Both methodologies aim to produce bias-corrected data with the right variability, taking into account the local climate and large-scale circulation. One lesson from statistical downscaling is that increasing the variance of a time series deterministically by multiplication with a fraction, called inflation, is the wrong approach and that the variance that could not be explained by regression using predictors should be added stochastically as noise instead (Von Storch, 1999). Maraun (2013) demonstrated that the inflation problem also exists for the deterministic Quantile Matching method, which is also used in daily homogenisation. Current statistical correction methods deterministically change the daily temperature distribution and do not stochastically add noise.
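To make the contrast concrete, here is a minimal sketch of a deterministic quantile-matching adjustment (a simplified, direct version; operational methods such as those cited above estimate the transfer function relative to reference stations and treat the tails more carefully). A stochastic variant would instead add noise for the variability the transfer function cannot explain, analogous to randomisation instead of inflation in downscaling.

```python
import numpy as np

def quantile_matching(before, after, probs=np.linspace(0.05, 0.95, 19)):
    """Deterministically map values of the 'before' segment onto the quantiles of 'after'."""
    q_before = np.quantile(before, probs)
    q_after = np.quantile(after, probs)
    # Values outside the outermost quantiles are clamped by np.interp;
    # real correction methods handle the tails with more care.
    return np.interp(before, q_before, q_after)
```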

Transferring ideas from downscaling to daily homogenisation is likely fruitful to develop such stochastic variability correction methods. For example, predictor selection methods from downscaling could be useful. Both fields require powerful and robust (time invariant) predictors. Multi-site statistical downscaling techniques aim at reproducing the auto- and cross-correlations between stations (Maraun et al., 2010), which may be interesting for homogenisation as well.

The daily temperature benchmarking study of Rachel Killick (2016) suggests that current daily correction methods are not able to improve the distribution much. There is a pressing need for more research on this topic. However, these methods likely also performed less well because they were used together with detection methods with a much lower hit rate than the comparison methods.

The deterministic correction methods may not lead to severe errors in homogenisation (that should still be studied), but stochastic methods that implement the corrections by adding noise would at least theoretically fit the problem better. Such stochastic corrections are not trivial and should have the right variability on all temporal and spatial scales.

It should be studied whether it may be better to only detect the dates of break inhomogeneities and perform the analysis on the homogeneous subperiods (removing the need for corrections). The disadvantage of this approach is that most of the trend variance is in the differences in the means of the homogeneous subperiods (HSPs) and only a small part is in the trends within the HSPs. In the case of trend analysis, this would be similar to the work of the Berkeley Earth Surface Temperature group on the mean temperature signal. Periods with gradual inhomogeneities, e.g., due to urbanisation, would have to be detected and excluded from such an analysis.

An outstanding problem is that current variability correction methods have only been developed for break inhomogeneities, methods for gradual ones are still missing. In homogenisation of the mean of annual and monthly data, gradual inhomogeneities are successfully removed by implementing multiple small breaks in the same direction. However, as daily data is used to study changes in the distribution, this may not be appropriate for daily data as it could produce larger deviations near the breaks. Furthermore, changing the variance in data with a trend can be problematic (Von Storch, 1999).

At the moment most daily correction methods correct the breaks one after another. In monthly homogenisation it is found that correcting all breaks simultaneously (Caussinus and Mestre, 2004) is more accurate (Domonkos et al., 2013). It is thus likely worthwhile to develop multiple breakpoint correction methods for daily data as well.

Finally, current daily correction methods rely on previously detected breaks and assume that the homogeneous subperiods (HSPs) are indeed homogeneous (i.e., each segment between breakpoints is assumed to be homogeneous). However, these HSPs are currently based on the detection of breaks in the mean only. Breaks in higher moments may thus still be present in the "homogeneous" subperiods and affect the corrections. If only for this reason, we should also work on the detection of breaks in the distribution.

Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely. From just one degree of freedom for annual corrections of the means, to 12 degrees of freedom for monthly correction of the means, to 40 for decile corrections applied to every season, to a large number of DOF for quantile or percentile matching.

A study using PRODIGE on the HOME benchmark suggested that for typical European networks monthly adjustments are best for temperature; annual corrections are probably less accurate because they fail to account for changes in the seasonal cycle due to inhomogeneities. For precipitation, annual corrections were most accurate; monthly corrections were likely less accurate because the data was too noisy to estimate the 12 correction constants (degrees of freedom).

What is the best correction method depends on the characteristics of the inhomogeneity. For a calibration problem just the annual mean could be sufficient, for a serious exposure problem (e.g., insolation of the instrument) a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted. The best correction method also depends on the reference. Whether the variables of a certain correction model can be reliably estimated depends on how well-correlated the neighbouring reference stations are.

An entire regional network is typically homogenised with the same correction method, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These will vary from station to station, from break to break and from period to period. Work on correction methods that objectively select the optimal correction method, e.g., using an information criterion, would be valuable.
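A minimal sketch of what such an objective selection could look like, choosing between an annual and a monthly correction model for a single break with the Bayesian Information Criterion. The use of the BIC and the simple two-segment setup are assumptions made for illustration, not an existing homogenisation method.

```python
import numpy as np

def bic(rss, n, k):
    """Bayesian Information Criterion for a Gaussian model with k mean parameters."""
    return n * np.log(rss / n) + k * np.log(n)

def select_correction(diff, month, after):
    """Choose a correction model for one break in a difference series.

    diff: candidate-minus-reference values; month: calendar months (1-12);
    after: boolean array, True for values after the break.
    """
    diff, month, after = (np.asarray(a) for a in (diff, month, after))
    n = len(diff)
    # Annual model: one mean on each side of the break (2 parameters).
    rss_annual = sum(((diff[after == a] - diff[after == a].mean()) ** 2).sum()
                     for a in (False, True))
    # Monthly model: a mean per calendar month on each side (24 parameters).
    rss_monthly = 0.0
    for a in (False, True):
        for m in range(1, 13):
            sel = (after == a) & (month == m)
            if sel.any():
                rss_monthly += ((diff[sel] - diff[sel].mean()) ** 2).sum()
    return "monthly" if bic(rss_monthly, n, 24) < bic(rss_annual, n, 2) else "annual"
```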

In the case of (sub-)daily data, the range of options becomes even larger. Daily data can be corrected just for inhomogeneities in the mean (e.g., Vincent et al., 2002, where daily temperatures are corrected with a linear interpolation scheme that preserves the previously determined monthly corrections) or also for the variability around the mean. In between are methods that adjust the distribution including the seasonal cycle; because the seasonal cycle dominates the variability, this is effectively similar to mean adjustments with a seasonal cycle. Correction methods of intermediate complexity, with more than one but fewer than ten degrees of freedom, would fill a gap and allow for more flexibility in selecting the optimal correction model.

When applying these methods (Della-Marta and Wanner, 2006; Wang et al., 2010; Mestre et al., 2011; Trewin, 2013), the number of quantile bins (categories) needs to be selected, as well as whether to use physical, weather-dependent predictors and the functional form in which they are used (Auchmann and Brönnimann, 2012). Objective methods for making these selections optimally would be valuable.

Related information

WMO Guidelines on Homogenization (English, French, Spanish) 

WMO guidance report: Challenges in the Transition from Conventional to Automatic Meteorological Observing Networks for Long-term Climate Records


Sunday, 4 October 2015

Measuring extreme temperatures in Uccle, Belgium


Open thermometer shelter with a single set of louvres.

That changes in the measurement conditions can lead to changes in the mean temperature is hopefully known by most people interested in climate change by now. That such changes are likely even more important when it comes to weather variability and extremes is unfortunately less known. The topic is studied much too little given its importance for the study of climatic changes in extremes, which are expected to be responsible for a large part of the impacts from climate change.

Thus I was enthusiastic when a Dutch colleague sent me a news article on the topic from the homepage of the Belgian weather service, the Koninklijk Meteorologisch Instituut (KMI). It describes a comparison of two different measurement set-ups, old and new, made side by side in Uccle, the main office of the KMI. The main difference is the screen used to protect the thermometer from the sun. In the past these screens were often more open, which improves ventilation; nowadays they are more closed to reduce (solar and infrared) radiation errors.

The more closed screen is a Stevenson screen, invented in the last decades of the 19th century. I had assumed that most countries had switched to Stevenson screens before the 1920s, but I recently learned that Switzerland changed in the 1960s and in Uccle they changed in 1983. Making any change to the measurements is a difficult trade-off between improving the system and breaking the homogeneity of the climate record. It would be great to have a historical overview of such transitions in the way climate is measured for all countries.

I am grateful to the KMI for their permission to republish the story here. The translation, clarifications between square brackets and the related reading section are mine.



Closed thermometer screen with double-louvred walls [Stevenson screen].
In the [Belgian] media one reads regularly that the highest temperature in Belgium is 38.8°C and that it was recorded in Uccle on June 27, 1947. Sometimes, one also mentions that the measurement was conducted in an "open" thermometer screen. On warm days the question typically arises whether this record could be broken. In order to be able to respond to this, it is necessary to take some facts into account that we will summarize below.

It is important to know that temperature measurements are affected by various factors, the most important being the type of thermometer screen in which the observations are carried out. One wants to measure the air temperature and must therefore prevent a warming of the measuring equipment by protecting the instruments from the distorting effects of solar radiation. The type of thermometer screen is particularly important on sunny days and this is reflected in the observations.

Since 1983, the reference measurements of the weather station Uccle have been made in a completely "closed" thermometer screen [a Stevenson screen] with double-louvred walls. Until May 2006, the reference thermometers were mercury thermometers for the daily maximums and alcohol thermometers for the daily minimums. [A typical combination nowadays because mercury freezes at -38.8°C.] Since June 2006, the temperature measurements have been carried out continuously by means of an automatic sensor in the same type of closed screen.

Before 1983, the measurements were carried out in an "open" thermometer screen with only a single set of louvres, which moreover offered no protection on the north side. For the reasons mentioned above, the maximum temperatures in this type of shelter were too high, especially during the summer period with intense sunshine. On July 19, 2006, one of the hottest days in Uccle, for example, the reference [Stevenson] screen measured a maximum temperature of 36.2°C compared to 38.2°C in the "open" shelter on the same day.

As the air temperature measurements in the closed screen are more relevant, it is advisable to study the temperature records that would be or have been measured in this type of reference screen. Recently we have therefore adjusted the temperature measurements of the open shelter from before 1983, to make them comparable with the values from the closed screen. These adjustments were derived from the comparison between the simultaneous [parallel] observations measured in the two types of screens during a period of 20 years (1986-2005). Today we therefore have two long series of daily temperature extremes (minimum and maximum), beginning in 1901, corresponding to measurements from a closed screen.
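[A minimal sketch of how such percentile-based adjustments could be derived from a parallel period and applied to the earlier open-screen data. This is an illustration only; the actual KMI adjustments may, for example, be estimated per season or per weather type.]

```python
import numpy as np

def percentile_transfer(open_parallel, closed_parallel, probs=np.arange(1, 100) / 100.0):
    """Estimate a percentile-to-percentile mapping from simultaneous observations."""
    return np.quantile(open_parallel, probs), np.quantile(closed_parallel, probs)

def adjust_open_screen(open_series, q_open, q_closed):
    """Map historical open-screen values onto the closed-screen distribution."""
    return np.interp(open_series, q_open, q_closed)
```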

When one uses the alignment method described above, the estimated value of the maximum temperature in a closed screen on June 27, 1947, is 36.6°C (while a maximum value of 38.8°C was measured in an open screen, as mentioned in the introduction). This value of 36.6°C should therefore be recognized as the record value for Uccle, in accordance with the current measurement procedures. [For comparison, David Parker (1994) estimated that the cooling from the introduction of Stevenson screens was less than 0.2°C in the annual means in North-West Europe.]

For the specialists, we note that the daily maximum temperatures shown in the synoptic reports of Uccle are usually up to a few tenths of a degree higher than the reference climatological observations mentioned previously. This difference can be explained by the time intervals over which the temperature is averaged in order to reduce the influence of atmospheric turbulence. The climatological extremes are calculated over a period of ten minutes, while the synoptic extremes are calculated from values that were averaged over a time span of one minute. In the future, we will make these calculation methods consistent by always applying the climatological procedure.

Related reading

KMI: Het meten van de extreme temperaturen te Ukkel

To study the influence of such transitions in the way the climate is measured using parallel data we have started the Parallel Observations Science Team (ISTI-POST). One of the POST studies is on the transition to Stevenson screens, which is headed by Theo Brandsma. If you have such data please contact us. If you know someone who might, please tell them about POST.

Another parallel measurement showing huge changes in the extremes is discussed in my post: Be careful with the new daily temperature dataset from Berkeley

More on POST: A database with daily climate data for more reliable studies of changes in extreme weather

Introduction to series on weather variability and extreme events

On the importance of changes in weather variability for changes in extremes

A research program on daily data: HUME: Homogenisation, Uncertainty Measures and Extreme weather

Reference

Parker, David E., 1994: Effect of changing exposure of thermometers at land stations. International Journal of Climatology, 14, pp. 1-31, doi: 10.1002/joc.3370140102.

Wednesday, 5 March 2014

Be careful with the new daily temperature dataset from Berkeley

The Berkeley Earth Surface Temperature project now also provides daily temperature data. On the one hand, this is an important improvement: we now have a global dataset with homogenized daily data. On the other hand, there was a reason that climatologists had not published a global daily dataset yet. Homogenization of daily data is difficult, and the data provided by Berkeley is likely better than analyzing raw data, but still insufficient for robust conclusions about changes in extreme weather and weather variability.

The new dataset is introduced by Zeke Hausfather and Robert Rohde on Real Climate:
Daily temperature data is an important tool to help measure changes in extremes like heat waves and cold spells. To date, only raw quality controlled (but not homogenized) daily temperature data has been available through GHCN-Daily and similar sources. Using this data is problematic when looking at long-term trends, as localized biases like station moves, time of observation changes, and instrument changes can introduce significant biases.

For example, if you were studying the history of extreme heat in Chicago, you would find a slew of days in the late 1930s and early 1940s where the station currently at the Chicago O’Hare airport reported daily max temperatures above 45 degrees C (113 F). It turns out that, prior to the airport’s construction, the station now associated with the airport was on the top of a black roofed building closer to the city. This is a common occurrence for stations in the U.S., where many stations were moved from city cores to newly constructed airports or wastewater treatment plants in the 1940s. Using the raw data without correcting for these sorts of bias would not be particularly helpful in understanding changes in extremes.

The post explains in more detail how the BEST daily method works and presents some beautiful visualizations and videos of the data. Worth reading in detail.

Daily homogenization

If I understand the homogenization procedure of BEST correctly, it is based on their methods for the monthly mean temperature and thus only accounts for non-climatic changes (inhomogeneities) in the mean temperature.

The example of a move from a black roof in a city to an airport is also a good example that not only the mean can change. The black roof will show more variability because on hot sunny days the warm bias is larger than on windy cloudy days. Thus part of this variability is variability in solar insolation and wind.

The urban heat island (UHI) could also be a source of variability; the UHI is strongest on calm and cloud-free days. Thus part of the variability in the observed temperature will be due to variability in wind and clouds.

A nice illustration of the problem can be found in a recent article by Blair Trewin. He compares the distributions of two stations, one in a city near the coast and one at an airport further inland. In the past the station was in the city, nowadays it is at the airport. The modern measurements in the city that are shown below have been made to study the influence of this change.

For this plot he computed the 0th to the 100th percentile. The 50th percentile is the median: 50% of the data has a lower value. The 10th percentile is the value below which 10% of the data lies, and so on. The 0th and 100th percentiles in this plot are the minimum and maximum. What is displayed is the temperature difference between these percentiles. On average the difference is about 2°C, the airport being warmer. However, for the higher percentiles (around the 95th) the difference is much larger. Trewin explains this by the cooling of the city station by a land-sea circulation (sea breeze) often seen on hot summer days. For the highest percentiles (around the 99th), the difference becomes smaller again because offshore winds override the sea breeze.
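A minimal sketch of this kind of comparison, computing the difference between corresponding percentiles of two overlapping daily series (hypothetical inputs, just to show the calculation):

```python
import numpy as np

def percentile_differences(airport, city, probs=np.arange(0, 101)):
    """Difference between the 0th to 100th percentiles of two daily series."""
    return np.percentile(airport, probs) - np.percentile(city, probs)
```

A mean-only adjustment would remove roughly the average of this curve; the structure in the upper percentiles is what would remain as an inhomogeneity.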



Clearly, if you homogenized this time series for the transition from the coast to the inland location by only correcting the mean, you would still have a large inhomogeneity in the higher percentiles, which would lead to spurious, non-climatic trends in hot weather.

Thus we would need a bias correction of the complete probability distribution and not just its mean.

Or we should homogenize the indices we are interested in, for example percentiles or the number of days above 40°C, etc. The BEST algorithm, being fully automatic, could be well suited for such an approach.

Monday, 2 December 2013

On the importance of changes in weather variability for changes in extremes

This is part 2 of the series on weather variability.

A more extreme climate is often interpreted in terms of weather variability. In the media weather variability and extreme weather are typically even used as synonyms. However, extremes may also change due to changes in the mean state of the atmosphere (Rhines and Huybers, 2013) and it is in general difficult to decipher the true cause.

Katz and Brown theorem

Changes in mean and variability are unlike quantities, so comparing them is like comparing apples and oranges. Still, Katz and Brown (1992) found one interesting general result: the more extreme the event, the more important a change in the variability is relative to a change in the mean (Figure 1). Thus if there is a change in variability, it is most important for the most extreme events. If the change is small, these extreme events may have to be extremely extreme.

Given this importance of variability they state:
"[Changes in the variability of climate] need to be addressed before impact assessments for greenhouse gas-induced climate change can be expected to gain much credibility."

The relative sensitivity of an extreme to changes in the mean (dashed line) and in the standard deviation (solid line) for a certain temperature threshold (x-axis). The relative sensitivity to the mean (standard deviation) is the change in the probability of an extreme event due to a change in the mean (standard deviation), divided by that probability. From Katz and Brown (1992).
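A hedged numerical illustration of this result for a normally distributed variable (a sketch only; Katz and Brown's analysis is more general): the relative sensitivity to the standard deviation exceeds that to the mean once the threshold lies more than one standard deviation above the mean, and their ratio grows with the threshold.

```python
import numpy as np
from scipy.stats import norm

def relative_sensitivities(threshold, mean=0.0, std=1.0):
    """Relative sensitivity of P(X > threshold) to the mean and to the std, for normal X."""
    z = (threshold - mean) / std
    p = norm.sf(z)                    # exceedance probability
    dp_dmean = norm.pdf(z) / std      # derivative of p with respect to the mean
    dp_dstd = z * norm.pdf(z) / std   # derivative of p with respect to the standard deviation
    return dp_dmean / p, dp_dstd / p

for t in (1.0, 2.0, 3.0):
    s_mean, s_std = relative_sensitivities(t)
    print(f"threshold {t:.0f} sd: sensitivity to mean {s_mean:.2f}, to std {s_std:.2f}")
```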
It is common in the climatological literature to also denote events that happen relatively regularly with the term extreme. For example, the 90th and 99th percentiles are often called extremes even though such exceedances occur a few times a month or year. Following common parlance, we will denote such distribution descriptions as moderate extremes, to distinguish them from extreme extremes. (The terms soft and hard extremes are also used.) Based on the theory of Katz and Brown, the rest of this section is ordered from moderate to extreme extremes.

Examples from scientific literature

We start with the variance, which is a direct measure of variability and strongly related to the bulk of the distribution. Della-Marta et al. (2007) studied trends in station data over the last century of the daily summer maximum temperature (DSMT). They found that the increase in DSMT variance over Western Europe and central Western Europe is, respectively, responsible for approximately 25% and 40% of the increase in hot days in these regions.

They also studied trends in the 90th, 95th and 98th percentiles. For these trends, variability was found to be important: if only changes in the mean had been taken into account, these estimates would have been between 14 and 60% lower.

Also in climate projections for Europe, variability is considered to be important. Fischer and Schär (2009) found in the PRUDENCE dataset (a European downscaling project) that for the coming century the strongest increases in the 95th percentile are in regions where variability increases most (France) and not in regions where the mean warming is largest (Iberian Peninsula).

The 2003 heat wave is a clear example of an extreme extreme, where one would thus expect that variability is important. Schär et al. (2004) indeed report that the 2003 heat wave is extremely unlikely given a change in the mean only. They show that a recent increase in variability would be able to explain the heat wave. An alternative explanation could also be that the temperature does not follow the normal distribution.

Monday, 25 November 2013

Introduction to series on weather variability and extreme events

This is the introduction to a series on changes in the daily weather and extreme weather. The series discusses how much we know about whether and to what extent the climate system experiences changes in the variability of the weather. Variability here denotes changes in the shape of the probability distribution around the mean. The most basic measure of variability would be the variance, but many other measures can be used.

Dimensions of variability

Studying weather variability adds more dimensions, and complexities, to our understanding of climate change. This series is mainly aimed at other scientists, but I hope it will be clear enough for everyone interested. If not, just complain and I will try to explain it better. At least if that is possible: we do not have many solid results on changes in weather variability yet.

The quantification of weather variability requires the specification of the length of periods and the size of regions considered (the extent, i.e. the scope or domain of the data). Different from studying averages, the consideration of variability adds the dimension of the spatial and temporal averaging scale (the grain, i.e. the minimum resolution of the data); thus variability requires the definition of an upper and a lower scale. This is important in climate and weather as specific climatic mechanisms may influence variability in certain scale ranges. For instance, observations suggest that near-surface temperature variability is decreasing in the range between one year and decades, while its variability in the range of days to months is likely increasing.

Similar to extremes, which can be studied on a range from moderate (soft) extremes to extreme (hard) extremes, variability can be analysed by measures which range from describing the bulk of the probability distribution to ones that focus more on the tails. Considering the complete probability distribution adds another dimension to anthropogenic climate change. Such a soft measure of variability could be the variance, or the interquartile range. A harder measure of variability could be the kurtosis (4th moment) or the distance between the first and the 99th percentile. A hard variability measure would be the difference between the maximum and minimum 10-year return periods.
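For concreteness, a minimal sketch of such soft-to-hard variability measures for a daily series (the return-period based measure mentioned above is omitted because it needs an extreme-value analysis of its own):

```python
import numpy as np
from scipy.stats import kurtosis

def variability_measures(x):
    """Soft to hard measures of the variability of a (daily) series."""
    x = np.asarray(x, dtype=float)
    p1, p25, p75, p99 = np.percentile(x, [1, 25, 75, 99])
    return {
        "variance": np.var(x, ddof=1),     # soft: describes the bulk of the distribution
        "interquartile_range": p75 - p25,  # soft: middle half of the data
        "p1_to_p99_range": p99 - p1,       # harder: reaches into the tails
        "excess_kurtosis": kurtosis(x),    # 4th moment, sensitive to the tails
    }
```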

Another complexity to the problem is added by the data: climate models and observations typically have very different averaging scales. Thus any comparisons require upscaling (averaging) or downscaling, which in turn needs a thorough understanding of variability at all involved scales.

A final complexity is added by the need to distinguish between the variability of the weather and the variability added due to measurement and modelling uncertainties, sampling and errors. This can even affect trend estimates of the observed weather variability because improvements in climate observations have likely caused apparent, but non-climatic, reductions in the weather variability. As a consequence, data homogenization is central in the analysis of observed changes in weather variability.

Friday, 19 July 2013

Statistically interesting problems: correction methods in homogenization

This is the last post in a series on five statistically interesting problems in the homogenization of climate network data. This post will discuss two problems around the correction methods used in homogenization. Especially the correction of daily data is becoming an increasingly important problem because more and more climatologists work with daily climate data. The main added value of daily data is that you can study climatic changes in the probability distribution, which necessitates studying the non-climatic factors (inhomogeneities) as well. This is thus a pressing, but also a difficult task.

The five main statistical problems are:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad-hoc solutions based on single breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 4. Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely. From just one degree of freedom for annual corrections of the means, to 12 degrees of freedom for monthly correction of the means, to 120 for decile corrections (for the higher order moment method (HOM) for daily data, Della-Marta & Wanner, 2006) applied to every month, to a large number of DOF for quantile or percentile matching.

What is the best correction method depends on the characteristics of the inhomogeneity. For a calibration problem just the annual mean would be sufficient, for a serious exposure problem (e.g. insolation of the instrument) a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted.

The best correction method also depends on the reference. Whether the variables of a certain correction model can be reliably estimated depends on how well-correlated the neighboring reference stations are.

Currently climatologists choose their correction method mainly subjectively. For precipitation, annual corrections are typically applied and for temperature monthly corrections are typical. The HOME benchmarking study showed that these are good choices. For example, an experimental contribution correcting precipitation on a monthly scale had a larger error than the same method applied on the annual scale because the data did not allow for an accurate estimation of 12 monthly correction constants.

One correction method is typically applied to an entire regional network, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These will vary from station to station and from break to break. Especially in global studies, the number of stations in a region, and thus the signal to noise ratio, varies widely and one fixed choice is likely suboptimal. Studying which correction method is optimal for every break is much work for manual methods; instead we should work on automatic correction methods that objectively select the optimal correction method, e.g., using an information criterion. As far as I know, no one works on this yet.

Problem 5. Deterministic or stochastic corrections?

Annual and monthly data is normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenization. Daily data, on the other hand, is used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.

Sunday, 5 May 2013

The age of Climategate is almost over

It seems as if the age of Climategate is over (soon). Below you can see the number of Alexa (social bookmarking) users that visited Watts Up With That? (WUWT). At the end of 2009 you see a jump upwards. That is where Anthony Watts made his claim to fame by violating the privacy of climate scientist Phil Jones of the Climatic Research Unit (CRU) and some of his colleagues.

Criminals broke into the CRU backup servers and stole and published their email correspondence. What was Phil Jones' crime? The reason why manners and constitutional rights are not important? The reason to damage his professional network? He is a climate scientist!

According to Watts and co, the emails showed deliberate deception. However, there have been several investigations into Climategate, none of which found evidence of fraud or scientific misconduct. It would thus be appropriate to rename Climategate to Scepticgate. And it is a good sign that this post-normal age is (almost) over and the number of visitors to WUWT is going back to the level before Climategate.

Since the beginning of 2012, the number of readers of WUWT has been in steady decline. It is an interesting coincidence that I started commenting there once in a while in February 2012. Unfortunately for the narcissistic part of my personality: correlation is not causation.

The peak in mid 2012 is Anthony Watts' first failed attempt at writing a scientific study.

According to WUWT Year in review (Wordpress statistics), WUWT was viewed about 31,000,000 times in 2011 and 36,000,000 times in 2012. However, a large part of the visitors of my blog are robots, and that problem is worse here than for my little-read German-language blog. Alexa more likely counts only real visitors.


Friday, 29 March 2013

Special issue on homogenisation of climate series

The open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás" has just published a special issue on homogenization of climate records. This special issue contains eight research papers. It is an offspring of the COST Action HOME: Advances in homogenization methods of climate series: an integrated approach (COST-ES0601).

To be able to discuss eight papers, this post does not contain as much background information as usual and is aimed at people already knowledgeable about homogenization of climate networks.

Contents

Mónika Lakatos and Tamás Szentimrey: Editorial.
The editorial explains the background of this special issue: the importance of homogenisation and the COST Action HOME. Mónika and Tamás, thank you very much for your efforts to organise this special issue. I think every reader will agree that it has become a valuable journal issue.

Monthly data

Ralf Lindau and Victor Venema: On the multiple breakpoint problem and the number of significant breaks in homogenization of climate records.
My article with Ralf Lindau is already discussed in a previous post on the multiple breakpoint problem.
José A. Guijarro: Climatological series shift test comparison on running windows.
Longer time series typically contain more than one inhomogeneity, but statistical tests are mostly designed to detect a single break. One way to resolve this conflict is to apply these tests on short moving windows. José compares six statistical detection methods (t-test, Standard Normal Homogeneity Test (SNHT), two-phase regression (TPR), Wilcoxon-Mann-Whitney test, Durbin-Watson test and SRMD: squared relative mean difference), which are applied on running windows with a length between 1 and 5 years (12 to 60 values (months) on either side of the potential break). The smart trick of the article is that all methods are calibrated to a false alarm rate of 1% for better comparison (a minimal sketch of this calibration idea is given after the list of papers). In this way, he can show that the t-test, SNHT and SRMD are best for this problem and almost identical. To get good detection rates, the window needs to be at least 2 × 3 years (3 years on either side). As this harbours the risk of having two breaks in one window, José has decided to change his homogenization method CLIMATOL to use the semi-hierarchical scheme of SNHT instead of windows. The methods are tested on data with just one break; it would have been interesting to also simulate the more realistic case of multiple independent breaks.
Olivier Mestre, Peter Domonkos, Franck Picard, Ingeborg Auer, Stéphane Robin, Emilie Lebarbier, Reinhard Böhm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, and Petr Stepanek: HOMER: a homogenization software – methods and applications.
HOMER is a new homogenization method and is developed using the best methods tested on the HOME benchmark. Thus theoretically, this should be the best method currently available. Still, sometimes interactions between parts of an algorithm can lead to unexpected results. It would be great if someone would test HOMER on the HOME benchmark dataset, so that we can compare its performance with the other algorithms.
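Coming back to the calibration trick in José's comparison above: a minimal Monte Carlo sketch of how a test statistic can be calibrated to a common 1% false alarm rate on homogeneous series. The white-noise null and the example statistic are assumptions made here for illustration.

```python
import numpy as np

def calibrate_critical_value(statistic, n, alpha=0.01, n_sim=10000, seed=0):
    """Critical value giving the test a false alarm rate alpha on white noise.

    statistic: function mapping a series of length n to a scalar test statistic.
    """
    rng = np.random.default_rng(seed)
    sims = np.array([statistic(rng.standard_normal(n)) for _ in range(n_sim)])
    return np.quantile(sims, 1.0 - alpha)

# Example with a simple mean-difference statistic on a 2 x 36 month window.
mean_diff = lambda x: abs(x[: len(x) // 2].mean() - x[len(x) // 2 :].mean())
crit = calibrate_critical_value(mean_diff, n=72)
```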

Sunday, 24 March 2013

New article on the multiple breakpoint problem in homogenization

An interesting paper by Ralf Lindau and me on the multiple breakpoint problem has just appeared in a Special issue on homogenization of the open access Quarterly Journal of the Hungarian Meteorological Service "Időjárás".

Multiple break point problem

Long instrumental time series contain non-climatological changes, called inhomogeneities, for example because of relocations or changes in the instrumentation. To study real changes in the climate more accurately, these inhomogeneities need to be detected and removed in a data processing step called homogenization (also called segmentation in statistics).

Statisticians have worked a lot on the detection of a single break point in data. However, unfortunately, long climate time series typically contain more than just one break point. There are two ad hoc methods to deal with this.

The most used method is the hierarchical one: first detect the largest break and then redo the detection on the two subsections, and so on until no more breaks are found or the segments become too short. A variant is the semi-hierarchical method, in which previously detected breaks are retested and removed if they are no longer significant. For example, SNHT uses a semi-hierarchical scheme and thus so does the pairwise homogenization algorithm of NOAA, which uses SNHT for detection.

The second ad hoc method is to detect the breaks on a moving window. This window should be long enough for sensitivity, but should not be too long because that increases the chance of two breaks in the window. In the Special issue there is an article by José A. Guijarro on this method, which is used for his homogenization method CLIMATOL.

While these two ad hoc methods work reasonably well, detecting all breaks simultaneously is more powerful. This can be performed as an exhaustive search of all possible combinations (used by the homogenization method MASH). With on average one break per 15 to 20 years, the number of breaks and thus of combinations can get very large. Modern homogenization methods consequently use an optimization method called dynamic programming (used by the homogenization methods PRODIGE, ACMANT and HOMER).

All the mentioned homogenization methods have been compared with each other on a realistic benchmark dataset by the COST Action HOME. In the corresponding article (Venema et al., 2012) you can find references to all the mentioned methods. The results of this benchmarking showed that multiple breakpoint methods were clearly the best. However, this is not only because of the elegant solution to the multiple breakpoint problem, these methods also had other advantages.

Friday, 17 February 2012

HUME: Homogenisation, Uncertainty Measures and Extreme weather

Proposal for future research in homogenisation

To keep this post short, a background in homogenisation is assumed and not every argument is fully rigorous.

Aim

This document wants to start a discussion on the research priorities in the homogenisation of historical climate data from surface networks. It will argue that, with the increased scientific work on changes in extreme weather, the homogenisation community should work more on daily data and especially on quantifying the uncertainties remaining in homogenised data. Comments on these ideas are welcome, as are further thoughts. Hopefully we can reach a consensus on research priorities for the coming years. A common position will strengthen our voice with research funding agencies.

State-of-the-art

From the homogenisation of monthly and yearly data, we have learned that the size of breaks is typically of the order of the climatic changes observed in the 20th century and that the period between two detected breaks is around 15 to 20 years. Thus these inhomogeneities are a significant source of error and need to be removed. The benchmark of the COST Action HOME has shown that these breaks can be removed reliably and that homogenisation improves the usefulness of temperature and precipitation data for studying decadal variability and secular trends. Not all problems are optimally solved yet; for instance, the solutions for the inhomogeneous reference problem are still quite ad hoc. The HOME benchmark found mixed results for precipitation, and the handling of missing data can probably be improved. Furthermore, the homogenisation of other climate elements and of data from different, for example dry, regions should be studied. However, in general, annual and monthly homogenisation can be seen as a mature field.

The homogenisation of daily data is still in its infancy. Daily datasets are essential for studying extremes of weather and climate. Here the focus is not on the mean values, but on what happens in the tails of the distributions. Looking at the physical causes of inhomogeneities, one would expect that many of them especially affect the tails of the distributions. Likewise, the IPCC AR4 report warns that changes in extremes are often more sensitive to inhomogeneous climate monitoring practices than changes in the mean.