
Monday, October 12, 2020

The deleted chapter of the WMO Guidance on the homogenisation of climate station data

The Task Team on Homogenization (TT-HOM) of the Open Panel of CCl Experts on Climate Monitoring and Assessment (OPACE-2) of the Commission on Climatology (CCl) of the World Meteorological Organization (WMO) has published their Guidance on the homogenisation of climate station data.

The guidance report was a bit longish, so at the end we decided that the last chapter on "Future research & collaboration needs" was best deleted. As chair of the task team and as someone who likes to dream about what others could do in a comfy chair, I wrote most of this chapter, and thus we decided to simply turn it into a post for this blog. Enjoy.

Introduction

This guidance is based on our current best understanding of inhomogeneities and homogenisation. However, writing it also makes clear there is a need for a better understanding of the problems.

A better mathematical understanding of statistical homogenisation is important because that is what most of our work is based on. A stronger mathematical basis is a prerequisite for future methodological improvements.

A stronger focus on a (physical) understanding of inhomogeneities would complement and strengthen the statistical work. This kind of work is often performed at the station or network level, but is also needed at larger spatial scales. Much of this work is performed using parallel measurements, but they are typically not internationally shared.

In an observational science the strength of the outcomes depends on a consilience of evidence. Thus having evidence on inhomogeneities from both statistical homogenisation and physical studies strengthens the science.

This chapter will discuss the needs for future research on homogenisation grouped in five kinds of problems. In the first section we will discuss research on improving our physical understanding and physics-based corrections. The next section is about break detection, especially about two fundamental problems in statistical homogenisation: the inhomogeneous-reference problem and the multiple-breakpoint problem.

Further sections discuss computing uncertainties in trends and long-term variability estimates from homogenised data due to remaining inhomogeneities, and the possibility of improving correction methods by treating correction as a statistical model selection problem. We also discuss whether inhomogeneities are stochastic or deterministic and how that may affect homogenisation and especially correction methods for the variability around the long-term mean.

For all the research ideas mentioned below, it is understood that in future we should study more meteorological variables than temperature alone. In addition, more studies on inhomogeneities across variables could help us understand the causes of inhomogeneities and increase the signal to noise ratio. Homogenisation by national offices has advantages because there all climate elements from one station are stored together, which helps in understanding and identifying breaks. It would help homogenisation science and climate analysis to have a global database for all climate elements, like ICOADS for marine data. A Copernicus project has started working on this for land station data, which is an encouraging development.

Physical understanding

It is a good scientific practice to perform parallel measurements in order to manage unavoidable changes and to compare the results of statistical homogenisation to the expectations given the cause of the inhomogeneity according to the metadata. This information should also be analysed on continental and global scales to get a better understanding of when historical transitions took place and to guide homogenisation of large-scale (global) datasets. This requires more international sharing of parallel data and standards on the reporting of the size of breaks confirmed by metadata.

The Dutch weather service KNMI published a protocol on how to manage possible future changes of the network: who decides what needs to be done in which situation, what kind of studies should be made, where the studies should be published, and that the parallel data should be stored in their central database as experimental data. A translation of this report will soon be published by the WMO (Brandsma et al., 2019) and will hopefully inspire other weather services to formalise their network change management.

Next to statistical homogenisation, making and studying parallel measurements, and other physical estimates, can provide a second line of evidence on the magnitude of inhomogeneities. Having multiple lines of evidence provides robustness to observational sciences. Parallel data is especially important for the large historical transitions that are most likely to produce biases in network-wide to global climate datasets. It can validate the results of statistical homogenisation and be used to estimate possibly needed additional adjustments. The Parallel Observations Science Team of the International Surface Temperature Initiative (ISTI-POST) is working on building such a global dataset with parallel measurements.

Parallel data is especially suited to improve our physical understanding of the causes of inhomogeneities by studying how the magnitude of the inhomogeneity depends on the weather and on instrumental design characteristics. This understanding is important for more accurate corrections of the distribution, for realistic benchmarking datasets to test our homogenisation methods, and for determining which additional parallel experiments are especially useful.

Detailed physical models of the measurement, for example, of the flow through the screens, radiative transfer and heat flows, can also help gain a better understanding of the measurement and its error sources. This aids in understanding historical instruments and in designing better future instruments. Physical models will also be paramount for understanding the impact of the surroundings on the measurement, from nearby obstacles and surfaces influencing error sources and air flow, to changes in the measurand itself, such as urbanisation, deforestation or the introduction of irrigation. Land-use changes, especially urbanisation, should be studied together with the relocations they may provoke.

Break detection

Longer climate series typically contain more than one break. This so-called multiple-breakpoint problem is currently an important research topic. A complication of relative homogenisation is that the reference stations can also have inhomogeneities. This so-called inhomogeneous-reference problem is not optimally solved yet. It is also not clear what temporal resolution is best for detection and what the optimal way is to handle the seasonal cycle in the statistical properties of climate data and of many inhomogeneities.

For temperature time series about one break per 15 to 20 years is typical and multiple breaks are thus common. Unfortunately, most statistical detection methods have been developed for one break and for the null hypothesis of white (sometimes red) noise. In case of multiple breaks the statistical test should not only take the noise variance into account, but also the break variance from breaks at other positions. For low signal to noise ratios, the additional break variance can lead to spurious detections and inaccuracies in the break position (Lindau and Venema, 2018a).

To apply single-breakpoint tests on series with multiple breaks, one ad-hoc solution is to first split the series at the most significant break (as, for example, the Standard Normal Homogeneity Test, SNHT, does) and then investigate the subseries. Such a greedy algorithm does not always find the optimal solution. Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, which reduces the power of detection considerably. This method is not used much nowadays.

Multiple breakpoint methods can find an optimal solution and are nowadays numerically feasible. This can be done in a hypothesis testing (MASH) or in a statistical model selection framework. For a given number of breaks these methods find the break combination that minimizes the internal variance, that is, the variance within the homogeneous subperiods (equivalently, the break combination that maximizes the variance explained by the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) or ACMANT (based on PRODIGE; Domonkos, 2011b). In a similar line of research Lu et al. (2010) solved the multiple breakpoint problem using a minimum description length (MDL) based information criterion as penalty function.
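For readers who like to see the principle in code, here is a minimal sketch of penalized optimal segmentation solved by dynamic programming. It is not the PRODIGE or ACMANT implementation; the segment cost is the sum of squared deviations from the segment mean and the penalty per break is a free parameter chosen for illustration only.

```python
import numpy as np

def segment_cost(cs, cs2, i, j):
    """Sum of squared deviations from the mean of x[i:j] (j exclusive)."""
    n = j - i
    s = cs[j] - cs[i]
    return (cs2[j] - cs2[i]) - s * s / n

def optimal_breaks(x, penalty):
    """Minimise internal variance plus penalty times the number of breaks (dynamic programming)."""
    n = len(x)
    cs = np.concatenate(([0.0], np.cumsum(x)))
    cs2 = np.concatenate(([0.0], np.cumsum(np.asarray(x) ** 2)))
    best = np.full(n + 1, np.inf)
    best[0] = -penalty              # cancels the penalty charged for the first segment
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(cs, cs2, i, j) + penalty
            if cost < best[j]:
                best[j], last[j] = cost, i
    breaks, j = [], n               # backtrack the optimal break positions
    while j > 0:
        if last[j] > 0:
            breaks.append(last[j])
        j = last[j]
    return sorted(breaks)

# toy example: a series with two inserted breaks plus noise
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 0.5, 40),
                         rng.normal(1.0, 0.5, 30),
                         rng.normal(0.3, 0.5, 30)])
print(optimal_breaks(series, penalty=2.0))
```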

The penalty function of PRODIGE was found to be suboptimal (Lindau and Venema, 2013): the penalty should be a function of the number of breaks, not fixed per break, and the relation with the length of the series should be reversed. It is not yet clear how sensitively homogenisation methods respond to this, but increasing the penalty per break in case of a low signal to noise ratio to reduce the number of breaks does not make the estimated break signal more accurate (Lindau and Venema, 2018a).

Not only the candidate station but also the reference stations will have inhomogeneities, which complicates homogenisation. Such inhomogeneities can be climatologically especially important when they are due to network-wide technological transitions. An example of such a transition is the current replacement of temperature observations using Stevenson screens by automatic weather stations. Such transitions are important periods as they may cause biases in the network and global average trends and they produce many breaks over a short period.

A related problem is that sometimes all stations in a network have a break at the same date, for example, when a weather service changes the time of observation. Nationally such breaks are corrected using metadata. If this change is unknown in global datasets one can still detect and correct such inhomogeneities statistically by comparison with other nearby networks. That would require an algorithm that additionally knows which stations belong to which network and prioritizes correcting breaks found between stations in different networks. Such algorithms do not exist yet and information on which station belongs to which network for which period is typically not internationally shared.

The influence of inhomogeneities in the reference can be reduced by computing composite references over many stations, by removing reference stations with breaks and by performing homogenisation iteratively.
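To make the composite-reference idea concrete, a minimal sketch of a difference series against a weighted composite reference; the equal weights are a placeholder, in practice correlation- or distance-based weights are common choices.

```python
import numpy as np

def difference_series(candidate, neighbours, weights=None):
    """Candidate minus a weighted composite reference of neighbouring stations.

    candidate: 1-D array of anomalies; neighbours: 2-D array (station, time).
    """
    neighbours = np.asarray(neighbours, dtype=float)
    if weights is None:
        weights = np.ones(neighbours.shape[0])   # placeholder: equal weights
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    reference = weights @ neighbours
    return np.asarray(candidate, dtype=float) - reference
```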

A direct approach to solving this problem would be to simultaneously homogenise multiple stations, also called joint detection. A step in this direction is pairwise homogenisation, where breaks are detected in the difference series of pairs of stations. This requires an additional attribution step, which attributes the breaks to a specific station. Currently this is done by hand (for PRODIGE; Caussinus and Mestre, 2004; Rustemeier et al., 2017) or with ad-hoc rules (by the pairwise homogenisation algorithm of NOAA; Menne and Williams, 2009).

In the homogenisation method HOMER (Mestre et al., 2013) a first attempt is made to homogenise all pairs simultaneously using a joint detection method from bio-statistics. Feedback from first users suggests that this method should not be used automatically. It should be studied how well this method works and where the problems come from.

Multiple breakpoint methods are more accurate than single breakpoint methods. This expected higher accuracy is founded on theory (Hawkins, 1972). In addition, in the HOME benchmarking study it was numerically found that modern homogenisation methods, which take the multiple breakpoint and the inhomogeneous reference problems into account, are about a factor two more accurate than traditional methods (Venema et al., 2012).

However, the current version of CLIMATOL applies single-breakpoint detection tests (first SNHT detection on a window, then splitting) and achieves results comparable to modern multiple-breakpoint methods with respect to break detection and the homogeneity of the data (Killick, 2016). This suggests that either the multiple-breakpoint detection principle is not as important as previously thought, which warrants deeper study, or the accuracy of CLIMATOL is partly due to an unknown unknown.

The signal to noise ratio is paramount for the reliable detection of breaks. It would thus be valuable to develop statistical methods that explain part of the variance of a difference time series and remove this to see breaks more clearly. Data from (regional) reanalysis could be useful predictors for this.
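One way to sketch this idea in code: regress the difference series on candidate predictors (for example, a reanalysis temperature series for the region; the predictors are placeholders, not a tested recipe) and run the break detection on the residuals.

```python
import numpy as np

def remove_explained_variance(diff_series, predictors):
    """Subtract the part of a difference series explained by predictors (ordinary least squares).

    diff_series: 1-D array; predictors: 2-D array (time, n_predictors).
    Returns the residual series on which break detection would then be run.
    """
    diff_series = np.asarray(diff_series, dtype=float)
    X = np.column_stack([np.ones(len(diff_series)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, diff_series, rcond=None)
    return diff_series - X @ coef
```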

First methods have been published to detect breaks in daily data (Toreti et al., 2012; Rienzner and Gandolfi, 2013). It has not been studied yet what the optimal resolution for break detection is (daily, monthly, annual), nor what the optimal way is to handle the seasonal cycle in the climate data and exploit the seasonal cycle of inhomogeneities. In the daily temperature benchmarking study of Killick (2016) most non-specialised detection methods performed better than the daily detection method MAC-D (Rienzner and Gandolfi, 2013).

The selection of appropriate reference stations is a necessary step for accurate detection and correction. Many different methods and metrics are used for the station selection, but studies on the optimal method are missing. The knowledge of local climatologists about which stations share a similar regional climate needs to be made objective so that it can be applied automatically (at larger scales).

For detection a high signal to noise ratio is most important, while for correction it is paramount that all stations are in the same climatic region. Typically the same networks are used for both detection and correction, but it should be investigated whether a smaller network for correction would be beneficial. Also in general, we need more research on understanding the performance of (monthly and daily) correction methods.

Computing uncertainties

Also after homogenisation, uncertainties remain in the data due to various problems:

  • Not all breaks in the candidate station have been or can be detected.

  • False alarms are an unavoidable trade-off for detecting many real breaks.

  • Uncertainty in the estimation of correction parameters due to limited data.

  • Uncertainties in the corrections due to limited information on the break positions.

From validation and benchmarking studies we have a reasonable idea about the remaining uncertainties that one can expect in the homogenised data, at least with respect to changes in the long-term mean temperature. For many other variables and changes in the distribution of (sub-)daily temperature data individual developers have validated their methods, but systematic validation and comparison studies are still missing.

Furthermore, such studies only provide a general uncertainty level, whereas more detailed information for every single station/region and period would be valuable. The uncertainties will strongly depend on the signal to noise ratios, on the statistical properties of the inhomogeneities of the raw data and on the quality and cross-correlations of the reference stations, all of which vary strongly per station, region and period.

Communicating such a complicated error structure, which is mainly temporal, but also partially spatial, is a problem in itself. Furthermore, not only the uncertainty in the means should be considered; especially for daily data, uncertainties in the complete probability density function need to be estimated and communicated. This could be done with an ensemble of possible realisations, similar to Brohan et al. (2006).

An analytic understanding of the uncertainties is important, but is often limited to idealised cases. Thus also numerical validation studies, such as the past HOME and upcoming ISTI studies, are important for an assessment of homogenisation algorithms under realistic conditions.

Creating validation datasets also helps to see the limits of our understanding of the statistical properties of the break signal. This is especially the case for variables other than temperature and for daily and (sub-)daily data. Information is needed on the real break frequencies and size distributions, but also on their auto-correlations and cross-correlations, as well as, as explained in the next section, on the stochastic nature of breaks in the variability around the mean.

Validation studies focussed on difficult cases would be valuable for a better understanding. For example, sparse networks, isolated island networks, large spatial trend gradients and strong decadal variability in the difference series of nearby stations (for example, due to El Niño in complex mountainous regions).

The advantage of simulated data is that one can create a large number of quite realistic complete networks. For daily data it will remain hard for the years to come to determine how to generate a realistic validation dataset. Thus even if using parallel measurements is mostly limited to one break per test, it does provide the highest degree of realism for this one break.

Deterministic or stochastic corrections?

Annual and monthly data is normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenisation. Daily data, on the other hand, is used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.

The physics of the problem suggests that many inhomogeneities are caused by stochastic processes. An example affecting many instruments is a difference in the response time of instruments, which can lead to differences determined by turbulence. A fast thermometer will on average read higher maximum temperatures than a slow one, but this difference will be variable and sometimes be much higher than the average. In case of errors due to insolation, the radiation error will be modulated by clouds. An insufficiently shielded thermometer will need larger corrections on warm days, which will typically be more sunny, but some warm days will be cloudy and not need much correction, while other warm days are sunny and calm and have a dry hot surface. The adjustment of daily data for studies on changes in the variability is thus a distribution problem and not only a regression bias-correction problem. For data assimilation (numerical weather prediction) accurate bias correction (with regression methods) is probably the main concern.

Seen as a variability problem, the correction of daily data is similar to statistical downscaling in many ways. Both methodologies aim to produce bias-corrected data with the right variability, taking into account the local climate and large-scale circulation. One lesson from statistical downscaling is that increasing the variance of a time series deterministically by multiplication with a factor, called inflation, is the wrong approach and that the variance that could not be explained by regression using predictors should instead be added stochastically as noise (Von Storch, 1999). Maraun (2013) demonstrated that the inflation problem also exists for the deterministic Quantile Matching method, which is also used in daily homogenisation. Current statistical correction methods deterministically change the daily temperature distribution and do not stochastically add noise.
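The inflation lesson is easy to demonstrate in a toy setting: both inflation and noise addition restore the marginal variance, but inflation makes the corrected series a deterministic function of the predictor, whereas in reality only part of the local variance is related to it. A sketch of the downscaling argument, not a homogenisation method:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
predictor = rng.normal(size=n)
target = 0.6 * predictor + rng.normal(scale=0.8, size=n)   # only part of the variance is predictable

prediction = 0.6 * predictor                                # regression estimate: variance too small
missing_var = target.var() - prediction.var()

inflated = prediction * np.sqrt(target.var() / prediction.var())          # deterministic inflation
stochastic = prediction + rng.normal(scale=np.sqrt(missing_var), size=n)  # add unexplained variance as noise

for name, series in [("target", target), ("inflated", inflated), ("stochastic", stochastic)]:
    corr = np.corrcoef(series, predictor)[0, 1]
    print(f"{name:10s} variance {series.var():4.2f}   correlation with predictor {corr:4.2f}")
```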

Transferring ideas from downscaling to daily homogenisation is likely fruitful to develop such stochastic variability correction methods. For example, predictor selection methods from downscaling could be useful. Both fields require powerful and robust (time invariant) predictors. Multi-site statistical downscaling techniques aim at reproducing the auto- and cross-correlations between stations (Maraun et al., 2010), which may be interesting for homogenisation as well.

The daily temperature benchmarking study of Rachel Killick (2016) suggests that current daily correction methods are not able to improve the distribution much. There is a pressing need for more research on this topic. However, these methods likely also performed less well because they were used together with detection methods with a much lower hit rate than the comparison methods.

Whether deterministic correction methods lead to severe errors in homogenisation still needs to be studied, but stochastic methods that implement the corrections by adding noise would at least theoretically fit the problem better. Such stochastic corrections are not trivial and should have the right variability on all temporal and spatial scales.

It should be studied whether it may be better to only detect the dates of break inhomogeneities and perform the analysis on the homogeneous subperiods (HSPs), removing the need for corrections. The disadvantage of this approach is that most of the trend variance is in the differences in the means of the HSPs and only a small part is in the trends within the HSPs. In case of trend analysis, this would be similar to the work of the Berkeley Earth Surface Temperature group on the mean temperature signal. Periods with gradual inhomogeneities, e.g., due to urbanisation, would have to be detected and excluded from such an analysis.

An outstanding problem is that current variability correction methods have only been developed for break inhomogeneities, methods for gradual ones are still missing. In homogenisation of the mean of annual and monthly data, gradual inhomogeneities are successfully removed by implementing multiple small breaks in the same direction. However, as daily data is used to study changes in the distribution, this may not be appropriate for daily data as it could produce larger deviations near the breaks. Furthermore, changing the variance in data with a trend can be problematic (Von Storch, 1999).

At the moment most daily correction methods correct the breaks one after another. In monthly homogenisation it is found that correcting all breaks simultaneously (Caussinus and Mestre, 2004) is more accurate (Domonkos et al., 2013). It is thus likely worthwhile to develop multiple breakpoint correction methods for daily data as well.
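Correcting all breaks simultaneously is essentially a least-squares (ANOVA-like) problem. Below is a minimal sketch of that idea for monthly or annual means; it is not the Caussinus and Mestre implementation and assumes complete data, known break positions, and a common regional climate signal per year.

```python
import numpy as np

def joint_corrections(data, segments):
    """Estimate all station-segment levels of a small network simultaneously.

    data: 2-D array (station, year); segments: 2-D integer array (station, year)
    giving the homogeneous subperiod each value belongs to.
    Returns adjustments per (station, segment) that align every segment with the
    station's most recent segment.
    """
    n_stat, n_year = data.shape
    seg_ids = [(s, k) for s in range(n_stat) for k in np.unique(segments[s])]
    seg_index = {key: i for i, key in enumerate(seg_ids)}

    # design matrix: one common (regional climate) effect per year
    # plus one level per station-segment; dense loops are fine for a toy network
    rows, rhs = [], []
    for s in range(n_stat):
        for t in range(n_year):
            row = np.zeros(n_year + len(seg_ids))
            row[t] = 1.0
            row[n_year + seg_index[(s, segments[s, t])]] = 1.0
            rows.append(row)
            rhs.append(data[s, t])
    # the model is identifiable only up to a constant shared by years and levels;
    # within-station level differences, which is all we need, are unaffected
    params, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)

    levels = params[n_year:]
    adjustments = {}
    for s in range(n_stat):
        last = levels[seg_index[(s, segments[s, -1])]]
        for k in np.unique(segments[s]):
            adjustments[(s, k)] = last - levels[seg_index[(s, k)]]   # add this to the data of that segment
    return adjustments
```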

Finally, current daily correction methods rely on previously detected breaks and assume that the homogeneous subperiods (HSPs) are indeed homogeneous (i.e., each segment between breakpoints is assumed to be homogeneous). However, these HSPs are currently based on the detection of breaks in the mean only. Breaks in higher moments may thus still be present in the "homogeneous" subperiods and affect the corrections. If only for this reason, we should also work on the detection of breaks in the distribution.

Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely. From just one degree of freedom for annual corrections of the means, to 12 degrees of freedom for monthly correction of the means, to 40 for decile corrections applied to every season, to a large number of DOF for quantile or percentile matching.

A study using PRODIGE on the HOME benchmark suggested that for typical European networks monthly adjustments are best for temperature; annual corrections are probably less accurate because they fail to account for changes in the seasonal cycle due to inhomogeneities. For precipitation annual corrections were most accurate; monthly corrections were likely less accurate because the data was too noisy to estimate the 12 correction constants/degrees of freedom.

What the best correction method is depends on the characteristics of the inhomogeneity. For a calibration problem just the annual mean could be sufficient; for a serious exposure problem (e.g., insolation of the instrument) a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted. The best correction method also depends on the reference. Whether the parameters of a certain correction model can be reliably estimated depends on how well correlated the neighbouring reference stations are.

An entire regional network is typically homogenised with the same correction method, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These will vary from station to station, from break to break and from period to period. Work on correction methods that objectively select the optimal correction method, e.g., using an information criterion, would be valuable.
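One possible way to make that selection objective, sketched here with an information criterion (this is an illustration, not an established homogenisation method): fit candidate step models of increasing complexity to the monthly difference series around a break and pick the one with the lowest BIC.

```python
import numpy as np

def bic(rss, n_obs, n_par):
    return n_obs * np.log(rss / n_obs) + n_par * np.log(n_obs)

def select_correction_model(diff, months, break_idx):
    """Choose between annual (1), seasonal (4) and monthly (12) step corrections with BIC.

    diff: monthly candidate-minus-reference series; months: calendar month (1-12) per value;
    break_idx: index of the first value after the break.
    """
    diff = np.asarray(diff, dtype=float)
    months = np.asarray(months)
    after = np.arange(len(diff)) >= break_idx
    groupings = {
        "annual": np.zeros(len(diff), dtype=int),   # one group
        "seasonal": (months - 1) // 3,              # four three-month groups
        "monthly": months - 1,                      # twelve groups
    }
    scores = {}
    for name, group in groupings.items():
        fitted = np.zeros(len(diff))
        n_par = 0
        for g in range(group.max() + 1):
            for side in (False, True):              # before and after the break
                mask = (group == g) & (after == side)
                if mask.any():
                    fitted[mask] = diff[mask].mean()
                    n_par += 1
        scores[name] = bic(float(np.sum((diff - fitted) ** 2)), len(diff), n_par)
    return min(scores, key=scores.get), scores
```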

In case of (sub-)daily data, the options to select from become even larger. Daily data can be corrected just for inhomogeneities in the mean (e.g., Vincent et al., 2002, where daily temperatures are corrected by incorporating a linear interpolation scheme that preserves the previously defined monthly corrections) or also for the variability around the mean. In between are methods that adjust for the distribution including the seasonal cycle, which dominates the variability and is thus effectively similar to mean adjustments with a seasonal cycle. Correction methods of intermediate complexity with more than one, but less than 10 degrees of freedom would fill a gap and allow for more flexibility in selecting the optimal correction model.

When applying these methods (Della-Marta and Wanner, 2006; Wang et al., 2010; Mestre et al., 2011; Trewin, 2013) the number of quantile bins (categories) needs to be selected, as well as whether to use physical weather-dependent predictors and the functional form in which they are used (Auchmann and Brönnimann, 2012). Objective optimal methods for these selections would be valuable.

Related information

WMO Guidelines on Homogenization (English, French, Spanish) 

WMO guidance report: Challenges in the Transition from Conventional to Automatic Meteorological Observing Networks for Long-term Climate Records


Monday, April 27, 2020

Break detection is deceptive when the noise is larger than the break signal

I am disappointed in science. It is hard to believe that it took this long for us to discover that break detection has serious problems when the signal to noise ratio is low. However, as far as we can judge this was new science and it certainly was not common knowledge, which it should have been because it has large consequences.

This post describes a paper by Ralf Lindau and me about how break detection depends on the signal to noise ratio (Lindau and Venema, 2018). The signal in this case are the breaks we would like to detect. These breaks could be from a change in instrument or location of the station. We detect breaks by comparing a candidate station to a reference. This reference can be one other neighbouring station or an average of neighbouring stations. The candidate and reference should be sufficiently close so that they have the same regional climate signal, which is then removed by subtracting the reference from the candidate. The difference time series that is left contains breaks and noise because of measurement uncertainties and differences in local weather. The noise thus depends on the quality of the measurements, on the density of the measurement network and on how variable the weather is spatially.

The signal to noise ratio (SNR) is simply defined as the standard deviation of the time series containing only the breaks divided by the standard deviation of time series containing only the noise. For short I will denote these as the break signal and the noise signal, which have a break variance and a noise variance. When generating data to test homogenization algorithms, you know exactly how strong the break signal and the noise signal is. In case of real data, you can estimate it, for example with the methods I described in a previous blog post. In that study, we found a signal to noise ratio for annual temperature averages observed in Germany of 3 to 4 and in America of about 5.

Temperature is studied a lot and much of the work on homogenization takes place in Europe and America. Here this signal to noise ratio is high enough. That may be one reason why climatologists did not find this problem sooner. Many other sciences use similar methods; we are all supported by a considerable statistical literature. I have no idea what their excuses are.



Why a low SNR is a problem

As scientific papers go, the discussion is quite mathematical, but the basic problem is relatively easy to explain in words. In statistical homogenization we do not know in advance where the break or breaks will be. So we basically try many break positions and search for the break positions that result in the largest breaks (or, for the algorithm we studied, that explain the most variance).

If you do this for a time series that contains only noise, this will also produce (small) breaks. For example, in case you are looking for one break, due to pure chance there will be a difference between the averages of the first and the last segment. This difference is larger than it would be for a predetermined break position, as we try all possible break positions and then select the one with the largest difference. To determine whether the breaks we found are real, we require that they are so large that it is unlikely that they are due to chance while there are actually no breaks in the series. So we study how large the breaks are in series that contain only noise to determine how large such random breaks can be. Statisticians would talk about the breaks being statistically significant with white noise as the null hypothesis.
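The size of breaks found in pure noise is easy to explore by simulation. A sketch for the single-break case, with the maximum absolute difference of segment means over all possible break positions as the test statistic:

```python
import numpy as np

def max_step_statistic(x):
    """Largest absolute difference of segment means over all possible break positions."""
    return max(abs(x[:i].mean() - x[i:].mean()) for i in range(1, len(x)))

rng = np.random.default_rng(1)
n_years, n_trials = 100, 2000
noise = rng.normal(size=(n_trials, n_years))        # series that contain only noise

searched = np.array([max_step_statistic(x) for x in noise])
fixed = np.abs(noise[:, :n_years // 2].mean(axis=1) - noise[:, n_years // 2:].mean(axis=1))

print("95th percentile, break position searched:", round(float(np.quantile(searched, 0.95)), 2))
print("95th percentile, break position fixed:   ", round(float(np.quantile(fixed, 0.95)), 2))
```

As described above, the searched statistic is clearly larger than the one for a predetermined break position.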

When the breaks are really large compared to the noise, one can see by eye where the positions of the breaks are, and this method is a nice way to make this computation automatically for many stations. When the breaks are “just” large, it is a great method to objectively determine the number of breaks and the optimal break positions.

The problem comes when the noise is larger than the break signal. Not that it is fundamentally impossible to detect such breaks. If you have a 100-year time series with a break in the middle, you would be averaging over 50 noise values on either side and the difference in their averages would be much smaller than the noise itself. Even if noise and signal are about the same size the noise effect is thus expected to be smaller than the size of such a break. To put it in another way, the noise is not correlated in time, while the break signal is the same for many years; that fundamental difference is what the break detection exploits.

However, to come to the fundamental problem, it becomes hard to determine the positions of the breaks. Imagine the theoretical case where the break positions are fully determined by the noise, not by the breaks. From the perspective of the break signal, these break positions are random. The problem is that also random breaks explain a part of the break signal. So one would have a combination with a maximum contribution of the noise plus a part of the break signal. Because of this additional contribution by the break signal, this combination may have larger breaks than expected in a pure noise signal. In other words, the result can be statistically significant, while we have no idea where the positions of the breaks are.

In a real case the breaks look even more statistically significant because the positions of the breaks are determined by both the noise and the break signal.

That is the fundamental problem, the test for the homogeneity of the series rightly detects that the series contains inhomogeneities, but if the signal to noise ratio is low we should not jump to conclusions and expect that the set of break positions that gives us the largest breaks has much to do with the break positions in the data. Only if the signal to noise ratio is high, this relationship is close enough.
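This, too, can be illustrated with a small simulation: generate a break signal with a few breaks, place the same number of breaks at random positions, and see how much of the break variance the segment means at those wrong positions already explain (a sketch, with arbitrarily chosen numbers of years and breaks):

```python
import numpy as np

rng = np.random.default_rng(2)
n_years, n_breaks, n_trials = 100, 5, 2000

explained = []
for _ in range(n_trials):
    # break signal: random levels between randomly placed breaks (no noise added here)
    true_pos = np.sort(rng.choice(np.arange(1, n_years), n_breaks, replace=False))
    signal = np.repeat(rng.normal(size=n_breaks + 1), np.diff(np.r_[0, true_pos, n_years]))
    # a step function with the same number of breaks at random (wrong) positions
    rand_pos = np.sort(rng.choice(np.arange(1, n_years), n_breaks, replace=False))
    fit = np.concatenate([np.full(seg.size, seg.mean()) for seg in np.split(signal, rand_pos)])
    explained.append(1.0 - np.var(signal - fit) / np.var(signal))

print("average fraction of break variance explained by random break positions:",
      round(float(np.mean(explained)), 2))
```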

Some numbers

This is a general problem, which I expect all statistical homogenization algorithms to have, but to put some numbers on this, we need to specify an algorithm. We have chosen to study the multiple breakpoint method that is implemented in PRODIGE (Caussinus and Mestre, 2004), HOMER (Mestre et al., 2013) and ACMANT (Domonkos and Coll, 2017); these are among the best, if not the best, methods we currently have. We applied it by comparing pairs of stations, like PRODIGE and HOMER do.

For a certain number of breaks this method effectively computes the combination of breaks that has the highest break variance. If you add more breaks, you will increase the break variance those breaks explain, even if the series were pure noise, so there is additionally a penalty function that depends on the number of breaks. The algorithm selects the option where the break variance minus such a penalty is highest. A statistician would call this a model selection problem and the job of the penalty is to keep the statistical model (the step function explaining the breaks) reasonably simple.

In the end, if the signal to noise ratio is one half, the break combination that explains the most variance is just as “good” at explaining the actual break signal in the data as breaks at random positions.

With this detection model, we derived the plot below; let me talk you through it. On the x-axis is the SNR; on the right the break signal is twice as strong as the noise signal. On the y-axis is how well the step function belonging to the detected breaks fits to the step function of the breaks we actually inserted. The lower curve, with the plus symbols, is the detection algorithm as I described above. You can see that for a high SNR it finds a solution that closely matches what we put in and the difference is almost zero. The upper curve, with the ellipse symbols, is for the solution you find if you put in random breaks. You can see that for a high SNR the random breaks have a difference of 0.5. As the variance of the break signal is one, this means that half the variance of the break signal is explained by random breaks.


Figure 13b from Lindau and Venema (2018).

When the SNR is about 0.5, the random breaks are about as good as the breaks proposed by the algorithm described above.

One may be tempted to think that if the data is too noisy, the detection algorithm should detect fewer breaks, that is, the penalty function should be bigger. However, the problem is not the detection of whether there are breaks in the data, but where the breaks are. A larger penalty thus does not solve the problem and even makes the results slightly worse. Not in the paper, but later I wondered whether setting more breaks is such a bad thing, so we also tried lowering the threshold; this again made the results worse.

So what?

The next question is naturally: is this bad? One reason to investigate correction methods in more detail, as described in my last blog post, was the hope that maybe accurate break positions are not that important. It could have been that the correction method still produces good results even with random break positions. This is unfortunately not the case: already quite small errors in the break positions deteriorate the outcome considerably. This will be the topic of the next post.

Not homogenizing the data is also not a solution. As I described in a previous blog post, the breaks in Germany are small and infrequent, but they still have a considerable influence on the trends of stations. The figure below shows the trend differences between many pairs of nearby stations in Germany. Their differences in trends will be mostly due to inhomogeneities. The standard deviation of 0.628 °C per century for the pairs translates into an average error in the trends of individual stations of 0.4 °C per century.


The trend differences (y-axis) of pairs of stations (x-axis) in the German temperature network. The trends were computed from 316 nearby pairs over 1950 to 2000. Figure 2 from Lindau and Venema (2018).

This finding makes it more important to work on methods to estimate the signal to noise ratio of a dataset before we try to homogenize it. This is easier said than done. The method introduced in Lindau and Venema (2018) gives results for every pair of stations, but needs some human checks to ensure the fits are good. Furthermore, it assumes the break levels behave like noise, while in Lindau and Venema (2019) we found that the break signal in the USA behaves like a random walk. This 2019 method needs a lot of data; even the results for Germany are already quite noisy, and if you apply it to data-sparse regions you have to select entire continents. Doing so, however, biases the results to those subregions where there are many stations and would thus give too high SNR estimates. So computing the SNR worldwide is not just a blog post, but requires a careful study and likely the development of a new method to estimate the break and noise variance.

Both methods compute the SNR for one difference time series, but in a real case multiple difference time series are used. We will need to study how to do this in an elegant way. How many difference series are used depends on the homogenization method, which would also make the SNR estimate method dependent. It would also be good to have an estimation method that is more universal and can be used to compare networks with each other.

This estimation method should then be applied to global datasets and for various periods to study which regions and periods have a problem. Temperature (as well as pressure) is a variable that is well correlated from station to station. Much more problematic variables, which should thus be studied as well, are precipitation, wind and humidity. In case of precipitation, there tend to be more stations. This will compensate somewhat, but for the other variables there may be even fewer stations.

We have some ideas how to overcome this problem, from ways to increase the SNR to completely different ways to estimate the influence of inhomogeneities on the data. But they are too preliminary to already blog about. Do subscribe to the blog with any of the options below the tag cloud near the end of the page. ;-)

When we digitize climate data that is currently only available on paper, we tend to prioritize data from regions and periods where we do not have much information yet. However, if after that digitization the SNR would still be low, it may be more worthwhile to digitize data from regions/periods where we already have more data and get that region/period to an SNR above one.

The next post will be about how this low SNR problem changes our estimates of how much the Earth has been warming. Spoiler: the climate “sceptics” will not like that post.


Other posts in this series

Part 5: Statistical homogenization under-corrects any station network-wide trend biases

Part 4: Break detection is deceptive when the noise is larger than the break signal

Part 3: Correcting inhomogeneities when all breaks are perfectly known

Part 2: Trend errors in raw temperature station data due to inhomogeneities

Part 1: Estimating the statistical properties of inhomogeneities without homogenization

References

Caussinus, Henri and Olivier Mestre, 2004: Detection and correction of artificial shifts in climate series. The Journal of the Royal Statistical Society, Series C (Applied Statistics), 53, pp. 405-425. https://doi.org/10.1111/j.1467-9876.2004.05155.x

Domonkos, Peter and John Coll, 2017: Homogenisation of temperature and precipitation time series with ACMANT3: method description and efficiency tests. International Journal of Climatology, 37, pp. 1910-1921. https://doi.org/10.1002/joc.4822

Lindau, Ralf and Victor Venema, 2018: The joint influence of break and noise variance on the break detection capability in time series homogenization. Advances in Statistical Climatology, Meteorology and Oceanography, 4, pp. 1-18. https://doi.org/10.5194/ascmo-4-1-2018

Lindau, Ralf and Victor Venema, 2019: A new method to study inhomogeneities in climate records: Brownian motion or random deviations? International Journal of Climatology, 39, pp. 4769-4783. Manuscript: https://eartharxiv.org/vjnbd/ Article: https://doi.org/10.1002/joc.6105

Mestre, Olivier, Peter Domonkos, Franck Picard, Ingeborg Auer, Stephane Robin, Émilie Lebarbier, Reinhard Boehm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, Petr Stepanek, 2013: HOMER: a homogenization software - methods and applications. IDOJARAS, Quarterly Journal of the Hungarian Meteorological Society, 117, no. 1, pp. 47–67.

Sunday, March 1, 2020

Trend errors in raw temperature station data due to inhomogeneities

Another serious title to signal this is again one for the nerds. How large is the uncertainty in the temperature trends of raw station data due to inhomogeneities? Too Long, Didn’t Read: they are big and larger in America than in Germany.

We came up with two very different methods (Lindau and Venema, 2020) to estimate this and we got lucky: the estimates match as well as one could expect.

Direct method

Let’s start simple: take pairs of stations and compute their difference time series, that is, calculate the difference between two raw temperature time series. For each of these difference series you compute the trend, and from all these trends you compute the variance.

If you compute these variances for pairs of stations in a range of distance classes you get the thick curves in the figure below for the United States of America (marked as U) and Germany (marked as G). This was computed based on data for the period 1901 to 2000 from the International Surface Temperature Initiative (ISTI).


Figure 1. The variance of the trend differences computed from temperature data from the United States of America (U) and for Germany (G). Only the thick curves are relevant for this blog post and are used to compute the trend uncertainty. The paper used Kelvin [K]; we could also have used degrees Celsius [°C], as we do in this post.

When the distance between the pairs is small (near zero on the x-axis) the trend differences are mostly due to inhomogeneities, whereas when the distances get larger also real climate trend differences increase the variance of the trend differences. So we need to extrapolate the thick curves to a distance of zero to get the variance due to inhomogeneities.

For the USA this extrapolation gives about 1 °C² per century². This is the variance due to two stations. If the inhomogeneities are assumed to be independent, one station will have contributed half and the trend variance of one station is 0.5 °C² per century². To get an uncertainty measure that is easier to understand for humans, you can take the square root, which gives the standard deviation of the trends: 0.71 °C per century.
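In code, the direct method boils down to a few lines. A sketch, assuming a simple table of pair trend differences and distances; the extrapolation to zero distance is done here with a crude linear fit over the closest pairs, not the procedure of the paper:

```python
import numpy as np

def station_trend_sigma(pair_trend_diffs, pair_distances_km, max_km=300.0):
    """Standard deviation of single-station trend errors estimated from station pairs.

    pair_trend_diffs: trend differences of station pairs (e.g. in degrees C per century);
    pair_distances_km: their separations in km.
    """
    d = np.asarray(pair_distances_km, dtype=float)
    v = np.asarray(pair_trend_diffs, dtype=float) ** 2    # squared differences, assuming zero mean difference
    close = d < max_km
    slope, intercept = np.polyfit(d[close], v[close], 1)  # crude linear extrapolation to zero distance
    pair_variance_at_zero = max(float(intercept), 0.0)
    # two independent stations contribute equally, so one station has half the variance
    return float(np.sqrt(pair_variance_at_zero / 2.0))
```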

That is a decent size compared to the total warming over the last century of about 1.5 °C over land; see estimates below. This alone is a good reason to homogenize climate data to reduce such errors.


Figure 2. Warming estimates of the land surface air temperature from four different institutions. Figure taken from the last IPCC report.

The same exercise for Germany estimates the variance to be 0.5 °C² per century²; the square root of half this variance gives a trend uncertainty of 0.5 °C per century.

In Germany the maximum distance for which we have a sufficient number of pairs (900 km) is naturally smaller than for America (over 1200 km). Interestingly, for America also real trend differences are important, which you can see in the trend variance increasing for pairs of stations that are further apart. In Germany this does not seem to happen even for the largest distances we could compute.

An important reason to homogenize climate data is to remove network-wide trend biases. When these trend biases are due to changes that affected all stations, they will hardly be visible in the difference time series. A good example is a change of the instruments affecting an entire observational network. It is also possible to have such large-scale trend biases due to rare, but big events, such as stations relocating from cities to airports, leading to a greatly reduced urban heat island effect. In such a case, the trend difference would be visible in the difference series and would thus be noticed by the above direct method.

Indirect method

The indirect method to estimate the station trend errors starts with an estimate of the statistical properties of the inhomogeneities and derives a relationship between these properties and the trend error.

Inhomogeneities

How the statistical properties of the inhomogeneities are estimated is described in Lindau and Venema (2019) and my previous blog post. To summarize, we had two statistical models for inhomogeneities. One where the size of the inhomogeneity between two breaks is given by a random number. We called this Random Deviations (RD) from a baseline. The second model is for breaks behaving like Brownian Motion (BM). Here the jump sizes are determined by random numbers. So the difference is whether the levels or the jumps are random numbers.

We found RD breaks in both countries with a typical jump size of about 0.5 °C. But the frequency was quite different: while in Germany we had one break every 24 years, in America it was once every 5.8 years.

Furthermore, in America there are also breaks that behave like Brownian Motion. For these breaks we only know the variance of the breaks multiplied by the frequency of the breaks, which is 0.45 °C² per century². We do not know whether this value is due to many small breaks or one big one.

Relationship to trend errors

The next step is to relate these properties of the inhomogeneities to trend errors.

For the Random Deviation case, the dependence on the size of the breaks is trivial (the trend variance is simply proportional to it), but the dependence on the number of breaks is quite interesting. The numerical relationship is shown with +-symbols in the graph below.

Clearly when there are no breaks, there is also no trend error. On the other extreme of a large number of breaks, the error is due to a large number of independent random numbers, which to a large part cancel each other out. The largest trend errors are thus found for a moderate number of breaks.

To understand the result we start with the case without any variations in the break frequency. That is, if the break frequency is 5 breaks per century, every single 100-year time series has exactly 5 breaks. For this case we can derive an equation shown below as the thick curve. As expected it starts at zero and the maximum trend error is in case of 2 breaks in the time series.

More realistic is the case when there is a mean break frequency over all stations, but the number of breaks varies randomly per station. In case the breaks are independent of each other one would expect the number of breaks to follow a Poisson distribution. The thin line in the graph below takes this scatter into account by computing a weighted average over the equation using Poisson weights. This smoothing reduces the height of the maximum and shifts it to a larger average break frequency, about 3 breaks per time series. Especially for more than 5 breaks, the numerical and analytical solutions fit very well.


Figure 3. The relationship between the variance of the trend due to inhomogeneities and the frequency of breaks, expressed as breaks per century. The plus-symbols are the results based on numerical simulation for 100-years time series. The thick line is the equation we found for a fixed break frequency, while the thin line takes into account random variations in the break frequency.

The next graph, shown below, also includes the case of Brownian Motion (BM), as well as the Random Deviation (RD) case. To make the BM and RD cases comparable, they both have jumps following a normal distribution with a variance of 1 °C². Clearly the Brownian Motion case (with O-symbols) produces much larger trend errors than the Random Deviations case (+-symbols).


Figure 4. The variance of the trend as a function of the frequency of breaks for the two statistical models. The O-symbols are for Brownian Motion, the +-symbols for Random Deviations. The variance of the jump sizes was 1 °C² in both cases.

That the variance of the trends due to inhomogeneities is a linear function of the number of breaks can be understood by considering that to a first approximation the trend error for Brownian Motion is given by a line connecting the first and the last segment of the break signal. If k is the number of breaks, the value of the last segment is the sum of k random values. Thus if the variance of one break is σβ², the variance of the value of the last segment is kσβ² and thus a linear function of the number of breaks.

Alternatively you can do a lot of math and at the end find that the problem simplifies like a high school problem and that the actual trend error is 6/5 times the simple approximation from the previous paragraph.
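A small Monte Carlo along these lines (my own sketch, not the computation behind the figures) should reproduce the qualitative behaviour described above: a maximum at a few breaks per century for Random Deviations and a roughly linear increase for Brownian Motion.

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_series, sigma_jump = 100, 4000, 1.0     # jump variance 1 degree C squared, as in Figure 4
t = np.arange(n_years) - (n_years - 1) / 2.0       # centred time axis

def trend_per_century(y):
    return float(t @ y / (t @ t)) * 100.0          # ordinary least squares slope, per century

for breaks_per_century in (1, 2, 3, 5, 10, 20):
    trends_rd, trends_bm = [], []
    for _ in range(n_series):
        k = rng.poisson(breaks_per_century)
        pos = np.sort(rng.choice(np.arange(1, n_years), size=min(k, n_years - 1), replace=False))
        lengths = np.diff(np.r_[0, pos, n_years])
        jumps = rng.normal(scale=sigma_jump, size=len(pos))
        levels_rd = rng.normal(scale=sigma_jump / np.sqrt(2), size=len(pos) + 1)  # level variance 0.5 -> jump variance 1
        levels_bm = np.r_[0.0, np.cumsum(jumps)]                                  # accumulated jumps
        trends_rd.append(trend_per_century(np.repeat(levels_rd, lengths)))
        trends_bm.append(trend_per_century(np.repeat(levels_bm, lengths)))
    print(breaks_per_century, "breaks/century: RD", round(float(np.var(trends_rd)), 2),
          "BM", round(float(np.var(trends_bm)), 2), "degC^2 per century^2")
```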

Give me numbers

The variance of the trend error due to BM inhomogeneities in America is thus 6/5 times 0.45 °C² per century², which equals 0.54 °C² per century².

This BM trend error is a lot bigger than the trend error due to the RD inhomogeneities, which for 17.1 breaks per century and a break size distribution with variance 0.12 °C² is 0.13 °C² per century².

One can add these two variances together to get 0.67 °C² per century². The standard deviation of this trend error is thus quite big: 0.82 °C per century, and mostly due to the BM component.

In Germany, we found only RD breaks with a frequency of 4.1 breaks per century. The variance of their sizes is 0.13 °C². If we put this into the equation, the variance of the trends due to inhomogeneities is 0.34 °C² per century². Although the size of the RD breaks is about the same as in America, their influence on the station trend errors is larger, which is somewhat counter-intuitive because their number is lower. The standard deviation of the trend due to inhomogeneities is thus 0.58 °C per century in Germany.
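The arithmetic of this section in a few lines, using only the numbers quoted above:

```python
import math

# USA: Brownian Motion plus Random Deviation contributions (degrees C squared per century squared)
bm_component = 6 / 5 * 0.45
rd_component = 0.13
usa_variance = bm_component + rd_component
print("USA:", round(bm_component, 2), "+", rd_component, "=", round(usa_variance, 2),
      "-> standard deviation", round(math.sqrt(usa_variance), 2), "degrees C per century")

# Germany: Random Deviations only
germany_variance = 0.34
print("Germany: standard deviation", round(math.sqrt(germany_variance), 2), "degrees C per century")
```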

Comparing the two estimates

Finally, we can compare the direct and indirect estimates of the trend errors. For America the direct (empirical) method found a trend error of 0.71 °C per century and the indirect (analytical) method 0.82 °C per century. For Germany the direct method found 0.5 °C per century and the indirect method 0.58 °C per century.

The indirect method thus found slightly larger uncertainties. Our estimates were based on the assumption of random station trend errors, which do not produce a bias in the trend. A difference in sensitivity to such biasing inhomogeneities in the observational data would be a reasonable explanation for these small differences. Also missing data may play a role.

Inhomogeneities can be very complicated. The break frequency does not have to be constant, the break sizes could depend on the year. Random Deviations and Brownian Motion are idealizations. In that light, it is encouraging that the direct and indirect estimates fit that well. These approximations seem to be sufficiently realistic, at least for the computation of station trend errors.


Other posts in this series

Part 5: Statistical homogenization under-corrects any station network-wide trend biases

Part 4: Break detection is deceptive when the noise is larger than the break signal

Part 3: Correcting inhomogeneities when all breaks are perfectly known

Part 2: Trend errors in raw temperature station data due to inhomogeneities

Part 1: Estimating the statistical properties of inhomogeneities without homogenization

References

Lindau, R., Venema, V., 2020: Random trend errors in climate station data due to inhomogeneities. International Journal of Climatology, 40, pp. 2393-2402. Open Access. https://doi.org/10.1002/joc.6340

Lindau, R., Venema, V., 2019: A new method to study inhomogeneities in climate records: Brownian motion or random deviations? International Journal of Climatology, 39, pp. 4769-4783. Manuscript: https://eartharxiv.org/vjnbd/ Article: https://doi.org/10.1002/joc.6105

Monday, February 24, 2020

Estimating the statistical properties of inhomogeneities without homogenization

One way to study inhomogeneities is to homogenize a dataset and study the corrections made. However, that way you only study the inhomogeneities that have been detected. Furthermore, it is always nice to have independent lines of evidence in an observational science. So in a recently published study Ralf Lindau and I (Lindau and Venema, 2019) set out to study the statistical properties of inhomogeneities directly from the raw data.

Break frequency and break size

The description of inhomogeneities can be quite complicated.

Observational data contains both break inhomogeneities (jumps due to, for example, a change of instrument or location) and gradual inhomogeneities (for example, due to degradation of the sensor or the instrument screen, growing vegetation or urbanization). The first simplification we make is that we only consider break inhomogeneities. Gradual inhomogeneities are typically homogenized with multiple breaks and they are often quite hard to distinguish from actual multiple breaks in case of noisy data.

When it comes to the year and month of the break we assume every date has the same probability of containing a break. It could be that when there is a break, it is more likely that there is another break, or less likely that there is another break.* It could be that some periods have a higher probability of having a break or the beginning of a series could have a different probability or when there is a break in station X, there could be a larger chance of a break in station Y. However, while some of these possibilities make intuitively sense, we do not know about studies on them, so we assume the simplest case of independent breaks. The frequency of these breaks is a parameter our method will estimate.

* When you study the statistical properties of breaks detected by homogenization methods, you can see that around a break it is less likely for there to be another break. One reason for this is that some homogenization methods explicitly exclude the possibility of two nearby breaks. The methods that do allow for nearby breaks will still often prefer the simpler solution of one big break over two smaller ones.


When it comes to the sizes of the breaks we are reasonably confident that they follow a normal distribution. Our colleagues Menne and Williams (2005) computed the break sizes for all dates where the station history suggested something happened to the measurement that could affect its homogeneity.** They found the break size distribution plotted below. The graph compares the histogram to a normal distribution with an average of zero. Apart from the actual distribution not having a mean of zero (leading to trend biases) it seems to be a decent match and our method will assume that break sizes have a normal distribution.


Figure 1. Histogram of break sizes for breaks known from station histories (metadata).


** When you study the statistical properties of breaks detected by homogenization methods the distribution looks different; the graph plotted below is a typical example. You will not see many small breaks; the middle of the normal distribution is missing. This is because these small breaks are not statistically significant in a noisy time series. Furthermore, you often see some really large breaks. These are likely multiple breaks being detected as one big one. Using breaks known from the metadata, as Menne and Williams (2005) did, avoids or reduces these problems and is thus a better estimate of the distribution of actual breaks in climate data. Although, you can always worry that the breaks not known in the metadata are different. Science never ends.



Figure 2. Histogram of detected break sizes for the lower USA.

Temporal behavior

The break frequency and size is still not a complete description of the break signal, there is also the temporal dependence of the inhomogeneities. In the HOME benchmark I had assumed that every period between two breaks had a shift up or down determined by a random number, what we call “Random Deviation from a baseline” in the new article. To be honest, “assumed” means I had not really thought about it when generating the data. In the same year, NOAA published a benchmark study where they assumed that the jumps up and down (and not the levels) were given by a random number, that is, they assumed the break signal is a random walk. So we have to distinguish between levels and jumps.

This makes quite a difference for the trend errors. In the case of Random Deviations, if the first jump goes up it is more likely that the next jump goes down, especially if the first jump goes up a lot. In the case of a random walk or Brownian Motion, when the first jump goes up, this does not influence the next jump, which still has a 50% probability of also going up. Brownian Motion hence has a tendency to run away: as you insert more breaks, the variance of the break signal keeps growing on average, while for Random Deviations it stays bounded.
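This difference is easy to see in a small simulation. The sketch below (illustrative values, not the code from the study) draws the levels of the homogeneous subperiods directly for Random Deviations and accumulates independent jumps for Brownian Motion.

```python
import numpy as np

rng = np.random.default_rng(0)

n_segments = 20     # number of homogeneous subperiods (assumed)
sigma = 0.5         # standard deviation of the levels / jumps, in °C (assumed)

levels_rd = rng.normal(0.0, sigma, n_segments)             # Random Deviations
levels_bm = np.cumsum(rng.normal(0.0, sigma, n_segments))  # Brownian Motion

# The RD variance stays close to sigma**2 no matter how many segments
# there are; the BM variance grows, on average, with the number of segments.
print("variance of RD levels:", levels_rd.var())
print("variance of BM levels:", levels_bm.var())
```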

The figure from another new paper (Lindau and Venema, 2020) shown below quantifies the big difference this makes for the trend error of a typical 100-year-long time series. On the x-axis you see the frequency of the breaks (in breaks per century) and on the y-axis the variance of the trends these breaks produce (in K² or °C² per century²).

The plus-symbols are for the case of Random Deviations from a baseline. If every time series had exactly two breaks, this would give the largest trend error. However, because the number of breaks varies, an average break frequency of about three breaks per series gives the largest trend error. This makes sense: no breaks would give no trend error, while with more and more breaks you average over more and more independent numbers and the trend error becomes smaller and smaller.

The circle-symbols are for Brownian Motion. Here the variance of the trends increases linearly with the number of breaks. For a typical number of breaks of more than five, Brownian Motion produces a much larger trend error than Random Deviations.
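The qualitative behaviour can be reproduced with a rough Monte Carlo simulation. The sketch below is my own illustration (not the code behind Lindau and Venema, 2020): it draws a Poisson number of breaks per 100-year series, builds the break signal as Random Deviations or as Brownian Motion with a level/jump variance of 1 °C², fits a linear trend, and prints the variance of the fitted trends for a few break frequencies.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(100)

def trend_variance(breaks_per_century, model, n_sim=2000):
    slopes = []
    for _ in range(n_sim):
        n_breaks = rng.poisson(breaks_per_century)
        positions = np.sort(rng.integers(1, 100, size=n_breaks))
        values = rng.normal(0.0, 1.0, n_breaks + 1)
        # segment levels: independent for RD, accumulated jumps for BM
        levels = values if model == "RD" else np.cumsum(values)
        signal = levels[np.searchsorted(positions, years, side="right")]
        slopes.append(np.polyfit(years, signal, 1)[0] * 100)  # °C per century
    return np.var(slopes)

for freq in (1, 3, 5, 10):
    print(freq, "breaks/century:",
          "RD", round(trend_variance(freq, "RD"), 2),
          "BM", round(trend_variance(freq, "BM"), 2))
```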


Figure 3. Figure from Lindau and Venema (2020) quantifying the trend errors due to break inhomogeneities. The variance of the jump sizes is the same in both cases: 1 °C².

One of our colleagues, Peter Domonkos, also sometimes uses Brownian Motion, but puts a limit on how far it can run away. Furthermore, he is known for the concept of platform-like inhomogeneity pairs, where if the first break goes up, the next one is more likely to go down (or the other way around) thus building a platform.

All of these statistical models can make physical sense. When a measurement error causes the observation to go up (or down), the observation will go down (or up) again once the problem is discovered, thus creating a platform inhomogeneity pair. When the first break goes up (or down) because of a relocation, this perturbation remains when the sensor is later changed, and both remain when the screen is changed, thus creating a random walk. Relocations are a frequent reason for inhomogeneities. When the station Bonn is relocated, the operator will want to keep it in the region and will thus search in a random direction around Bonn, rather than around the previous location. That would create Random Deviations.

In the benchmarking study HOME we looked at the sign of consecutive detected breaks (Venema et al., 2012). In the case of Random Deviations, like HOME used for its simulated breaks, you would expect platform break pairs (first break up and the second down, or the reverse) in 4 of 6 cases (67%). We detected them in 63% of the cases, a bit less, probably showing that platform pairs are a bit harder to detect than two breaks going in the same direction. In the case of Brownian Motion you would expect 50% platform break pairs. For the real data in the HOME benchmark the percentage of platforms was 59%. So this does not fit Brownian Motion, but it is also lower than you would expect from Random Deviations. Reality seems to be somewhere in the middle.

So for our new study estimating the statistical properties of inhomogeneities we opted for a statistical model in which the breaks are described by a Random Deviations (RD) signal added to a Brownian Motion (BM) signal, and we estimated their parameters to see how large these two components are.
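As a small illustration of this combined model (the parameter values below are made up, not the estimates from the paper), the break signal is simply the sum of an RD component and a BM component, each with its own variance.

```python
import numpy as np

rng = np.random.default_rng(4)

n_segments = 15
sigma_rd = 0.4   # assumed standard deviation of the RD levels, in °C
sigma_bm = 0.2   # assumed standard deviation of the BM jumps, in °C

rd_part = rng.normal(0.0, sigma_rd, n_segments)             # bounded component
bm_part = np.cumsum(rng.normal(0.0, sigma_bm, n_segments))  # random-walk component
segment_levels = rd_part + bm_part
```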

The observations

To estimate the properties of the inhomogeneities we have monthly temperature data from a large number of stations. These data contain a regional climate signal, observational and weather noise, and inhomogeneities. To separate the noise and the inhomogeneities we can use the fact that they are very different with respect to their temporal correlations. The noise is mostly independent in time, or weakly correlated insofar as measurement errors depend on the weather. The inhomogeneities, on the other hand, have correlations over many years.

However, the regional climate signal also has correlations over many years and is comparable in size to the break signal. So we have opted to work with a difference time series, that is, we subtract the time series of a neighboring station from that of a candidate station. This mostly removes the complicated climate signal, and what remains is two times the inhomogeneities and two times the noise. The map below shows the 1459 station pairs we used for the USA.


Figure 4. Map of the lower USA with all the pairs of stations we used in this study.

For estimating the inhomogeneities, the climate signal is noise. By removing it we reduce the noise level and avoid having to make assumptions about the regional climate signal. There are also disadvantages to working with the difference series: inhomogeneities that are present in both the candidate and the reference series will be (partially) removed. For example, when there is a jump because of a change in the way the daily temperature is computed, this leads to a change in the entire network.*** Such a jump would be mostly, though not fully, invisible in a difference series, because the jump size will be different in every station.


*** In the past the temperature was read multiple times a day, or a minimum and maximum temperature thermometer was used. With labor-saving automatic weather stations we can now sample the temperature many times a day, and changing from one definition of the daily mean to another gives a jump.
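A toy example of such a definition change (the hourly values are made up): for a skewed daily cycle, the mean of 24 hourly samples and the mean of the daily minimum and maximum differ systematically, so switching from one to the other introduces a jump.

```python
import numpy as np

hours = np.arange(24)
# a made-up, skewed daily cycle: about 10 °C at night with a half-sine
# daytime peak up to 18 °C
hourly_temps = 10 + 8 * np.clip(np.sin(2 * np.pi * (hours - 8) / 24), 0, None)

mean_24h = hourly_temps.mean()
mean_minmax = (hourly_temps.min() + hourly_temps.max()) / 2

print("mean of 24 samples:", round(mean_24h, 2))
print("(Tmin + Tmax) / 2: ", round(mean_minmax, 2))
print("jump when switching definitions:", round(mean_minmax - mean_24h, 2))
```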

Spatiotemporal differences

As test statistic we have chosen the variance of the spatiotemporal differences. The "spatio" part of the differences I already explained: we use the difference between two stations. The temporal part means we subtract two values separated by a time lag. For all pairs of stations and all possible pairs of values with a given lag, we compute the variance of all these difference values, and we do this for lags of zero to 80 years.
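The sketch below shows one plausible reading of this statistic (my own illustration, not the authors' implementation; see the paper for the exact definition). It assumes `station_pairs` is a list of (candidate, neighbor) tuples of equally long arrays of annual temperatures.

```python
import numpy as np

def lag_variance(station_pairs, max_lag=80):
    """Variance of the spatiotemporal differences for each time lag."""
    variances = np.full(max_lag + 1, np.nan)
    # the "spatio" part: difference series of candidate minus neighbor
    diffs = [candidate - neighbor for candidate, neighbor in station_pairs]
    for lag in range(1, max_lag + 1):
        # the temporal part: all pairs of values separated by `lag` years
        lagged = [d[lag:] - d[:-lag] for d in diffs if len(d) > lag]
        if lagged:
            variances[lag] = np.var(np.concatenate(lagged))
    return variances
```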

In the paper we do all the math to show how the three components (noise, Random Deviations and Brownian Motion) depend on the lag. The noise does not depend on the lag; it is constant. Brownian Motion produces a linear increase of the variance as a function of the lag, while the Random Deviations produce a saturating exponential function. How fast this function saturates depends on the number of breaks per century.
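Here is a hedged sketch of the resulting fit, using the qualitative functional forms just described (constant noise, linear BM term, saturating exponential RD term). The exact parameterization in the paper may differ, and the "empirical" curve below is synthetic, built from made-up parameters that roughly mimic the US numbers quoted further down.

```python
import numpy as np
from scipy.optimize import curve_fit

def variance_model(lag, c_noise, slope_bm, var_rd, rd_rate):
    # constant noise + linear Brownian Motion term + saturating RD term
    return c_noise + slope_bm * lag + var_rd * (1.0 - np.exp(-rd_rate * lag))

lags = np.arange(1, 81, dtype=float)

# synthetic "observed" curve: noise ~0.62 K², RD variance ~0.47 K²,
# one RD break per ~5.8 years, plus an arbitrary BM slope and some noise
rng = np.random.default_rng(2)
empirical = variance_model(lags, 0.62, 0.004, 0.47, 1 / 5.8)
empirical = empirical + rng.normal(0.0, 0.01, lags.size)

params, _ = curve_fit(variance_model, lags, empirical, p0=[0.5, 0.001, 0.5, 0.1])
print("fitted noise variance:", round(params[0], 2))
print("fitted RD breaks per century:", round(100 * params[3], 1))
```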

The variance of the spatiotemporal differences for America is shown below. The O-symbols are the variances computed from the data. The other lines are the fits for the various parts of the statistical model. The variance of the noise is about 0.62 K² (or °C²) and is shown as a horizontal line, as it does not depend on the lag. The Brownian Motion component is the line indicated by BM, while the Random Deviation (RD) component is the curve starting at the origin and growing to about 0.47 K². From how fast this curve grows we estimate that the American data has one RD break every 5.8 years.

That the curve for Brownian Motion is a straight line already suggests that it is not possible to estimate how many BM breaks the time series contains: we only know the total variance, not whether it comes from many small breaks or one big one.



Figure 5. The variance of the spatiotemporal differences as a function of the time lag for the lower USA.

The situation for Germany is a bit different; see the figure below. Here we do not see the continual linear increase in the variance we had above for America. Apparently the break signal in Germany does not have a significant Brownian Motion component and only contains Random Deviation breaks. The number of breaks is also much smaller: the German data only has one break every 24 years. The German weather service seems to give undisturbed climate observations a high priority.

For both countries the size of the RD breaks is about the same and quite small; expressed as a typical jump size it would be about 0.5 °C.



Figure 6. The variance of the spatiotemporal differences as a function of the time lag L for Germany.

The number of detected breaks

The number of breaks we found for America is a lot larger than the number of breaks detected by statistical homogenization. Typical numbers for detected breaks are one per 15 years for America and one per 20 years for Europe, although it also depends considerably on the homogenization method applied.

I was surprised by the large difference between actual and detected breaks; I had thought we would maybe miss 20 to 25% of the breaks. If you look at histograms of detected breaks, such as Figure 2 reprinted below, with their missing middle, it looks as if only about 20% is missing in a country with a dense observational network.

But these histograms are not a good way to determine what is missing. Next to the influence of chance, small breaks may still be detected when they have a good reference station and the other breaks are far away, while relatively big breaks may go undetected because of other nearby breaks. So there is no clear cut-off, and you would have to go far from the middle to find reliably detected breaks, which puts you in the region where there are too many large breaks because detection algorithms have combined two or more breaks into one. In other words, it is hard to estimate how many breaks are missing by fitting a normal distribution to the histogram of the detected breaks.

If you do the math, as we do in Section 6 of the article, it is perfectly possible not to detect half of the breaks even for a dense observational network.
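The full argument is in Section 6 of the paper, but a back-of-the-envelope illustration already makes it plausible (the threshold below is hypothetical, and in reality there is no sharp cut-off, as explained above): with actual break sizes roughly normally distributed with a standard deviation of about 0.5 °C, a large share of the breaks is simply small.

```python
from scipy.stats import norm

sigma_break = 0.5   # typical RD jump size quoted above, in °C
threshold = 0.4     # hypothetical effective detection threshold, in °C

# share of normally distributed breaks smaller in absolute value than the threshold
undetected = norm.cdf(threshold, scale=sigma_break) - norm.cdf(-threshold, scale=sigma_break)
print(f"share of breaks smaller than ±{threshold} °C: {undetected:.0%}")
```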


Figure 2. Histogram of detected break sizes for the lower USA.

Final thoughts

This is a new methodology; let's see how it holds up when others look at it with new methods, other assumptions about the nature of inhomogeneities and other datasets. Separating Random Deviations and Brownian Motion requires long series. We do not have that many long series, and you can already see in the figures above that the variance of the spatiotemporal differences for Germany is quite noisy. The method thus requires too much data to apply it to networks all over the world.

In Lindau and Venema (2018) we introduced a method to estimate the break variance and the number of breaks for a single pair of stations (but not BM versus RD). This needed some human inspection to ensure the fits were right, but it does suggest that there may be a middle ground: a new method that can estimate these parameters from smaller amounts of data and can thus be applied worldwide.

The next blog post will be about the trend errors due to these inhomogeneities. If you have any questions about our work, do leave a comment below.


Other posts in this series

Part 5: Statistical homogenization under-corrects any station network-wide trend biases

Part 4: Break detection is deceptive when the noise is larger than the break signal

Part 3: Correcting inhomogeneities when all breaks are perfectly known

Part 2: Trend errors in raw temperature station data due to inhomogeneities

Part 1: Estimating the statistical properties of inhomogeneities without homogenization

References

Lindau, R. and Venema, V., 2020: Random trend errors in climate station data due to inhomogeneities. International Journal of Climatology, 40, pp. 2393-2402. Open Access. https://doi.org/10.1002/joc.6340

Lindau, R. and Venema, V., 2019: A new method to study inhomogeneities in climate records: Brownian motion or random deviations? International Journal of Climatology, 39, pp. 4769-4783. Manuscript: https://eartharxiv.org/vjnbd/ https://doi.org/10.1002/joc.6105

Lindau, R. and Venema, V.K.C., 2018: The joint influence of break and noise variance on the break detection capability in time series homogenization. Advances in Statistical Climatology, Meteorology and Oceanography, 4, p. 1–18. https://doi.org/10.5194/ascmo-4-1-2018

Menne, M.J. and C.N. Williams, 2005: Detection of Undocumented Changepoints Using Multiple Test Statistics and Composite Reference Series. Journal of Climate, 18, 4271–4286. https://doi.org/10.1175/JCLI3524.1

Menne, M.J., C.N. Williams, and R.S. Vose, 2009: The U.S. Historical Climatology Network Monthly Temperature Data, Version 2. Bulletin American Meteorological Society, 90, 993–1008. https://doi.org/10.1175/2008BAMS2613.1

Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M.J. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma, 2012: Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, pp. 89-115. https://doi.org/10.5194/cp-8-89-2012

Wednesday, July 10, 2013

Statistical problems: The multiple breakpoint problem in homogenization and remaining uncertainties

This is part two of a series on statistically interesting problems in the homogenization of climate data. The first part was about the inhomogeneous reference problem in relative homogenization. This part will be about two problems: the multiple breakpoint problem and about computing the remaining uncertainties in homogenized data.

I hope that this series can convince statisticians to become (more) active in homogenization of climate data, which provides many interesting problems.

The five main statistical problems are:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad hoc solutions based on single-breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 2. The multiple breakpoint problem

For temperature time series about one break per 15 to 20 years is typical. Thus most interesting stations will contain more than one break. Unfortunately, most statistical detection methods have been developed for one break. To use them on series with multiple breaks, one ad hoc solution is to first split the series at the largest break (as done, for example, by the Standard Normal Homogeneity Test, SNHT) and then investigate the subseries. Such a greedy algorithm does not always find the optimal solution.

Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, which reduces the power of detection considerably.

Multiple-breakpoint methods can find an optimal solution and are nowadays numerically feasible, especially using the optimization method "dynamic programming". For a given number of breaks, these methods find the break combination that minimizes the internal variance, that is, the variance of the homogeneous subperiods (or, equivalently, maximizes the variance explained by the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) and ACMANT (based on PRODIGE; Domonkos, 2011). In a similar line of research, Lu et al. (2010) solved the multiple-breakpoint problem using a minimum description length (MDL) based information criterion as penalty function.
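To show the idea, here is a minimal dynamic-programming sketch (my own illustration; PRODIGE and ACMANT use more refined cost functions and penalty criteria, and the penalty value below is arbitrary). For each number of breaks it finds the segmentation minimizing the sum of squared deviations from the segment means, and a penalty per break then selects the number of breaks.

```python
import numpy as np

def segment_cost(x):
    """Sum of squared deviations from the mean for every segment x[i:j]."""
    n = len(x)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            seg = x[i:j]
            cost[i, j] = np.sum((seg - seg.mean()) ** 2)
    return cost

def best_number_of_breaks(x, max_breaks, penalty):
    n = len(x)
    cost = segment_cost(x)
    # dp[k, j]: minimal within-segment sum of squares for x[:j] split into k + 1 segments
    dp = np.full((max_breaks + 1, n + 1), np.inf)
    dp[0] = cost[0]
    for k in range(1, max_breaks + 1):
        for j in range(k + 1, n + 1):
            dp[k, j] = min(dp[k - 1, i] + cost[i, j] for i in range(k, j))
    # penalize solutions with more breaks and pick the best compromise
    total = dp[:, n] + penalty * np.arange(max_breaks + 1)
    return int(np.argmin(total))

# toy example: a noisy series with two inserted breaks; the method
# should recover them
rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(0.0, 0.2, 30),
                         rng.normal(1.0, 0.2, 30),
                         rng.normal(0.4, 0.2, 30)])
print("estimated number of breaks:", best_number_of_breaks(series, 5, penalty=2.0))
```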


This figure shows a screenshot of PRODIGE used to homogenize Salzburg with its neighbors. The neighbors are sorted based on their cross-correlation with Salzburg. The top panel is the difference time series of Salzburg with Kremsmünster, which has a standard deviation of 0.14°C. The middle panel is the difference between Salzburg and München (0.18°C). The lower panel is the difference of Salzburg and Innsbruck (0.29°C). Not having any experience with PRODIGE, I would read this graph as suggesting that Salzburg probably has breaks in 1902, 1938 and 1995. This fits the station history. In 1903 the station was moved to another school. In 1939 it was relocated to the airport and in 1996 it was moved within the terrain of the airport. The other breaks are not consistently seen in multiple pairs and may thus well be in another station.