
Friday, 22 January 2021

New paper: Spanish and German climatologists on how to remove errors from observed climate trends

This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouri (French) screen, in use in Spain and many European countries in the late 19th and early 20th century. On the left is a Stevenson screen equipped with conventional meteorological instruments, a set-up used globally for most of the 20th century. In the middle is a Stevenson screen equipped with automatic sensors. The Montsouri screen is better ventilated, but because some solar radiation can reach the thermometer it registers somewhat higher temperatures than a Stevenson screen. Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.

The instrumental climate record is human cultural heritage, the product of the diligent work of many generations of people all over the world. But changes in the way temperature was measured and in the surroundings of weather stations can produce spurious trends. An international team, with participation of the University Rovira i Virgili (Spain), the State Meteorological Agency (AEMET, Spain) and the University of Bonn (Germany), has made a major effort to provide reliable tests for the methods used to computationally remove such spurious trends. These so-called “homogenization methods” are a key step in turning the enormous effort of the observers into accurate climate change data products. The results have been published in the prestigious Journal of Climate of the American Meteorological Society. The research was funded by the Spanish Ministry of Economy and Competitiveness.

Climate observations often go back more than a century, to times before we had electricity or cars. Such long time spans make it virtually impossible to keep the measurement conditions the same across time. The best-known problem is the growth of cities around urban weather stations. Cities tend to be warmer, for example due to reduced evaporation by plants or because high buildings block cooling. This can be seen by comparing urban stations with surrounding rural stations. It is less talked about, but there are similar problems due to the spread of irrigation.

The most common reason for jumps in the observed data is the relocation of weather stations. Volunteer observers tend to make observations near their homes; when they retire and a new volunteer takes over the task, this can produce temperature jumps. Even for professional observations, keeping the locations the same over centuries can be a challenge, either because urban growth makes sites unsuitable or because organizational changes lead to new premises. Climatologist Dr. Victor Venema from Bonn, one of the authors: “A quite typical organizational change is that weather offices that used to be in cities were transferred to newly built airports needing observations and predictions. The weather station in Bonn used to be on a field in the village Poppelsdorf, which is now a quarter of Bonn, and after several relocations the station is currently at the airport Cologne-Bonn.”

For global trends, the most important changes are technological changes of the same kind and with similar effects all over the world. We are currently, for instance, in a period of widespread automation of the observational networks.

Appropriate computer programs for the automatic homogenization of climatic time series are the result of several years of development work. They work by comparing nearby stations with each other and looking for changes that only happen in one of them, as opposed to climatic changes that influence all stations.

To scrutinize these homogenization methods, the research team created a dataset that closely mimics observed climate datasets, including the spurious changes mentioned above. In this way the spurious changes are known exactly, and one can study how well they are removed by homogenization. Compared to previous studies, the testing datasets showed much more diversity; real station networks also show a lot of diversity due to differences in their management. The researchers especially took care to produce networks with widely varying station densities; in a dense network it is easier to see a small spurious change at a station. The test dataset was larger than ever, containing 1900 station networks, which allowed the scientists to accurately determine the differences between the top automatic homogenization methods that have been developed by research groups from Europe and the Americas. Because of the large size of the testing dataset, only automatic homogenization methods could be tested.

The international author group found that it is much more difficult to improve the network-mean climate signals than to improve the accuracy of individual station time series.

The Spanish homogenization methods excelled. The method developed at the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain, by Hungarian climatologist Dr. Peter Domonkos was found to be the best at homogenizing both individual station series and regional network mean series. The method of the State Meteorological Agency (AEMET), Unit of Islas Baleares, Palma, Spain, developed by Dr. José A. Guijarro was a close second.

When it comes to removing systematic trend errors from many networks, and especially from networks where similar spurious changes happen in many stations at similar dates, the homogenization method of the US National Oceanic and Atmospheric Administration (NOAA) performed best. This is a method that was designed to homogenize station datasets at the global scale, where the main concern is the reliable estimation of global trends.

The open screen formerly used at the station Uccle in Belgium, with two modern closed Stevenson screens with double-louvred walls in the background.

Quotes from participating researchers

Dr. Peter Domonkos, who was earlier a weather observer and now writes a book about time series homogenization: “This study has shown the value of large testing datasets and demonstrates another reason why automatic homogenization methods are important: they can be tested much better, which aids their development.”

Prof. Dr. Manola Brunet, who is the director of the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain, Visiting Fellow at the Climatic Research Unit, University of East Anglia, Norwich, UK, and Vice-President of the World Meteorological Services Technical Commission, said: “The study showed how important dense station networks are to make homogenization methods powerful and thus to compute accurate observed trends. Unfortunately, a lot of climate data still needs to be digitized to contribute to even better homogenization and quality control.”

Dr. Javier Sigró from the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain: “Homogenization is often a first step that allows us to go into the archives and find out what happened to the observations that produced the spurious jumps. Better homogenization methods mean that we can do this in a much more targeted way.”

Dr. José A. Guijarro: “Not only may the results of the project help users to choose the method most suited to their needs; it also helped developers to improve their software by showing its strengths and weaknesses, and will allow further improvements in the future.”

Dr. Victor Venema: “In a previous similar study we found that homogenization methods that were designed to handle difficult cases, where a station has multiple spurious jumps, were clearly better. Interestingly, this study did not find this. It may be that it is more a matter of methods being carefully fine-tuned and tested.”

Dr. Peter Domonkos: “The accuracy of homogenization methods will likely improve further; however, we should never forget that spatially dense and high-quality climate observations are the most important pillar of our knowledge about climate change and climate variability.”

Press releases

Spanish weather service, AEMET: Un equipo internacional de climatólogos estudia cómo minimizar errores en las tendencias climáticas observadas

URV university in Tarragona, Catalan: Un equip internacional de climatòlegs estudia com es poden minimitzar errades en les tendències climàtiques observades

URV university, Spanish: Un equipo internacional de climatólogos estudia cómo se pueden minimizar errores en las tendencias climáticas observadas

URV university, English: An international team of climatologists is studying how to minimise errors in observed climate trends

Articles

Tarragona 21: Climatòlegs de la URV estudien com es poden minimitzar errades en les tendències climàtiques observades

Genius Science, French: Une équipe de climatologues étudie comment minimiser les erreurs dans la tendance climatique observée

Phys.org: A team of climatologists is studying how to minimize errors in observed climate trend

 

Monday, 27 April 2020

Break detection is deceptive when the noise is larger than the break signal

I am disappointed in science. It seems impossible that it took this long for us to discover that break detection has serious problems when the signal to noise ratio is low. However, as far as we can judge, this was new science, and it certainly was not common knowledge, which it should have been because it has large consequences.

This post describes a paper by Ralf Lindau and me about how break detection depends on the signal to noise ratio (Lindau and Venema, 2018). The signal in this case consists of the breaks we would like to detect. These breaks could stem from a change of instrument or of the location of the station. We detect breaks by comparing a candidate station to a reference. This reference can be one other neighbouring station or an average of neighbouring stations. The candidate and reference should be sufficiently close to each other that they share the same regional climate signal, which is then removed by subtracting the reference from the candidate. The difference time series that is left contains breaks plus noise due to measurement uncertainties and differences in local weather. The noise thus depends on the quality of the measurements, on the density of the measurement network and on how variable the weather is spatially.
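To make the comparison with a reference concrete, here is a minimal sketch in Python (my own illustration, not code from the paper; real homogenization software selects and weights neighbouring stations much more carefully):

```python
import numpy as np

def difference_series(candidate, neighbours):
    """Subtract a reference (here the plain average of the neighbouring stations)
    to remove the regional climate signal that all stations share."""
    reference = np.mean(neighbours, axis=0)
    return candidate - reference

# Toy example: three neighbours; the candidate has a 1 °C break after year 50.
rng = np.random.default_rng(0)
regional = np.cumsum(rng.normal(0.0, 0.1, 100))         # shared regional climate signal
neighbours = regional + rng.normal(0.0, 0.3, (3, 100))  # neighbours: signal plus local noise
candidate = regional + rng.normal(0.0, 0.3, 100)
candidate[50:] += 1.0                                    # inhomogeneity, e.g. a relocation
diff = difference_series(candidate, neighbours)          # break remains, climate signal is gone
```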

The signal to noise ratio (SNR) is simply defined as the standard deviation of the time series containing only the breaks divided by the standard deviation of the time series containing only the noise. For short I will denote these as the break signal and the noise signal, which have a break variance and a noise variance. When generating data to test homogenization algorithms, you know exactly how strong the break signal and the noise signal are. In the case of real data, you can estimate them, for example with the methods I described in a previous blog post. In that study, we found a signal to noise ratio for annual temperature averages of 3 to 4 in Germany and of about 5 in America.
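For testing data, where the break signal and the noise signal are generated separately, the SNR follows directly from this definition. A minimal sketch in Python (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n_years = 100

# Break-only signal: a step function with three breaks at random positions.
positions = np.sort(rng.choice(np.arange(5, n_years - 5), size=3, replace=False))
segment_lengths = np.diff(np.concatenate(([0], positions, [n_years])))
break_signal = np.repeat(rng.normal(0.0, 0.5, len(positions) + 1), segment_lengths)

# Noise-only signal: differences in measurement errors and local weather.
noise_signal = rng.normal(0.0, 0.5, n_years)

# Signal to noise ratio as defined above: the ratio of the standard deviations.
snr = break_signal.std() / noise_signal.std()
print(f"SNR = {snr:.2f}")
```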

Temperature is studied a lot and much of the work on homogenization takes place in Europe and America, where this signal to noise ratio is high enough. That may be one reason why climatologists did not find this problem sooner. Many other sciences use similar methods, and we are all supported by a considerable statistical literature. I have no idea what their excuses are.



Why a low SNR is a problem

As scientific papers go, the discussion is quite mathematical, but the basic problem is relatively easy to explain in words. In statistical homogenization we do not know in advance where the break or breaks will be. So we basically try many break positions and search for the break positions that result in the largest breaks (or, for the algorithm we studied, that explain the most variance).

If you do this for a time series that contains only noise, this will also produce (small) breaks. For example, if you are looking for one break, there will, due to pure chance, be a difference between the averages of the first and the last segment. This difference is larger than it would be for a predetermined break position, because we try all possible break positions and then select the one with the largest difference. To determine whether the breaks we found are real, we require that they are so large that it is unlikely they arose by chance in a series that actually contains no breaks. So we study series that contain only noise to determine how large such random breaks typically are. Statisticians would talk about the breaks being statistically significant with white noise as the null hypothesis.
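A minimal sketch of this search in Python (my own illustration, not the algorithm evaluated in the paper): try every position of a single break, keep the one with the largest difference of segment means, and compare that maximum with the maxima found in pure white noise.

```python
import numpy as np

def largest_break(series, min_seg=5):
    """Try all break positions; return (position, |difference of segment means|)."""
    n = len(series)
    best_pos, best_diff = None, 0.0
    for pos in range(min_seg, n - min_seg + 1):  # break between pos - 1 and pos
        diff = abs(series[:pos].mean() - series[pos:].mean())
        if diff > best_diff:
            best_pos, best_diff = pos, diff
    return best_pos, best_diff

rng = np.random.default_rng(0)
n_years = 100

# Null distribution: the largest "break" found in series of pure white noise.
null_max = [largest_break(rng.normal(0, 1, n_years))[1] for _ in range(1000)]
threshold = np.quantile(null_max, 0.95)  # 5% significance level

# A difference series with one real break of size 1 on top of unit-variance noise.
series = rng.normal(0, 1, n_years)
series[50:] += 1.0
pos, size = largest_break(series)
print(f"break at year {pos}, size {size:.2f}, significant: {size > threshold}")
```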

When the breaks are really large compared to the noise, one can see by eye where the breaks are, and this method is a nice way to make that determination automatically for many stations. When the breaks are “just” large, it is a great method to objectively determine the number of breaks and the optimal break positions.

The problem comes when the noise is larger than the break signal. That is not to say it is fundamentally impossible to detect such breaks. If you have a 100-year time series with a break in the middle, you would be averaging over 50 noise values on either side, and the noise contribution to the difference of these averages would be much smaller than the noise itself. Even if noise and signal are about the same size, the noise effect is thus expected to be smaller than the size of such a break. To put it another way, the noise is not correlated in time, while the break signal is the same for many years; that fundamental difference is what the break detection exploits.
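To put a number on this example (my own back-of-the-envelope calculation): with 50 years on either side of the break, the noise contribution to the difference of the two segment means has a standard deviation of only

\[ \sigma_{\Delta} = \sigma_{\mathrm{noise}} \sqrt{\tfrac{1}{50} + \tfrac{1}{50}} = 0.2\,\sigma_{\mathrm{noise}}, \]

so a break as large as the noise itself is still five times larger than the typical noise-induced difference at that (predetermined) position.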

However, and this is the fundamental problem, it becomes hard to determine the positions of the breaks. Imagine the theoretical case where the break positions found are fully determined by the noise, not by the breaks. From the perspective of the break signal, these break positions are random. The problem is that random break positions also explain a part of the break signal. So one would have a combination with a maximum contribution of the noise plus a part of the break signal. Because of this additional contribution by the break signal, this combination may have larger breaks than expected in a pure noise signal. In other words, the result can be statistically significant, while we have no idea where the positions of the breaks are.

In a real case the breaks look even more statistically significant because the positions of the breaks are determined by both the noise and the break signal.

That is the fundamental problem: the test for the homogeneity of the series rightly detects that the series contains inhomogeneities, but if the signal to noise ratio is low we should not jump to conclusions and expect that the set of break positions that gives us the largest breaks has much to do with the break positions in the data. Only if the signal to noise ratio is high is this relationship close enough.
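A small simulation (my own illustration) shows that break positions chosen completely at random already explain a part of a break signal; the exact fraction depends on the number and size of the breaks.

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_breaks = 100, 4

# A break-only signal: a step function with a few breaks and no noise at all.
pos = np.sort(rng.choice(np.arange(1, n_years), n_breaks, replace=False))
seg_len = np.diff(np.concatenate(([0], pos, [n_years])))
break_signal = np.repeat(rng.normal(0, 1, n_breaks + 1), seg_len)

def explained_fraction(signal, positions):
    """Fraction of the signal variance explained by segment means at the given positions."""
    edges = np.concatenate(([0], np.sort(positions), [len(signal)]))
    fitted = np.concatenate([np.full(e2 - e1, signal[e1:e2].mean())
                             for e1, e2 in zip(edges[:-1], edges[1:])])
    return 1.0 - np.var(signal - fitted) / np.var(signal)

# Step functions with breaks at random positions still explain part of the break signal.
fractions = [explained_fraction(break_signal,
                                rng.choice(np.arange(1, n_years), n_breaks, replace=False))
             for _ in range(1000)]
print(f"random break positions explain on average {np.mean(fractions):.0%} of the break variance")
```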

Some numbers

This is a general problem, which I expect all statistical homogenization algorithms to have, but to put some numbers on it we need to specify an algorithm. We have chosen to study the multiple-breakpoint method that is implemented in PRODIGE (Caussinus and Mestre, 2004), HOMER (Mestre et al., 2013) and ACMANT (Domonkos and Coll, 2017); these are among the best, if not the best, methods we currently have. We applied it by comparing pairs of stations, like PRODIGE and HOMER do.

For a given number of breaks, this method effectively computes the combination of break positions that explains the most break variance. If you allow more breaks, the explained variance will increase even if the series were pure noise, so there is additionally a penalty function that depends on the number of breaks. The algorithm selects the option for which the explained break variance minus this penalty is highest. A statistician would call this a model selection problem, and the job of the penalty is to keep the statistical model (the step function describing the breaks) reasonably simple.
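As a schematic sketch of such a model selection in Python (my own generic penalized least-squares segmentation with an ad-hoc penalty, not the exact criterion used in PRODIGE, HOMER or ACMANT):

```python
import numpy as np

def segment_cost(x):
    """cost[i, j]: sum of squared deviations from the mean for the segment x[i:j]."""
    n = len(x)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        s = ss = 0.0
        for j in range(i + 1, n + 1):
            s += x[j - 1]
            ss += x[j - 1] ** 2
            cost[i, j] = ss - s * s / (j - i)
    return cost

def best_segmentation(x, max_breaks=5, penalty_per_break=None):
    """Pick break positions by maximizing explained variance minus a penalty
    on the number of breaks (a generic sketch, not the published criterion)."""
    n = len(x)
    if penalty_per_break is None:
        penalty_per_break = 2.0 * np.var(x) * np.log(n)  # ad-hoc, BIC-like choice
    cost = segment_cost(x)
    # dp[k, j]: minimal residual sum of squares when splitting x[:j] into k segments.
    dp = np.full((max_breaks + 2, n + 1), np.inf)
    cut = np.zeros((max_breaks + 2, n + 1), dtype=int)
    dp[1] = cost[0]
    for k in range(2, max_breaks + 2):
        for j in range(k, n + 1):
            candidates = dp[k - 1, k - 1:j] + cost[k - 1:j, j]
            best = int(np.argmin(candidates))
            dp[k, j], cut[k, j] = candidates[best], best + k - 1
    # Model selection: explained variance minus the penalty, over the number of breaks.
    total_ss = np.var(x) * n
    scores = [total_ss - dp[k + 1, n] - penalty_per_break * k for k in range(max_breaks + 1)]
    n_breaks = int(np.argmax(scores))
    # Trace back the break positions for the selected number of breaks.
    breaks, j = [], n
    for k in range(n_breaks + 1, 1, -1):
        j = cut[k, j]
        breaks.append(j)
    return sorted(breaks)

# Toy example: breaks after years 30 and 70, unit-variance noise.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 30), rng.normal(2, 1, 40), rng.normal(0.5, 1, 30)])
print(best_segmentation(x))  # ideally something close to [30, 70]
```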

In the end, if the signal to noise ratio is one half, the break positions that explain the most variance are just as “good” at explaining the actual break signal in the data as breaks at random positions.

With this detection model we derived the plot below; let me talk you through it. On the x-axis is the SNR; at the right edge the break signal is twice as strong as the noise signal. On the y-axis is how well the step function belonging to the detected breaks fits the step function of the breaks we actually inserted. The lower curve, with the plus symbols, is for the detection algorithm as described above. You can see that for a high SNR it finds a solution that closely matches what we put in and the difference is almost zero. The upper curve, with the ellipse symbols, is for the solution you find if you put in random breaks. You can see that for a high SNR the random breaks have a difference of 0.5. As the variance of the break signal is one, this means that half the variance of the break signal is explained by random breaks.


Figure 13b from Lindau and Venema (2018).

When the SNR is about 0.5, the random breaks are about as good as the breaks proposed by the algorithm described above.

One may be tempted to think that if the data is too noisy, the detection algorithm should detect fewer breaks, that is, the penalty function should be bigger. However, the problem is not the detection of whether there are breaks in the data, but where the breaks are. A larger penalty thus does not solve the problem and even makes the results slightly worse. Not in the paper, but later I wondered whether setting more breaks is such a bad thing, so we also tried lowering the threshold; this again made the results worse.

So what?

The next question is naturally: is this bad? One reason to investigate correction methods in more detail, as described in my last blog post, was the hope that maybe accurate break positions are not that important. It could have been that the correction method still produces good results even with random break positions. This is unfortunately not the case: already quite small errors in the break positions deteriorate the outcome considerably. That will be the topic of the next post.

Not homogenizing the data is also not a solution. As I described in a previous blog post, the breaks in Germany are small and infrequent, but they still have a considerable influence on the trends of stations. The figure below shows the trend differences between many pairs of nearby stations in Germany. Their differences in trends will be mostly due to inhomogeneities. The standard deviation of 0.628 °C per century for the pairs translates to an average error in the trends of individual stations of about 0.4 °C per century.
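The step from the pair value to the single-station value is, as I read it, the usual error propagation under the assumption that the trend errors of the two stations in a pair are independent and of similar size:

\[ \sigma_{\mathrm{station}} \approx \frac{\sigma_{\mathrm{pair}}}{\sqrt{2}} = \frac{0.628}{\sqrt{2}} \approx 0.4\ ^{\circ}\mathrm{C}\ \text{per century}. \]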


The trend differences (y-axis) of pairs of stations (x-axis) in the German temperature network. The trends were computed from 316 nearby pairs over 1950 to 2000. Figure 2 from Lindau and Venema (2018).

This finding makes it more important to work on methods to estimate the signal to noise ratio of a dataset before we try to homogenize it. That is easier said than done. The method introduced in Lindau and Venema (2018) gives results for every pair of stations, but needs some human checks to ensure the fits are good. Furthermore, it assumes the break levels behave like noise, while in Lindau and Venema (2019) we found that the break signal in the USA behaves like a random walk. This 2019 method needs a lot of data; even the results for Germany are already quite noisy, and if you apply it to data-sparse regions you have to select entire continents. Doing so, however, biases the results towards those subregions where there are many stations and would thus give too high SNR estimates. So computing the SNR worldwide is not just a matter of a blog post, but requires a careful study and likely the development of a new method to estimate the break and noise variance.

Both methods compute the SNR for one difference time series, but in a real case multiple difference time series are used. We will need to study how to do this in an elegant way. How many difference series are used depends on the homogenization method, which would also make the SNR estimate method-dependent. I would also like to have an estimation method that is more universal and can be used to compare networks with each other.

This estimation method should then be applied to global datasets and for various periods to study which regions and periods have a problem. Temperature (as well as pressure) is a variable that is well correlated from station to station. Much more problematic variables, which should thus be studied as well, are precipitation, wind and humidity. In the case of precipitation, there tend to be more stations, which compensates somewhat, but for the other variables there may be even fewer stations.

We have some ideas on how to overcome this problem, from ways to increase the SNR to completely different ways to estimate the influence of inhomogeneities on the data. But they are still too preliminary to blog about. Do subscribe to the blog with any of the options below the tag cloud near the end of the page. ;-)

When we digitize climate data that is currently only available on paper, we tend to prioritize data from regions and periods where we do not have much information yet. However, if the SNR would still be low after that digitization, it may be more worthwhile to digitize data from regions/periods where we already have more data and bring that region/period to an SNR above one.

The next post will be about how this low SNR problem changes our estimates of how much the Earth has been warming. Spoiler: the climate “sceptics” will not like that post.


Other posts in this series

Part 5: Statistical homogenization under-corrects any station network-wide trend biases

Part 4: Break detection is deceptive when the noise is larger than the break signal

Part 3: Correcting inhomogeneities when all breaks are perfectly known

Part 2: Trend errors in raw temperature station data due to inhomogeneities

Part 1: Estimating the statistical properties of inhomogeneities without homogenization

References

Caussinus, Henri and Olivier Mestre, 2004: Detection and correction of artificial shifts in climate series. The Journal of the Royal Statistical Society, Series C (Applied Statistics), 53, pp. 405-425. https://doi.org/10.1111/j.1467-9876.2004.05155.x

Domonkos, Peter and John Coll, 2017: Homogenisation of temperature and precipitation time series with ACMANT3: method description and efficiency tests. International Journal of Climatology, 37, pp. 1910-1921. https://doi.org/10.1002/joc.4822

Lindau, Ralf and Victor Venema, 2018: The joint influence of break and noise variance on the break detection capability in time series homogenization. Advances in Statistical Climatology, Meteorology and Oceanography, 4, pp. 1–18. https://doi.org/10.5194/ascmo-4-1-2018

Lindau, Ralf and Victor Venema, 2019: A new method to study inhomogeneities in climate records: Brownian motion or random deviations? International Journal of Climatology, 39, pp. 4769–4783. Manuscript: https://eartharxiv.org/vjnbd/ Article: https://doi.org/10.1002/joc.6105

Mestre, Olivier, Peter Domonkos, Franck Picard, Ingeborg Auer, Stephane Robin, Émilie Lebarbier, Reinhard Boehm, Enric Aguilar, Jose Guijarro, Gregor Vertachnik, Matija Klancar, Brigitte Dubuisson, Petr Stepanek, 2013: HOMER: a homogenization software - methods and applications. IDOJARAS, Quarterly Journal of the Hungarian Meteorological Society, 117, no. 1, pp. 47–67.