
Thursday, December 11, 2014

Meetings for fans of homogenisation

There are a number of scientific meetings coming up for people interested in the homogenisation of climate station data.

CLIMATE-ES 2015

The International Symposium CLIMATE-ES 2015 (Progress on climate change detection and projections over Spain since the findings of the IPCC AR5 Report) will be held in Tortosa, Tarragona, Spain, on 11-13 March 2015 and is organised by Manola Brunet et al.

Deadline for abstract submission and registration is in four days: 15 December 2014.

There is a session on Climatic observations and instrumental reconstructions: the development of high-quality climate time-series, gridded products and data assimilation techniques, chaired by José Antonio Guijarro.

EGU2015

Three sessions at the General Assembly of the European Geosciences Union (EGU) are interesting for us.

Climate Data Homogenization and Climate Trend and Variability Assessment by Xiaolan Wang et al.
... This session calls for contributions that are related to bias correction and homogenization of climate data, including bias correction and validation of various climate data from satellite observations and from GCM and RCM simulations, as well as quality control/assurance of observations of various variables in the Earth system. It also calls for contributions that use high quality, homogeneous climate data to assess climate trends and variability and to analyze climate extremes, including the use of bias-corrected GCM or RCM simulations in statistical downscaling. This session will include studies that inter-compare different techniques and/or propose new techniques/algorithms for bias-correction and homogenization of climate data, for assessing climate trends and variability and analysis of climate extremes (including all aspects of time series analysis), as well as studies that explore the applicability of techniques/algorithms to data of different temporal resolutions (annual, monthly, daily…) and of different climate elements (temperature, precipitation, pressure, wind, etc) from different observing network characteristics/densities, including various satellite observing systems.

Bridging the gap between observations, reconstructions and simulations for the early instrumental period by Oliver Bothe et al.
The early instrumental period, covering the late 18th century and the 19th century, was characterized by prominent external climate forcing perturbations, including but not limited to, the Dalton minimum of solar activity and strong volcanic eruptions (e.g., 1783/84 Laki, 1809 eruption at unknown location, 1815 Tambora, 1835 Cosigüina, 1883 Krakatoa). Climate conditions during this period are illustrated by many environmental archives of climate variability as well as by documentary sources and sparse instrumental observations available from various regions. The peculiar characteristics of this period also stimulated research based on numerical climate models. Beyond their direct impact, the external perturbations likely left longer term imprints on the climate system which might be unrepresented in the initial conditions of the historical simulations (1850 - today), thus affecting their reliability. ...

We invite submissions addressing climate variability of the early instrumental period, especially on works combining or contrasting different sources of information to highlight or overcome differences in our estimates about the climate of this period. Contributions aiming at exploring the role of the external forcing in climate variations during the period of interest are specially acknowledged. This includes new estimates about climate variability and forcing in this period. Furthermore, we welcome more general submissions about the long term imprints of episodes with strong natural forcing comparable to that in the early instrumental period.

Taking the temperature of the Earth: Temperature Variability and Change across all Domains of Earth's Surface by Stephan Matthiesen et al.
The overarching motivation for this session is the need for better understanding of in-situ measurements and satellite observations to quantify surface temperature (ST). The term "surface temperature" encompasses several distinct temperatures that differently characterize even a single place and time on Earth’s surface, as well as encompassing different domains of Earth’s surface (surface air, sea, land, lakes and ice). Different surface temperatures play inter-connected yet distinct roles in the Earth’s surface system, and are observed with different complementary techniques.

There is a clear need and appetite to improve the interaction of scientists across the in-situ/satellite 'divide' and across all domains of Earth's surface. This will accelerate progress in improving the quality of individual observations and the mutual exploitation of different observing systems over a range of applications. ...

The deadline for receipt of abstracts is 7 January 2015, and abstracts can be submitted through the session website.

10th EUMETNET Data Management Workshop

Just a pre-announcement: the next Data Management Workshop will be held in St. Gallen, Switzerland, on 28-30 October 2015. Save the date in your agenda. Further announcements will follow later from Ingeborg Auer.

IMSC2016

Even further into the future is the 13th International Meeting on Statistical Climatology (IMSC2016), to be held in Vancouver, Canada, in 2016. I guess the exact date is not fixed yet. Previous IMSCs were very interesting. The still-empty page to bookmark.

More

Did I miss any upcoming meetings or other news? Please add them in the comments.

Friday, July 19, 2013

Statistically interesting problems: correction methods in homogenization

This is the last post in a series on five statistically interesting problems in the homogenization of climate network data. This post discusses two problems around the correction methods used in homogenization. The correction of daily data in particular is becoming an increasingly important problem because more and more climatologists work with daily climate data. The main added value of daily data is that you can study climatic changes in the probability distribution, which requires studying the non-climatic factors (inhomogeneities) as well. This is thus a pressing, but also difficult, task.

The five main statistical problems are:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad hoc solutions based on single-breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 4. Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely: from just one degree of freedom for annual corrections of the mean, to 12 degrees of freedom for monthly corrections of the mean, to 120 for decile corrections applied to every month (as in the higher-order moments method (HOM) for daily data; Della-Marta & Wanner, 2006), to a large number of DOF for quantile or percentile matching.

Which correction method is best depends on the characteristics of the inhomogeneity. For a calibration problem, just the annual mean would be sufficient; for a serious exposure problem (e.g., insolation of the instrument), a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted.

The best correction method also depends on the reference. Whether the parameters of a certain correction model can be reliably estimated depends on how well correlated the neighboring reference stations are.

Currently climatologists choose their correction method mainly subjectively. For precipitation, annual corrections are typically applied; for temperature, monthly corrections are typical. The HOME benchmarking study showed these are good choices. For example, an experimental contribution correcting precipitation on a monthly scale had a larger error than the same method applied on the annual scale because the data did not allow for an accurate estimation of 12 monthly correction constants.

One correction method is typically applied to the entire regional network, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These will vary from station to station and from break to break. Especially in global studies, the number of stations in a region, and thus the signal-to-noise ratio, varies widely, and one fixed choice is likely suboptimal. Studying which correction method is optimal for every break is much work for manual methods; instead we should work on automatic correction methods that objectively select the optimal correction method, e.g., using an information criterion. As far as I know, no one works on this yet.
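To make the idea concrete, here is a minimal sketch of what such an objective selection could look like: it compares a one-constant annual correction with a twelve-constant monthly correction for a single break using the Bayesian information criterion. The function names, the use of BIC, and the Gaussian error model are my own illustrative assumptions, not an existing homogenization package.

```python
# Minimal sketch (illustrative assumptions, not an existing package): choose between
# an annual-mean and a monthly-mean correction model for one break using the BIC.
import numpy as np

def bic(residuals, n_params):
    """Bayesian information criterion for a Gaussian error model."""
    n = len(residuals)
    return n * np.log(np.sum(residuals**2) / n) + n_params * np.log(n)

def select_correction_model(diff, months, after_break):
    """diff: candidate-minus-reference series, months: month index (0-11),
    after_break: boolean mask marking values after the detected break.
    Assumes every month is observed both before and after the break."""
    scores = {}

    # Model A: one annual correction constant (segment means before/after the break)
    resid = np.concatenate([diff[~after_break] - diff[~after_break].mean(),
                            diff[after_break] - diff[after_break].mean()])
    scores["annual"] = bic(resid, n_params=1)

    # Model B: twelve monthly correction constants
    resid = []
    for m in range(12):
        for segment in (diff[(months == m) & ~after_break],
                        diff[(months == m) & after_break]):
            resid.append(segment - segment.mean())
    scores["monthly"] = bic(np.concatenate(resid), n_params=12)

    return min(scores, key=scores.get)
```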

Problem 5. Deterministic or stochastic corrections?

Annual and monthly data is normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenization. Daily data, on the other hand, is used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.
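As a concrete (and still deterministic) example of correcting more than the mean, here is a minimal sketch of a decile-matching adjustment of daily values, in the spirit of the percentile-based corrections mentioned above. The function and its arguments are my own illustration, not the HOM algorithm itself; an operational method would estimate the adjustments with the help of a homogeneous reference rather than from the candidate alone.

```python
# Minimal sketch (my own illustration, not the HOM algorithm): deterministic
# decile-matching correction of the daily values before a detected break so that
# their distribution matches the distribution of the values after the break.
import numpy as np

def decile_matching_correction(values_before, values_after, n_bins=10):
    centers = (np.arange(n_bins) + 0.5) / n_bins            # decile mid-points
    adjustments = (np.quantile(values_after, centers)
                   - np.quantile(values_before, centers))    # one adjustment per bin
    boundaries = np.quantile(values_before, np.arange(1, n_bins) / n_bins)

    # Assign each value to its decile bin and apply that bin's adjustment;
    # a real method would interpolate smoothly between bins.
    bins = np.searchsorted(boundaries, values_before)        # bin index 0 .. n_bins-1
    return values_before + adjustments[bins]
```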

Wednesday, July 10, 2013

Statistical problems: The multiple breakpoint problem in homogenization and remaining uncertainties

This is part two of a series on statistically interesting problems in the homogenization of climate data. The first part was about the inhomogeneous reference problem in relative homogenization. This part will be about two problems: the multiple breakpoint problem and about computing the remaining uncertainties in homogenized data.

I hope that this series can convince statisticians to become (more) active in homogenization of climate data, which provides many interesting problems.

The five main statistical problems are:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad hoc solutions based on single-breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 2. The multiple breakpoint problem

For temperature time series, about one break per 15 to 20 years is typical. Thus most interesting stations will contain more than one break. Unfortunately, most statistical detection methods have been developed for one break. To use them on series with multiple breaks, one ad hoc solution is to first split the series at the largest break (as done, for example, by the standard normal homogeneity test, SNHT) and then investigate the subseries. Such a greedy algorithm does not always find the optimal solution.

Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, but this reduces the power of detection considerably.

Multiple breakpoint methods can find an optimal solution and are nowadays numerically feasible, especially using the optimization method known as dynamic programming. For a given number of breaks, these methods find the break combination that minimizes the internal variance, that is the variance of the homogeneous subperiods (or, equivalently, the break combination that maximizes the variance explained by the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) and ACMANT (based on PRODIGE; Domonkos, 2011). In a similar line of research, Lu et al. (2010) solved the multiple breakpoint problem using a minimum description length (MDL) based information criterion as the penalty function.
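To illustrate the principle, here is a minimal sketch of such a dynamic-programming segmentation of a difference series: for each candidate number of breaks it finds the segmentation with the smallest internal variance, and a penalty per break then selects the number of breaks. The quadratic cost, the simple linear penalty, and all names are my own illustrative choices, not the PRODIGE or ACMANT implementations.

```python
# Minimal sketch (illustrative, not PRODIGE/ACMANT): optimal multiple-breakpoint
# segmentation of a difference series by dynamic programming.
import numpy as np

def segment_cost(x):
    """Sum of squared deviations from the segment mean."""
    return float(np.sum((x - x.mean()) ** 2)) if len(x) else 0.0

def best_segmentation(x, max_breaks, penalty_per_break):
    n = len(x)
    cost = np.full((max_breaks + 1, n + 1), np.inf)   # cost[k, j]: first j points, k breaks
    prev = np.zeros((max_breaks + 1, n + 1), dtype=int)
    cost[0] = [segment_cost(x[:j]) for j in range(n + 1)]

    for k in range(1, max_breaks + 1):
        for j in range(k + 1, n + 1):                  # need at least k+1 points for k breaks
            for b in range(k, j):                      # b: position of the last break
                c = cost[k - 1, b] + segment_cost(x[b:j])
                if c < cost[k, j]:
                    cost[k, j], prev[k, j] = c, b

    # Penalised choice of the number of breaks, then backtrack the break positions.
    k_best = int(np.argmin([cost[k, n] + penalty_per_break * k
                            for k in range(max_breaks + 1)]))
    breaks, j = [], n
    for k in range(k_best, 0, -1):
        j = prev[k, j]
        breaks.append(j)
    return sorted(breaks)
```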


This figure shows a screenshot of PRODIGE used to homogenize Salzburg with its neighbors (click to enlarge). The neighbors are sorted by their cross-correlation with Salzburg. The top panel is the difference time series of Salzburg with Kremsmünster, which has a standard deviation of 0.14°C. The middle panel is the difference between Salzburg and München (0.18°C). The lower panel is the difference between Salzburg and Innsbruck (0.29°C). Not having any experience with PRODIGE, I would read this graph as suggesting that Salzburg probably has breaks in 1902, 1938 and 1995. This fits the station history: in 1903 the station was moved to another school, in 1939 it was relocated to the airport, and in 1996 it was moved within the terrain of the airport. The other breaks are not consistently seen in multiple pairs and may thus well be in another station.

Saturday, July 6, 2013

Five statistically interesting problems in homogenization. Part 1. The inhomogeneous reference problem

This is a series I have been wanting to write for a long time. The final push was last week's conference, the 12th International Meeting on Statistical Climatology (IMSC), a very interesting meeting with an equal mix of statisticians and climatologists. (The next meeting, in three years, will be in the area of Vancouver, Canada; highly recommended.)

At the previous meeting in Scotland, there were unfortunately no statisticians present in the parallel session on homogenization. This time it was a bit better. Still, it seems as if homogenization is not seen as the interesting statistical problem it is. I hope that this post can convince some statisticians to become (more) active in the homogenization of climate data, which provides many interesting problems.

As I see it, there are five problems for statisticians to work on. This post discusses the first one. The others will follow in the coming days. UPDATE: they are now linked in the list below.
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad hoc solutions based on single-breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 1. The inhomogeneous reference problem

Relative homogenization

Statisticians often work on absolute homogenization. In climatology, relative homogenization methods, which utilize a reference time series, are used almost exclusively. Relative homogenization means comparing a candidate station with multiple neighboring stations (Conrad & Pollak, 1950).

There are two main reasons for using a reference. Firstly, as the weather at two nearby stations is strongly correlated, this can take out a lot of weather noise and make it much easier to see small inhomogeneities. Secondly, it takes out the complicated regional climate signal. Consequently, it becomes a good approximation to assume that the difference time series (candidate minus reference) of two homogeneous stations is just white noise. Any deviation from this can then be considered as inhomogeneity.

The example with three stations below shows that you can see breaks more clearly in a difference time series (it only shows the noise reduction, as no nonlinear trend was added). You can see a break in the pairs B-A and C-A, so station A likely contains the break. This is confirmed by there being no break in the difference time series of C and B. With more pairs, such an inference can be made with more confidence. For more graphical examples, see the post Homogenization for Dummies.

Figure 1. The temperature of all three stations. Station A has a break in 1940.
Figure 2. The difference time series of all three pairs of stations.
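For readers who want to play with this themselves, here is a minimal sketch of how such a synthetic three-station example can be generated: a shared regional signal, independent local noise, and a break inserted into station A in 1940. All numbers and names are illustrative choices of mine, not the data behind the figures.

```python
# Minimal sketch (illustrative values, not the data behind the figures):
# three correlated stations, a break in station A, and the pairwise differences.
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(1900, 1980)
regional = rng.normal(0.0, 1.0, years.size)        # shared regional weather/climate signal

station_a = regional + rng.normal(0.0, 0.3, years.size)
station_b = regional + rng.normal(0.0, 0.3, years.size)
station_c = regional + rng.normal(0.0, 0.3, years.size)
station_a[years >= 1940] += 0.8                    # non-climatic break in station A only

# The shared signal cancels in the differences, so the break in A stands out in
# B-A and C-A, while C-B stays close to white noise.
diff_b_a = station_b - station_a
diff_c_a = station_c - station_a
diff_c_b = station_c - station_b
```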