
Saturday, July 6, 2013

Five statistically interesting problems in homogenization. Part 1. The inhomogeneous reference problem

This is a series I have been wanting to write for a long time. The final push was last week's conference, the 12th International Meeting on Statistical Climatology (IMSC), a very interesting meeting with an equal mix of statisticians and climatologists. (The next meeting, in three years, will be in the area of Vancouver, Canada; highly recommended.)

At the last meeting in Scotland, there were unfortunately no statisticians present in the parallel session on homogenization. This time it was a bit better. Still, it seems as if homogenization is not seen as the interesting statistical problem it is. I hope that this post can convince some statisticians to become (more) active in the homogenization of climate data, which provides many interesting problems.

As I see it, there are five problems for statisticians to work on. This post discusses the first one. The others will follow in the coming days. UPDATE: they are now linked in the list below.
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad-hoc solutions based on single-breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 1. The inhomogeneous reference problem

Relative homogenization

Statisticians often work on absolute homogenization. In climatology, relative homogenization methods, which use a reference time series, are used almost exclusively. Relative homogenization means comparing a candidate station with multiple neighboring stations (Conrad & Pollak, 1950).

There are two main reasons for using a reference. Firstly, as the weather at two nearby stations is strongly correlated, taking the difference between them removes a lot of weather noise and makes it much easier to see small inhomogeneities. Secondly, it removes the complicated regional climate signal. Consequently, it becomes a good approximation to assume that the difference time series (candidate minus reference) of two homogeneous stations is just white noise. Any deviation from this can then be considered an inhomogeneity.
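
To make the noise reduction concrete, here is a minimal sketch in Python. It is a toy simulation, not a real homogenization method: two homogeneous stations share a regional signal and each adds independent local noise. The function name `simulate_pair` and all numbers (the standard deviations, the series length, the seed) are assumptions chosen for illustration only.

```python
import random
import statistics


def simulate_pair(n_years=100, regional_sd=1.0, local_sd=0.3, seed=42):
    """Simulate two homogeneous stations (hypothetical toy model).

    Both stations see the same regional signal; each adds its own
    independent local noise. All parameter values are assumptions.
    """
    rng = random.Random(seed)
    regional = [rng.gauss(0.0, regional_sd) for _ in range(n_years)]
    candidate = [r + rng.gauss(0.0, local_sd) for r in regional]
    reference = [r + rng.gauss(0.0, local_sd) for r in regional]
    return candidate, reference


candidate, reference = simulate_pair()

# Difference time series: candidate minus reference.
# The shared regional signal cancels; only local noise remains.
difference = [c - r for c, r in zip(candidate, reference)]

sd_candidate = statistics.stdev(candidate)
sd_difference = statistics.stdev(difference)

print(f"std. dev. of candidate series:  {sd_candidate:.2f}")
print(f"std. dev. of difference series: {sd_difference:.2f}")
```

In this toy model the difference series is white noise by construction, and its spread depends only on the local noise, so a small break would stand out much more clearly in it than in the candidate series itself.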

The example with three stations below shows that you can see breaks more clearly in a difference time series (it only shows the noise reduction, as no nonlinear trend was added). You can see a break in the pairs B-A and C-A; thus, station A likely has the break. This is confirmed by there being no break in the difference time series of C and B. With more pairs, such an inference can be made with more confidence. For more graphical examples, see the post Homogenization for Dummies.

Figure 1. The temperature of all three stations. Station A has a break in 1940.
Figure 2. The difference time series of all three pairs of stations.
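
The pairwise reasoning of this example can also be sketched in code. The following is a toy illustration under stated assumptions, not the method used to produce the figures: the break size (1.0), the noise levels, the seed, the threshold, and the helper names `simulate_stations` and `strongest_shift` are all hypothetical. The detection step is a simple weighted mean-shift search that I swapped in for illustration; it is not one of the established homogenization tests.

```python
import math
import random


def simulate_stations(start=1900, end=2000, break_year=1940,
                      break_size=1.0, local_sd=0.2, seed=1):
    """Three stations sharing a regional signal; A gets a step change."""
    rng = random.Random(seed)
    years = list(range(start, end))
    regional = [rng.gauss(0.0, 1.0) for _ in years]
    stations = {
        name: [r + rng.gauss(0.0, local_sd) for r in regional]
        for name in "ABC"
    }
    # Station A is shifted by break_size from break_year onwards.
    stations["A"] = [t + break_size if y >= break_year else t
                     for y, t in zip(years, stations["A"])]
    return years, stations


def strongest_shift(years, series, min_segment=5):
    """Return (year, shift) for the split with the strongest mean shift.

    The score weights the shift by the segment lengths, so very short
    segments with noisy means are not favoured.
    """
    n = len(series)
    best_score, best_year, best_shift = -1.0, None, 0.0
    for k in range(min_segment, n - min_segment + 1):
        before, after = series[:k], series[k:]
        shift = sum(after) / len(after) - sum(before) / len(before)
        score = abs(shift) * math.sqrt(k * (n - k) / n)
        if score > best_score:
            best_score, best_year, best_shift = score, years[k], shift
    return best_year, best_shift


years, stations = simulate_stations()
THRESHOLD = 0.5  # hypothetical cut-off for calling a shift a break

results = {}
for first, second in [("B", "A"), ("C", "A"), ("C", "B")]:
    diff = [x - y for x, y in zip(stations[first], stations[second])]
    results[f"{first}-{second}"] = strongest_shift(years, diff)

# A pair is flagged when its strongest shift exceeds the threshold.
flagged = [pair for pair, (_, shift) in results.items()
           if abs(shift) > THRESHOLD]

# The station that appears in the most flagged pairs is the suspect.
counts = {name: sum(name in pair.split("-") for pair in flagged)
          for name in "ABC"}
suspect = max(counts, key=counts.get)

for pair, (year, shift) in results.items():
    print(f"{pair}: strongest shift {shift:+.2f} starting in {year}")
print("Station most likely to contain the break:", suspect)
```

Note that this sketch quietly assumes that stations B and C are homogeneous. With real data the references have breaks of their own, which is exactly the inhomogeneous reference problem.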