At the last meeting in Scotland, there were unfortunately no statisticians present in the parallel session on homogenization. This time it was a bit better. Still, it seems that homogenization is not seen as the interesting statistical problem it is. I hope that this post can convince some statisticians to become (more) active in the homogenization of climate data, which provides many interesting problems.
As I see it, there are five problems for statisticians to work on. This post discusses the first one. The others will follow in the coming days. UPDATE: they are now linked in the list below.
- Problem 1. The inhomogeneous reference problem
- Neighboring stations are typically used as a reference. Homogenization methods should take into account that this reference is also inhomogeneous
- Problem 2. The multiple breakpoint problem
- A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad hoc solutions based on single-breakpoint methods
- Problem 3. Computing uncertainties
- We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
- Problem 4. Correction as model selection problem
- We need objective selection methods for the best correction model to be used
- Problem 5. Deterministic or stochastic corrections?
- Current correction methods are deterministic. A stochastic approach would be more elegant
Problem 1. The inhomogeneous reference problem
Relative homogenization
Statisticians often work on absolute homogenization. In climatology, relative homogenization methods, which use a reference time series, are almost exclusively used. Relative homogenization means comparing a candidate station with multiple neighboring stations (Conrad & Pollack, 1950). There are two main reasons for using a reference. Firstly, as the weather at two nearby stations is strongly correlated, taking the difference removes much of the weather noise and makes it much easier to see small inhomogeneities. Secondly, it removes the complicated regional climate signal. Consequently, it becomes a good approximation to assume that the difference time series (candidate minus reference) of two homogeneous stations is just white noise. Any deviation from this can then be considered an inhomogeneity.
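The noise reduction described above can be illustrated with a small simulation. This is only a sketch with made-up numbers: the shared regional signal, the noise levels, and the variable names are all assumptions for illustration, not values from real station data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_years = 100

# Assumed regional signal shared by both stations:
# a slowly varying climate component plus common weather noise.
regional = np.cumsum(rng.normal(0.0, 0.1, n_years)) + rng.normal(0.0, 0.8, n_years)

# Each station adds its own, much smaller, local noise (assumed std 0.3).
candidate = regional + rng.normal(0.0, 0.3, n_years)
reference = regional + rng.normal(0.0, 0.3, n_years)

# Candidate minus reference: the shared regional signal cancels,
# leaving only the local noise of the two stations.
difference = candidate - reference

std_candidate = candidate.std()
std_difference = difference.std()
print(f"std of candidate series:  {std_candidate:.2f}")
print(f"std of difference series: {std_difference:.2f}")
```

Because the regional signal cancels in the subtraction, the difference series is much less variable than either station series, so a small jump stands out far more clearly in it.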
The example with three stations below shows that you can see breaks more clearly in a difference time series (it only shows the noise reduction, as no nonlinear trend was added). You can see a break in the pairs B-A and C-A, so station A likely has the break. This is confirmed by there being no break in the difference time series of C and B. With more pairs, such an inference can be made with more confidence. For more graphical examples, see the post Homogenization for Dummies.
Figure 1. The temperature of all three stations. Station A has a break in 1940.
Figure 2. The difference time series of all three pairs of stations.
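The pairwise reasoning of this three-station example can be sketched in code. Everything here is an assumption for illustration: the data are simulated, the break size of 1.0 and the noise levels are invented, and `largest_mean_shift` is a hypothetical, very simple single-breakpoint statistic with an arbitrary threshold, not one of the homogenization methods used in practice.

```python
import numpy as np

def largest_mean_shift(series, min_segment=10):
    """Return (index, statistic) of the split with the largest standardized mean shift."""
    n = len(series)
    best_index, best_stat = None, 0.0
    for k in range(min_segment, n - min_segment):
        left, right = series[:k], series[k:]
        # Standard error of the difference between the two segment means.
        pooled = np.sqrt(left.var(ddof=1) / len(left) + right.var(ddof=1) / len(right))
        stat = abs(right.mean() - left.mean()) / pooled
        if stat > best_stat:
            best_index, best_stat = k, stat
    return best_index, best_stat

rng = np.random.default_rng(0)
years = np.arange(1900, 2000)

# Assumed shared regional weather plus small local noise per station.
regional = rng.normal(0.0, 0.8, years.size)
stations = {name: regional + rng.normal(0.0, 0.2, years.size) for name in "ABC"}
stations["A"][years >= 1940] += 1.0  # assumed break of size 1.0 in station A

# Test every difference series for a break.
results = {}
for first, second in [("B", "A"), ("C", "A"), ("C", "B")]:
    diff = stations[first] - stations[second]
    k, stat = largest_mean_shift(diff)
    results[(first, second)] = (years[k], stat)

# Attribute the break to the station involved in the most "broken" pairs.
threshold = 5.0  # arbitrary illustrative threshold, not a calibrated critical value
counts = {name: 0 for name in stations}
for (first, second), (year, stat) in results.items():
    if stat > threshold:
        counts[first] += 1
        counts[second] += 1
suspect = max(counts, key=counts.get)

for pair, (year, stat) in results.items():
    print(pair, year, round(stat, 1))
print("station most likely to contain the break:", suspect)
```

Note that this sketch treats the neighbors B and C as homogeneous, which is exactly the assumption that the inhomogeneous reference problem calls into question; with breaks in several stations, the simple counting step is no longer sufficient.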