Wednesday, 5 November 2014

Participate in the best validation study for daily homogenization algorithms

Rachel Warren is working on the validation of homogenization methods that remove non-climatic changes from the distribution of daily temperature data. Such methods are used to make trend estimates for changes in weather extremes and weather variability more accurate.

To study this, she has just released a numerical validation dataset. Everyone is invited to apply their homogenization method to this dataset. It looks to be the most realistic validation dataset produced up to now. The resulting study thus promises to become an important paper for the homogenization community.

Rachel wrote about her study in a post at the blog of the benchmarking group of the International Surface Temperature Initiative. I hope it is okay that I republish it here below.

She is not the Rachel Warren of the Hip-Hop Dance Workout; that would be too much healthy fun for scientists. She is Rachel Warren, the statistician from the University of Exeter. Hopefully the healthy smile on her photo and the interesting results of the study make up for the missing fun. VV

Release of a daily benchmark dataset - version 1
by Rachel Warren

Kate Willett's blog post from 6th October gives a detailed overview of the benchmarking process that forms part of the ISTI's aims. It is hoped that in the long term these benchmarks will be produced not only at the monthly level, but also for daily data.

This post announces the release of a smaller daily benchmark dataset focusing on four regions in North America. These regions can be seen in Figure 1.

Figure 1. Station locations of the four benchmark regions. Blue stations are in all worlds. Red stations only appear in worlds 2 and 3.

These benchmarks have similar aims to the global benchmarks that are currently being produced by the ISTI working group, namely to:
  1. Assess the performance of current homogenisation algorithms and provide feedback to allow for their improvement
  2. Assess how realistic the created benchmarks are, to allow for improvements in future iterations
  3. Quantify the uncertainty that is present in data due to inhomogeneities both before and after homogenisation algorithms have been run on them

A perfect algorithm would return the inhomogeneous data to their clean form – correctly identifying the size and location of the inhomogeneities and adjusting the series accordingly. The inhomogeneities that have been added will not be made known to the testers until the completion of the assessment cycle – mid 2015. This is to ensure that the study is as fair as possible with no testers having prior knowledge of the added inhomogeneities.
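To make the idea of a "perfect algorithm" concrete, here is a minimal sketch in Python. It is not the benchmark's actual assessment procedure: the synthetic series, the single step-shaped break, the helper names (`add_break`, `adjust`, `rmse`) and the choice of root-mean-square error as a score are all assumptions made for illustration. Real inhomogeneities in daily data can change the whole distribution, not just the mean, so a constant shift is a deliberate simplification.

```python
import math
import random


def add_break(clean, position, size):
    """Return a copy of `clean` shifted by `size` from index `position` onward."""
    return [v + size if i >= position else v for i, v in enumerate(clean)]


def adjust(series, position, size):
    """Undo a step of `size` that an algorithm believes starts at `position`."""
    return [v - size if i >= position else v for i, v in enumerate(series)]


def rmse(a, b):
    """Root-mean-square difference between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical "clean" daily temperatures: one year of noise around 10 degrees.
random.seed(0)
clean = [10.0 + random.gauss(0.0, 1.0) for _ in range(365)]

# Insert one artificial inhomogeneity: a 1.5 degree jump starting on day 200.
inhomogeneous = add_break(clean, position=200, size=1.5)

# A perfect algorithm finds the right location and size of the break.
perfect = adjust(inhomogeneous, position=200, size=1.5)

# An imperfect one detects the break ten days late and underestimates its size.
imperfect = adjust(inhomogeneous, position=210, size=1.2)

print("before homogenisation:", rmse(inhomogeneous, clean))
print("perfect algorithm:    ", rmse(perfect, clean))
print("imperfect algorithm:  ", rmse(imperfect, clean))
```

In this toy setup the perfect adjustment recovers the clean series up to floating-point rounding, while the imperfect one reduces the error without removing it. Because participants do not know the inserted breaks, only the benchmark's creators can compute this kind of score.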

The data are formed into three worlds, each consisting of the four regions shown in Figure 1. World 1 is the smallest and contains only those stations shown in blue in Figure 1; Worlds 2 and 3 are the same size as each other and contain all the stations shown.

Homogenisers are requested to prioritise running their algorithms on a single region across worlds instead of on all regions in a single world. This will hopefully maximise the usefulness of this study in assessing the strengths and weaknesses of the process. The order of prioritisation for the regions is Wyoming, South East, North East and finally the South West.

The more participants this study has, the more effective it will be. If you are interested in participating, please contact Rachel Warren (rw307 AT The results will form part of a PhD thesis and therefore it is requested that they are returned no later than Friday 12th December 2014. However, interested parties who are unable to meet this deadline are also encouraged to contact Rachel.

There will be a further smaller release in the next week that is just focussed on Wyoming and will explore climate characteristics of data instead of just focusing on inhomogeneity characteristics.
