Station data

It is getting better, but much meteorological station data is unfortunately still not freely available. This data has strategic and commercial value, and many governments want to make money by selling it. Often the data can be used for scientific research at little or no cost after signing a contract with the national weather service. This, however, still makes compiling collections and publishing added-value datasets more difficult.

Still, there are large international collections of data. Normally both the raw data* and the data after the removal of non-climatic changes (homogenization) are available; the homogenized data should be used for trend estimates. Please find below some links to surface temperature data. These sources often also provide other meteorological measurements.
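Why the homogenized data should be preferred for trends can be illustrated with a toy calculation. The Python sketch below uses synthetic numbers only (an assumed trend of 0.1 °C per decade and an assumed 0.5 °C drop in 1980, as might happen after a station relocation) and fits an ordinary least-squares trend to both series.

```python
# Minimal sketch: how a single non-climatic jump (e.g. a station relocation)
# distorts an ordinary least-squares trend. All numbers are synthetic.

def ols_slope(x, y):
    """Return the ordinary least-squares slope of y against x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

years = list(range(1950, 2010))  # 60 years
true_trend = 0.01  # °C per year, assumed for illustration

# Hypothetical homogeneous series: a pure linear trend without noise
homogeneous = [true_trend * (t - years[0]) for t in years]

# Hypothetical raw series: same signal, but from 1980 on the station reads
# 0.5 °C cooler (say, after a move to a cooler site)
raw = [v - 0.5 if t >= 1980 else v for t, v in zip(years, homogeneous)]

print(f"homogeneous trend: {ols_slope(years, homogeneous) * 10:.2f} °C/decade")
print(f"raw trend:         {ols_slope(years, raw) * 10:.2f} °C/decade")
```

In this synthetic case, the single jump is enough to turn the warming trend into a slight cooling trend.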

If you are interested in a smaller region, it is a good idea to search for a national dataset. National datasets are normally of higher quality than global ones, both in the removal of non-climatic changes (and thus the reliability of the trends) and in the number of stations.

Monthly data

The National Centers for Environmental Information (NCEI, previously known as NCDC) of NOAA hosts the Global Historical Climatology Network (GHCNv3), which contains raw data and data homogenized with the Pairwise Homogenization Algorithm (PHA). The HOME community recommended this method for large datasets after their validation (benchmarking) study of homogenization methods.
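The core idea of relative homogenization, which PHA builds on, is to compare a station with its neighbours: the difference series removes the regional climate signal that both stations share, so that a non-climatic jump stands out. The Python sketch below is my simplified illustration of that idea, not NOAA's PHA code. It uses synthetic data with an assumed jump of -0.5 °C and locates the single most likely break with the statistic of the Standard Normal Homogeneity Test (SNHT). The real PHA, among other things, compares each station with many neighbours, can detect multiple breaks and decides which station of a pair caused a break.

```python
import random
import statistics

def snht_break(series):
    """Return (index, statistic) of the most likely single break point.

    Uses the Standard Normal Homogeneity Test statistic
    T(k) = k * z1**2 + (n - k) * z2**2, where z1 and z2 are the means of the
    standardized series before and after position k.
    """
    n = len(series)
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)
    z = [(v - mean) / sd for v in series]
    best_k, best_t = None, float("-inf")
    for k in range(1, n):  # candidate break between z[k-1] and z[k]
        z1 = statistics.fmean(z[:k])
        z2 = statistics.fmean(z[k:])
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

random.seed(1)
n_years = 60
# Hypothetical regional climate signal shared by neighbouring stations
regional = [0.01 * i + random.gauss(0, 0.3) for i in range(n_years)]
# Two stations see the regional signal plus small local noise
neighbour = [r + random.gauss(0, 0.05) for r in regional]
candidate = [r + random.gauss(0, 0.05) for r in regional]
# The candidate gets a hypothetical -0.5 °C jump from year index 35 on
candidate = [v - 0.5 if i >= 35 else v for i, v in enumerate(candidate)]

# The difference series cancels the shared climate signal, so the jump stands out
diff = [c - nb for c, nb in zip(candidate, neighbour)]
k, t = snht_break(diff)
print(f"most likely break before index {k} (T = {t:.1f})")
```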

The land data of the NASA Goddard Institute for Space Studies (NASA GISS) is nowadays based on the homogenized data of GHCNv3. GISS applies an additional correction for any remaining non-climatic changes due to urbanization. To compute the global mean, they also combine the land data with a different ocean dataset than NOAA uses, and they use a different averaging method.
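One basic ingredient that any method for computing a global mean needs is area weighting: on a regular latitude-longitude grid, boxes near the poles are much smaller than boxes near the equator. The Python sketch below, with made-up anomalies on a tiny hypothetical grid, weights each box by the cosine of its latitude. It simply leaves out boxes without data, which implicitly assumes they have the average anomaly of the boxes with data; how to handle such gaps is one of the points where the methods of the different groups differ.

```python
import math

def area_weighted_mean(anomalies):
    """Area-weighted mean of gridded anomalies.

    anomalies: dict mapping the latitude of a grid-box centre (degrees) to a
    list of anomalies for the boxes in that latitude band; None marks a box
    without data. On a regular latitude-longitude grid a box's area is
    roughly proportional to the cosine of its latitude.
    """
    total, weight_sum = 0.0, 0.0
    for lat, row in anomalies.items():
        w = math.cos(math.radians(lat))
        for value in row:
            if value is None:  # empty boxes are simply left out in this sketch
                continue
            total += w * value
            weight_sum += w
    return total / weight_sum

# Hypothetical grid with three latitude bands: polar, mid-latitude, equator
grid = {
    75.0: [1.2, None, 0.8],
    45.0: [0.5, 0.4, None],
    0.0: [0.2, 0.3, 0.1],
}
print(f"{area_weighted_mean(grid):.3f}")
```

Here the warm polar boxes count for much less than in a plain average of all boxes, because they cover a smaller area.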

The dataset of the Japan Meteorological Agency (JMA) is also based on GHCNv3. JMA adds the most recent measurements itself and combines the land data with its own ocean dataset, which is the main focus of its work.

A recent effort to compute the global mean temperature is the Berkeley Earth Surface Temperature (BEST) dataset. Berkeley Earth starts with a much larger number of stations and uses a state-of-the-art interpolation method (kriging) to compute the global mean temperature. Both features make this dataset especially well suited for small-scale studies.
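Kriging estimates the temperature at a location as a weighted average of the surrounding stations, with weights derived from a model of how the correlation between stations decreases with distance. The Python sketch below is a toy ordinary kriging on a flat plane; the exponential covariance model, the 500 km correlation length and the station values are assumptions for illustration only. Berkeley Earth's actual method is considerably more elaborate.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def covariance(p, q, sill=1.0, corr_length=500.0):
    """Assumed exponential covariance model; distances in km on a flat plane."""
    return sill * math.exp(-math.dist(p, q) / corr_length)

def ordinary_kriging(stations, values, target):
    """Estimate the value at `target` from station values by ordinary kriging.

    The weights minimise the estimation variance under the constraint that
    they sum to one, which is enforced with a Lagrange multiplier (the extra
    row and column of the system).
    """
    n = len(stations)
    a = [[covariance(stations[i], stations[j]) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [covariance(s, target) for s in stations] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical stations (x, y in km) and their temperature anomalies (°C)
stations = [(0.0, 0.0), (300.0, 0.0), (0.0, 400.0), (1500.0, 1500.0)]
anomalies = [0.6, 0.4, 0.5, 1.5]
estimate, weights = ordinary_kriging(stations, anomalies, (100.0, 100.0))
print(f"estimate {estimate:.2f}, weights {[round(w, 2) for w in weights]}")
```

In this toy example the nearby stations receive most of the weight and the distant station very little, so the estimate mainly reflects local information.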

The UK Hadley Centre (part of the UK Met Office) and the Climatic Research Unit (CRU) have a global dataset called HadCRUT. The land component comes from CRU and is called CRUTEM4. CRU collects (homogenized) data mainly from (national) weather services and nowadays no longer homogenizes the data itself.

The International Surface Temperature Initiative (ISTI) has started building a large global collection of temperature data. Currently the main dataset consists of monthly data, but a daily dataset will follow soon. The ISTI is committed to building an open database (data and processing) with good provenance, and everyone is invited to collaborate. Currently the collection only contains raw data; scientists working on quality control and homogenization are invited to process this dataset. A benchmark dataset is being generated that will help to assess the accuracy of homogenization methods. GHCNv4 will be based on the raw data of the ISTI.

Daily and sub-daily data

Daily data is harder to obtain than monthly data, but it is important for studying weather extremes and weather variability. A large collection of European data (and data from some nearby countries) can be found in the European Climate Assessment & Dataset Network (ECA&D). ECA&D has checked this data for homogeneity and reports the number of non-climatic changes found, but it does not correct the daily data. Correcting daily data is a topic of ongoing research. ECA&D is trying to set up similar initiatives around the world.

Data with an even higher (sub-daily) resolution can be found in the HadISD dataset of the Hadley Centre. It is based on the Integrated Surface Database (ISD) of NOAA's NCDC. HadISD is quality controlled (outliers removed) and checked for non-climatic changes using the PHA, but again no corrections are applied.
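A basic ingredient of quality control is flagging values that are implausibly far from the rest. HadISD applies a suite of tests; the Python sketch below shows just one generic, robust check of my own choosing, based on the median and the median absolute deviation (MAD), applied to made-up hourly temperatures with one assumed transmission error. The threshold of five robust standard deviations is also an assumption.

```python
import statistics

def flag_outliers(values, threshold=5.0):
    """Flag values far from the median, measured in robust standard deviations.

    The spread is estimated from the median absolute deviation (MAD), scaled
    by 1.4826 so that it matches the standard deviation for Gaussian data.
    Unlike the mean and standard deviation, these statistics are hardly
    affected by the outliers they are meant to find.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    robust_sd = 1.4826 * mad
    if robust_sd == 0:  # constant data: nothing can be judged an outlier
        return [False] * len(values)
    return [abs(v - med) / robust_sd > threshold for v in values]

# Hypothetical hourly temperatures (°C) with one transmission error
hourly = [12.1, 11.8, 11.5, 11.9, 12.6, 13.4, 14.2, 95.0, 15.1, 15.3]
flags = flag_outliers(hourly)
print([v for v, bad in zip(hourly, flags) if bad])
```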

I have restricted myself to station data; many more climate data links can be found at Tamino and RealClimate.

This overview was mostly written from hearsay or recollection; I did not (re)read all the corresponding papers. Thus the above description is just guidance, and you should check the articles before you use the data. If there are errors or unclear points, please leave a comment. Also, if you know of other sources of station data, please comment.

* Raw data is mostly as observed, but a small part is often already quality controlled or homogenized before it was collected.
