
Saturday, December 24, 2016

Can Trump fiddle with climate observations?

Some people worry about the Trump administration fiddling with climate data to get politically convenient trends. There is a lot to worry about; this is not one of those things.

Raw data

A Trump stooge could not fiddle with the raw data because many other organisations also have a copy. Old data can be found in the annual reports of the weather services. New data is in their databases and in the many archives that collect the observations weather services share with each other: the so-called CLIMAT messages every month (for climate purposes) and GTS messages every day (for meteorology).

Nick Stokes checked how station data moves from the Australian Bureau of Meteorology (BOM) to NOAA's Global Historical Climate Network (GHCN). Spoiler: it fits. The marine observations by voluntary observing vessels are less open to the public due to piracy concerns, but this is just a small part of the marine data nowadays and regional data managers can check whether everything fits (Freeman et al., 2016).

Because climate data needs to be consistent, a lot of data would need to be changed. If only a few stations were changed, they would differ from their neighbours and be identified as faulty. Thus, to detect any fiddling with the raw data, only a small number of stations needs to be sampled.
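To make the neighbour comparison concrete, here is a minimal sketch in R with synthetic data standing in for real stations; all numbers are invented for illustration.

```r
# A tampered station drifts away from the composite of its neighbours;
# a simple difference series makes that visible.
set.seed(42)
years <- 1950:2016
n <- length(years)

# Five neighbouring stations sharing one regional climate signal
regional <- cumsum(rnorm(n, mean = 0.01, sd = 0.1))
neighbours <- replicate(5, regional + rnorm(n, sd = 0.2))

# A tampered target station: same signal, but 0.5 degC removed after 2000
target <- regional + rnorm(n, sd = 0.2)
target[years > 2000] <- target[years > 2000] - 0.5

# The difference from the neighbour composite isolates the tampering
diff_series <- target - rowMeans(neighbours)
t.test(diff_series[years > 2000], diff_series[years <= 2000])
# A tiny p-value: the station no longer fits its neighbours
```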

Who fiddles the fiddlers?

The raw data is processed to estimate the global (and regional) climatic changes. The temperature change in the raw data and the actual estimate of the temperature increase are shown in the graph below. The actual temperature increase is smaller than the one in the raw data. The main reason is that old sea surface temperature measurements were made with buckets, and the water in the bucket would cool a little by evaporation before the thermometer was read.



Theoretically a Trump stooge could mess with this data processing. However, the software is open: everyone can check whether the code produces the claimed results when applied to the raw data. Any changes would thus have to be made in the open and justified.

The Trump stooge could naturally make changes to the code openly and claim that this "improves" the data processing. Whether the new software is actually an improvement is, however, something we can check. For the land station data we have a validation dataset in which we know both the climate signal and the measurement artefacts we put in, so we can see how well the software removes the artefacts. The current homogenization software of NOAA removes these measurement artefacts well. If the software is fiddled with for political reasons, it will perform worse.
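A toy version of this validation logic in R, with a synthetic signal and a synthetic break standing in for the real benchmark (which is far more elaborate):

```r
# We know the climate signal and the artefact we put in, so we can score
# any candidate processing by how close it gets back to the known truth.
set.seed(1)
n <- 100
truth <- cumsum(rnorm(n, mean = 0.01, sd = 0.1))  # known climate signal
artefact <- c(rep(0, 40), rep(0.8, 60))           # known break in year 41
raw <- truth + artefact + rnorm(n, sd = 0.1)      # the "measured" series

rmse <- function(adjusted) sqrt(mean((adjusted - truth)^2))

rmse(raw)             # error when nothing is corrected
rmse(raw - artefact)  # error of a perfect correction
# Software fiddled to remove less of the artefact would land measurably
# closer to the first number than to the second.
```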

If that happens I am sure someone will be willing to apply the better original code to the raw data and publish these results. That only requires modest software skills.

Clear signs of fiddling

Apart from such audits, larger changes would also be obvious because the datasets need to be consistent with each other. Land surface temperature, sea surface temperature and upper-air temperature, for example, need to fit together. Marine temperatures from ships, drifting buoys, moored coastal buoys and Argo floats need to fit together. Pressure will need to fit to wind, the circulation to precipitation, precipitation to snow cover, snow cover to reflectance, reflectance to incoming radiation and absorption. The changes in the physical climate would need to fit the changes observed by biologists and bird spotters in nature, the changes noticed by agricultural scientists, economists and farmers in yields, the changes seen by glaciologists in glaciers and ice caps, and the changes measured by hydrologists in stream flows.

It is easier to go to the moon than to fake the moon landing in Hollywood. It is easier to fake the moon landing than to make significant changes to climate data without being caught.

Destruction of data

Thus, with some vigilance, the data we have will be okay. What is worrying is the possible destruction of datasets and the discontinuation of measurements. Trump's election has shown that catastrophes with less than a 50% chance do happen. Climate data is part of our cultural and scientific heritage and is important for protecting communities. Thus we should not take any risks with it.

Destroying data would put American communities in more danger, but the Trump administration may not care. For instance, Florida’s Republican government banned state employees from discussing global warming. That hinders adaptation to climate change. Republican North Carolina legislators voted to ignore sea-level rise projections, putting citizens at a higher risk of drowning, endangering infrastructure and leading to higher adaptation costs later on. Several Republican politicians have wasted taxpayer money to harass climate scientists in return for campaign contributions.


Dumpster in Quebec with hundreds of carelessly discarded historic books and documents.
The conservative Harper government in Canada committed libricide, destroying seven environmental libraries and throwing the books on the trash heap.

Also, what has not happened before can happen. The radicalised Congress has shown disregard for the American public by shutting down the government. In the election campaign Trump called for violence to quell protest and for locking up his opponent. An alt-nazi will be an advisor in the White House. Never before have so many banks and oil companies had a seat at the tables of power. This is the first time that a foreign power was forced to move a celebration to the hotel of the president-elect; presidents normally do not own hotels in Washington DC that all diplomats will use to gain favours. Trump will be the first president with a 300-million-dollar loan from a foreign bank he is supposed to regulate. This list could be longer than this post. Do not be fooled into thinking this is normal.

If a Trump stooge ordered the deletion of a dataset, the backups would be deleted as well. Thus it is good that independent initiatives have sprung up to preserve digital archives. I hope and trust that all American scientists will make sure that there are copies of their data and code on private disks and in foreign countries.
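A copy is only a safeguard if it can be shown to match the original. A minimal R sketch of such a check using checksums; the directory names are hypothetical and both directories are assumed to contain the same file listing:

```r
# Verify an independent copy against the original archive via checksums
library(tools)

orig_files <- list.files("archive_original", full.names = TRUE)
copy_files <- list.files("archive_copy", full.names = TRUE)

identical(unname(md5sum(orig_files)),
          unname(md5sum(copy_files)))
# FALSE would mean a file was altered, added or deleted in one copy
```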

Unfortunately, not all data is digitised or even digitisable: many documents still need to be scanned, and proxy sources such as (ice) cores and tree rings contain information that has not been measured yet or that needs future technologies to measure. Some of these ice cores come from glaciers that no longer exist.

Observations could be stopped. Even if they were resumed after four years, the gap would limit our ability to see changes, and thus to adapt to climate change and limit the damages. Looking at the proposed members of the Trump cabinet, I fear that such damages and costs for American citizens will not stop them. I hope that the blue states and Europe will be willing to pick up the tab until decency is restored, and are prepared to move fast when needed. At a scientific conference in San Francisco earlier this month, Jerry Brown, Governor of California, promised that "if Trump turns off the earth monitoring satellites California will launch its own damn satellites." A hopeful sign in the face of Washington fundamentalism.


Related reading

The Center for Science and Democracy at the Union of Concerned Scientists has established a hotline for National Oceanic and Atmospheric Administration (NOAA) employees to report political meddling

How Trump’s White House Could Mess With Government Data. 538 on how the Trump administration could fiddle with other (economic) datasets and especially affect how the information is communicated

A chat with Gavin Schmidt of NASA-GISS on why climate data is mostly safe and on the legal protections for federal scientists communicating science

Just the facts, homogenization adjustments reduce global warming

Statistical homogenisation for dummies

Benchmarking homogenisation algorithms for monthly data

Brady Dennis for The Washington Post: Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump

Canadian CBC radio on the Harper government's attack on science: Science Under Siege

On the cuts to Canadian science and observational capabilities under Harper. Academic Matters: Harper’s attack on science: No science, no evidence, no truth, no democracy

On Harper's destruction of libraries: The Harper Government Has Trashed and Destroyed Environmental Books and Documents

In Florida, officials ban term 'climate change'

New Law in North Carolina Bans Latest Scientific Predictions of Sea-Level Rise

References

Freeman, E., Woodruff, S. D., Worley, S. J., Lubker, S. J., Kent, E. C., Angel, W. E., Berry, D. I., Brohan, P., Eastman, R., Gates, L., Gloeden, W., Ji, Z., Lawrimore, J., Rayner, N. A., Rosenhagen, G. and Smith, S. R., 2016: ICOADS Release 3.0: a major update to the historical marine climate record. Int. J. Climatol., doi: 10.1002/joc.4775

Saturday, June 13, 2015

Free our climate data - from Geneva to Paris

Royal Air Force- Italy, the Balkans and South-east Europe, 1942-1945. CNA1969

Neglecting to monitor the harm done to nature and the environmental impact of our decisions is only the most striking sign of a disregard for the message contained in the structures of nature itself.
Pope Francis

The 17th Congress of the World Meteorological Organization in Geneva ended today. After countless hours of discussions, they managed to pass an almost completely rewritten resolution on sharing climate data in the last hour.

The glass is half full. On the one hand, the resolution clearly states the importance of sharing data. It demonstrates its importance for helping humanity cope with climate change by making data sharing part of the Global Framework for Climate Services (GFCS), which is there to help all nations adapt to climate change.

The resolution considers and recognises:
The fundamental importance of the free and unrestricted exchange of GFCS relevant data and products among WMO Members to facilitate the implementation of the GFCS and to enable society to manage better the risks and opportunities arising from climate variability and change, especially for those who are most vulnerable to climate-related hazards...

That increased availability of, and access to, GFCS relevant data, especially in data sparse regions, can lead to better quality and will create a greater variety of products and services...

Indeed free and unrestricted access to data can and does facilitate innovation and the discovery of new ways to use, and purposes for, the data.
On the other hand, if a country wants to, it can still refuse to share the most important datasets: the historical station observations. Many datasets will be shared: satellite data and products, ocean and cryosphere (ice) observations, and measurements of the composition of the atmosphere (including aerosols). However, information on streamflow, lakes and most of the climate station data is exempt.

The resolution does urge Members to:
Strengthen their commitment to the free and unrestricted exchange of GFCS relevant data and products;

Increase the volume of GFCS relevant data and products accessible to meet the needs for implementation of the GFCS and the requirements of the GFCS partners;
But there is no requirement to do so.

The most positive development is not on paper. Data sharing may well have been the main discussion topic among the directors of the national weather services at the Congress. The directors got the message that many of their colleagues find this important, and they are likely to prioritise data sharing in future. I am grateful to the people at the WMO Congress who made this happen; you know who you are. Some directors really wanted a strong resolution as justification towards their governments to open up the databases. There is already a trend towards more and more countries opening up their archives, not only of climate data, but going towards open governance. Thus I am confident that many more countries will follow this trend after this Congress.

Also good about the resolution is that WMO will start monitoring data availability and data policies. This will make visible how many countries are already taking the high road and speed up the opening of the datasets. The resolution requests WMO to:
Monitor the implementation of policies and practices of this Resolution and, if necessary, make proposals in this respect to the Eighteenth World Meteorological Congress;
In a nice twist, the WMO calls the data to be shared "GFCS data", basically saying: if you do not share climate data, you are responsible for the national damages from climatic changes that you could have adapted to, and for the failed adaptation investments. The term "GFCS data" does, however, miss how important this data is for basic climate research. Research that is needed to guide expensive political decisions on mitigation and, in the end, again adaptation and, ever more likely, geo-engineering.

If I may repeat myself, we really need all the data we can get for an accurate assessment of climatic changes; a few stations will not do:
To reduce the influence of measurement errors and non-climatic changes (inhomogeneities) on our (trend) assessments we need dense networks. These errors are detected and corrected by comparing one station to its neighbours. The closer the neighbours are, the more accurate we can assess the real climatic changes. This is especially important when it comes to changes in severe and extreme weather, where the removal of non-climatic changes is very challenging.
The problem, as so often, is mainly money. Weather services get some revenues from selling climate data. These revenues cannot be large compared to the impacts of climate change or to the investments needed to adapt, but relative to the budget of a weather service, especially in poorer countries, they do make a difference. At the least, the weather services will have to ask their governments for permission.

Thus we will probably have to up our game. The mandate of the weather services is not enough; we need to make clear to the governments of this world that sharing climate data is of huge benefit to every single country. Compared to the costs of climate change this is a no-brainer. Don Henry writes that "[The G7] also said they would continue efforts to provide US$100 billion a year by 2020 to support developing countries' own climate actions." The revenues from selling climate data are irrelevant compared to that number.

A large political climate summit is coming up: the COP21 in Paris in December. This week there was a preparatory meeting in Bonn to work on the text of the climate treaty. The proposal already has optional text about climate research:
[Industrialised countries] and those Parties [nations] in a position to do so shall support the [Least Developed Countries] in the implementation of national adaptation plans and the development of additional activities under the [Least Developed Countries] work programme, including the development of institutional capacity by establishing regional institutions to respond to adaptation needs and strengthen climate-related research and systematic observation for climate data collection, archiving, analysis and modelling.
An earlier climate treaty (COP4 from 1998) already speaks about the exchange of climate data (FCCC/CP/1998/16/Add.1):
Urges Parties to undertake free and unrestricted exchange of data to meet the needs of the Convention, recognizing the various policies on data exchange of relevant international and intergovernmental organizations;
"Urges" is not enough, but that is a basis that could be reinforced. With the kind of money COP21 is dealing with it should be easy to support weather services of less wealthy countries to improve their observation systems and make the data freely available. That would be an enormous win-win situation.

To make this happen, we probably need to show that the climate science community stands behind this. We would need a group of distinguished climate scientists from as many countries as possible to support a "petition" requesting better measurements in data-sparse regions and free and unrestricted data sharing.

To get heard, we would probably also need to write articles for the national newspapers; to get published, these would again have to be written by well-known scientists. To get attention, it would also be great if many climate blogs wrote about the action on the same day.

Maybe we could make this work. My impression was already that basically everyone in the climate science community finds the free exchange of climate data very important and sees the current situation as a major impediment to better climate research. After last week's article on data sharing, the response was enormous and only positive. This may have been the first time that a blog post of mine that did not respond to something in the press got over 1000 views. It was certainly my first tweet that got over 13 thousand views and 100 retweets:


This action of my little homogenization blog was even at the top of the Twitter page on the Congress of the WMO (#MeteoWorld), right next to the photo of the newly elected WMO Secretary-General Petteri Taalas.



With all this internet enthusiasm and the dedication of the people fighting for free data at the WMO, and likely many more outside of it, we may be able to make this work. If you would like to stay informed, please fill in the form below or write to me. If enough people show interest, I feel we should try. I do not have the time either, but this is important.






Related reading

Congress of the World Meteorological Organization, free our climate data

Why raw temperatures show too little global warming

Everything you need to know about the Paris climate summit and UN talks

Bonn climate summit brings us slowly closer to a global deal by Don Henry (Public Policy Fellow, Melbourne Sustainable Society Institute at University of Melbourne) at The Conversation.

Free climate data action promoted in Italian. Thank you Sylvie Coyaud.

If my Italian, that is Google Translate, serves me well, the post asks the Pope to put the sharing of climate data in his encyclical. Weather data is a common good.


* Photo at the top: By Royal Air Force official photographer [Public domain], via Wikimedia Commons

Wednesday, June 3, 2015

Congress of the World Meteorological Organization, free our climate data



A small revolution is happening at the World Meteorological Organization (WMO). Its main governing body (the WMO Congress) is discussing a draft resolution that national weather services shall provide free and unrestricted access to climate data. The problem is the fine print, which makes it possible to keep on refusing to share important climate data.

The data situation is getting better; more and more countries are freeing their climate data. The USA, Canada and Australia have long traditions of open data. Germany, The Netherlands, Finland, Sweden, Norway, Slovenia, Brazil and Israel have just freed their data. China and Russia are pretty good at sharing data. Switzerland has concrete plans to free its data. I have probably forgotten many countries, and for Israel you currently still have to be able to read Hebrew, but things are definitely improving.

That there are large differences between countries is illustrated by this map of data availability for daily mean temperature in the ECA&D database, a dataset that is used to study changes in severe weather. The green dots are stations whose data you can download and work with; the red dots are stations whose data ECA&D is only allowed to use internally to make maps. In the number of stations available you can clearly see many national boundaries; that reflects not just the number of real stations, but to a large part national policies on data sharing.



Sharing data is important

We need this data to see what is happening to the climate. We have already had almost a degree of global warming and are likely in for at least another one. This will change the sea level, the circulation and precipitation patterns. It will change extreme and severe weather. We will need to adapt to these climatic changes, and to know how to protect our communities we need climate data.

Many countries have set up Climate Service Centres, or are in the process of doing so, to provide their populations with the information they need to adapt. Here companies, (local) governments, non-governmental organisations and citizens can get advice on how to prepare themselves for climate change.

It makes a large difference how often we will see heat waves like the one in Europe in 2003 (70 thousand additional deaths; Robine et al., 2008), in Russia in 2010 (a death toll of 55,000, a crop failure of ~25% and an economic loss of about 1% of GDP; Barriopedro et al., 2011) or now in India. It makes a large difference how often a winter flood like the one in the UK in 2013-2014, or the flood now in Texas and Oklahoma, will occur. Once every 10, 100 or 1000 years? If it is 10 years, expensive infrastructural changes will be needed; if it is 1000 years, we will probably decide to live with it. It makes a difference how long droughts like the ones in California or Chile will last, and being able to make regional climate predictions requires high-quality historical climate data.

One of the main outcomes of the current 17th WMO Congress will be the adoption of the Global Framework for Climate Services (GFCS). A great initiative to make sure that everyone benefits from climate services, but how will the GFCS succeed in helping humanity cope with climate change if there is almost no data to work with?

In its own resolution (8.1) on the GFCS, the Congress recognizes this itself:
Congress noted that EC-66 had adopted a value proposition for the international exchange of climate data and products to support the implementation of the GFCS and recommended a draft resolution on this topic for consideration by Congress.

To understand climate, we need a global overview. National studies are not enough. To understand changes in circulation, interactions with mountains and vegetation, to understand changes in extremes, we need spatially resolved information and not just a few stations.

Homogenization

To reduce the influence of measurement errors and non-climatic changes (inhomogeneities) on our (trend) assessments we need dense networks. These errors are detected and corrected by comparing one station to its neighbours. The closer the neighbours are, the more accurate we can assess the real climatic changes. This is especially important when it comes to changes in severe and extreme weather, where the removal of non-climatic changes is very challenging.

For the global mean land temperature, the non-climatic changes already represent 25% of the change: after homogenization (to reduce non-climatic changes), the trend in GHCNv3 is 0.8°C per century since 1880 (Lawrimore et al., 2011; table 4). In the raw data this trend is only 0.6°C per century. That makes a large difference for our assessment of how far climate change has progressed, while for large parts of the world we currently do not have enough data to remove such non-climatic changes well. This results in large uncertainties.
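For readers who want to see how such trend numbers arise: the trend is simply a linear fit to the annual series, with the slope scaled from °C per year to °C per century. A small R sketch on synthetic data:

```r
# Ordinary least-squares trend of annual anomalies, in degC per century
trend_per_century <- function(year, anomaly) {
  fit <- lm(anomaly ~ year)
  100 * coef(fit)[["year"]]
}

set.seed(7)
year <- 1880:2010
anomaly <- 0.008 * (year - 1880) + rnorm(length(year), sd = 0.1)
trend_per_century(year, anomaly)  # close to 0.8 degC per century
```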

This 25% is global, but when it comes to the impacts of climate change, we need reliable local information. Locally the (trend) biases are much larger; on a global scale many biases cancel each other. For (decadal) climate prediction we need accurate variability on annual time scales, not "just" secular trends; this is again harder and has larger uncertainties. In the German climate prediction project MiKlip it was shown that a well-homogenized radiosonde dataset could distinguish much better between prediction systems and thus better guide their development. Based on the physics of the non-climatic changes, we expect that (trend) biases are much stronger for extremes than for the mean. For example, errors due to insolation are worst on hot, sunny and calm days, while they are much less of a problem on normal cloudy and windy days, and thus less of a problem for the average. For the best possible data to protect our communities, we need dense networks; we need all the data there is.

WMO resolution

Theoretically, the data exchange resolution will free everything you ever dreamed of. A large number of datasets is mentioned, from satellites to sea and lake level, from greenhouse gases to snow cover and river ice. But exactly for the historical climate station data that is so important to put climate change into perspective, a limitation is made. Here the international exchange is limited to the GCOS stations. The total number of GCOS stations is 1017 (as of 1 March 2014). For comparison, Berkeley Earth and the International Surface Temperature Initiative have records from more than 30 thousand stations, and most GCOS stations are likely already included in those. Thus, in the end, this resolution will free almost no new climate station data.

The resolution proposes to share "all available data". But it defines that basically as data that is currently already open:
“All available” means that the originators of the data can make them available under this resolution. The term recognizes the rights of Members to choose the manner by, and the extent to, which they make their climate relevant data and products available domestically and for international exchange, taking into consideration relevant international instruments and national policies and legislation.
I have not heard of cases where national weather services denied access to data just for the fun of it. Normally they say it is due to "national policies and legislation". Thus this resolution will not change much.

No idea where these counterproductive national policies come from. For new instruments, for expensive satellites, for the Argo system to measure the ocean heat content, it is normally specified that the data should be open to all, so that society benefits maximally from the investment. In America they naturally see the data as free for all, because the taxpayer has already paid for it.

In the past there may have been strategic (military) concerns. Climate and weather information can determine wars. However, nowadays weather and climate models are so good that the military benefit of observations is limited. Had Napoleon had a climate model, his troops would have been given warmer clothes before leaving for Russia. To prepare for war you do not need it more accurate than that.

The ministers of finance seem to like the revenues from selling climate data, but I cannot imagine them making much money that way. It is nothing in comparison to the impacts of climate change or the costs of maladaptation. It will be much less than the money society invested in the climate observations. An investment that is devalued by sitting on the data and not sharing it.

All this while the WMO theoretically recognises how important sharing data is. In another resolution (9.1), ironically on big data, they write:
With increasing acceptance that the climate is changing, Congress noted that Members are again faced with coming to agreement with respect to the international exchange of data of importance of free and unrestricted access to climate-related information at both global and regional levels.
UN and Data Revolution. In August 2014 UN Secretary-General Ban Ki-moon asked an Independent Expert Advisory Group to make concrete recommendations on bringing about a data revolution in sustainable development. The report indicates that too often existing data remain unused because they are released too late or not at all, are not well documented and harmonized, or are not available at the level of detail needed for decision-making. More diverse, integrated, timely and trustworthy information can lead to better decision-making and real-time citizen feedback.
All this while citizen scientists are building up huge meteorological networks in Japan and North America. The citizen scientists are happy to share their data, and the weather services should fear that their closed datasets will soon become a laughing stock.

Free our climate data

My apologies if this post sounds angry. I am angry. If that is reason to fire me as chair of the Task Team on Homogenization of the WMO Commission for Climatology, so be it. I cannot keep my mouth shut while this is happening.

Even if this resolution is a step forward, and I am grateful to the people who made it happen, it is unacceptable that in these times the weather services of the world do not do everything they can to protect the communities they work for and do not freely share all climate data internationally. I really cannot understand how the limited revenues from selling data can seriously be seen as a reason to accept huge societal losses from climate change impacts and maladaptation.

Don't ask me how to solve this deadlock, but WMO Congress, it is your job to solve it. You have until Friday next week, the 12th of June.

[UPDATE. It might not be visible here because there are few comments, but this post is read a lot for one without a connection to the mass media. That often happens with science posts that do not say anything controversial. (All the scientists I know see the restrictions on data exchange as holding climate science back.) Also the tweet to this post is popular; I never had one like this before. Please retweet it to show your support for the free exchange of climate data.]

[UPDATE. Wow, the above tweet has now been seen over 7,000 times (4 June, 19h). Not bad for a small station data blog. Never seen anything like this. Also, Sylvie Coyaud, who blogs at La Repubblica, now reports about freeing climate data (in Italian). If there are journalists in Geneva, do ask the delegates about sharing data, especially when they present the Global Framework for Climate Services as the prime outcome of this WMO Congress.]

Related reading

Nature published a column by Martin Bobrow of the Expert Advisory Group on Data Access, which just wrote a report on the governance of scientific data access: Funders must encourage scientists to share.

Why raw temperatures show too little global warming

Just the facts, homogenization adjustments reduce global warming

New article: Benchmarking homogenisation algorithms for monthly data

Statistical homogenisation for dummies


A framework for benchmarking of homogenisation algorithm performance on the global scale - Paper now published

Wednesday, August 27, 2014

A database with parallel climate measurements

By Renate Auchmann and Victor Venema


A parallel measurement with a Wild screen and a Stevenson screen in Basel, Switzerland. Double-Louvre Stevenson screens protect the thermometer well against influences of solar and heat radiation. The half-open Wild screens provide more ventilation, but were found to be affected too much by radiation errors. In Switzerland they were substituted by Stevenson screens in the 1960s.

We are building a database with parallel measurements to study non-climatic changes in the climate record. In a parallel measurement, two or more measurement set-ups are compared to each other at one location. Such data is analyzed to see how much a change from one set-up to another affects the climate record.

This post will first give a short overview of the problem, some first achievements and will then describe our proposal for a database structure. This post's main aim is to get some feedback on this structure.

Parallel measurements

Quite a lot of parallel measurements are performed; see this list for a first selection of the datasets we found. However, they have often only been analyzed for a change in the mean. This is a pity, because parallel measurements are especially important for studies on non-climatic changes in weather extremes and weather variability.

Studies on parallel measurements typically analyze single pairs of measurements; in the best cases a regional network is studied. However, the instruments used often differ somewhat between networks, and the influence of a certain change depends on the local weather and climate. Thus, to draw solid conclusions about the influence of a specific change on large-scale (global) trends, we need large datasets with parallel measurements from many locations.

Studies on changes in the mean can be compared with each other relatively easily to get a big picture. But changes in the distribution can be analyzed in many different ways. To be able to compare changes found at different locations, the analysis needs to be performed in the same way. Gathering the parallel data in a large dataset also facilitates this.

Organization

Quite a number of people stand behind this initiative. The International Surface Temperature Initiative and the European Climate Assessment & Dataset have offered to host a copy of the parallel dataset. This ensures the long term storage of the dataset. The World Meteorological Organization (WMO) has requested its members to help build this databank and provide parallel datasets.

However, we do not have any funding. Last July, at the SAMSI meeting on the homogenization of the ISTI benchmark, people felt we could no longer wait for funding and that it was really time to get going. Furthermore, Renate Auchmann offered to invest some of her time in the dataset; that doubles the manpower. Thus we have decided to simply start and see how far we can get this way.

The first activity was a one-page information leaflet with some background information on the dataset, which we will send to people when requesting data. The second activity is this blog post: a proposal for the structure of the dataset.

Upcoming tasks are the documentation of the directory and file formats, so that everyone can work with it. The data processing from level to level needs to be coded. The largest task is probably the handling of the metadata (data about the data). We will have to complete a specification for the metadata needed. A webform where people can enter this information would be great. (Does anyone have ideas for a good tool for such a webform?) And finally the dataset will have to be filled and analyzed.

Design considerations

Given the limited manpower, we would like to keep it as simple as possible at this stage. Thus data will be stored in text files and the hierarchical database will simply use a directory tree. Later on, a real database may be useful, especially to make it easier to select the parallel measurements one is interested in.

Besides the parallel measurements, related measurements should also be stored. For example, to understand the differences between two temperature measurements, additional measurements (co-variates) of, for example, insolation, wind or cloud cover are important. Metadata also needs to be stored and should be machine-readable as far as possible. Without meta-information on how the parallel measurement was performed, the data is not useful.

We are interested in parallel data from any source, variable and temporal resolution. High resolution (sub-daily) data is very important for understanding the reasons for any differences. There is probably more data, especially historical data, available for coarser resolutions and this data is important for studying non-climatic changes in the means.

However, we will scientifically focus on changes in the distribution of daily temperature and precipitation data in the climate record. Thus, we will compute daily averages from sub-daily data and use these to compute the indices of the Expert Team on Climate Change Detection and Indices (ETCCDI), which are often used in studies on changes in "extreme" weather. When actively searching for data, we will prioritize instruments that were widely used for climate measurements, as well as early historical measurements, which are rarer and are expected to show larger changes.
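As a sketch of what this processing will look like, here is a toy R example that reduces synthetic hourly temperatures to daily values (level 3) and computes one ETCCDI index, the annual number of frost days (level 4):

```r
# Collapse sub-daily values to daily means and minima, then compute the
# ETCCDI index FD: the yearly count of days with a minimum below 0 degC.
set.seed(3)
stamps <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"),
              as.POSIXct("2001-12-31 23:00", tz = "UTC"), by = "hour")
doy  <- as.numeric(format(stamps, "%j"))
hour <- as.numeric(format(stamps, "%H"))
temp <- 2 - 12 * cos(2 * pi * doy / 365) -   # seasonal cycle
        5 * cos(2 * pi * hour / 24) +        # diurnal cycle
        rnorm(length(stamps), sd = 2)

day <- as.Date(stamps)
daily_mean <- tapply(temp, day, mean)        # level 3: daily data
daily_min  <- tapply(temp, day, min)

year <- format(as.Date(names(daily_min)), "%Y")
tapply(daily_min < 0, year, sum)             # level 4: the index "FD"
```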

Following the principles of the ISTI, we aim to be an open dataset with good provenance, that is, it should be possible to tell where the data comes from. For this reason, the dataset will have levels with increasing degrees of processing, so that one can go back to a more primitive level if one finds something interesting or suspicious.

For this same reason, the processing software will also be made available and we will try to use open software (especially the free programming language R, which is widely used in statistical climatology) as much as possible.

It will be an open dataset in the end, but as an incentive to contribute, initially only contributors will be able to access the data. After joint publications, the dataset will be opened for academic research as a common resource for the climate sciences. In any case, people using data from a small number of sources are requested to cite them explicitly, so that contributing to the dataset also makes the value of making parallel measurements visible.

Database structure

The basic structure has 5 levels.

0: Original, raw data (e.g. images)
1: Native format data (as received)
2: Data in a standard format at original resolution
3: Daily data
4: ETCCDI indices

In levels 2, 3 & 4 we will provide information on outliers and inhomogeneities.

Especially for the study of extremes, the removal of outliers is important. Suggestions for good software that would work for all climate regions are welcome.
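One common approach, sketched in R below; a real QC suite does much more than this (spatial checks, repeated values, spikes):

```r
# Flag values more than k standard deviations from their
# calendar-month climatology.
flag_outliers <- function(dates, values, k = 5) {
  month <- format(dates, "%m")
  clim_mean <- ave(values, month, FUN = mean)
  clim_sd   <- ave(values, month, FUN = sd)
  abs(values - clim_mean) > k * clim_sd
}

dates <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), by = "day")
values <- rnorm(length(dates)); values[100] <- 25  # implant a gross error
which(flag_outliers(dates, values))                # flags observation 100
```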

Longer parallel measurements may, furthermore, also contain inhomogeneities. We will not homogenize the data, because we want to study the raw data, but we will detect breaks and provide their date and size as metadata, so that the user can work on homogeneous subperiods if interested. This detection will probably be performed at monthly or annual scales with one of the HOME recommended methods.

Because parallel measurements will tend to be well correlated, it is possible that statistically significant inhomogeneities are very small and climatologically irrelevant. Thus we will also provide information on the size of the inhomogeneity so that the user can decide whether such a break is problematic for this specific application or whether having longer time series is more important.
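To make the idea concrete, here is a toy R sketch of single-break detection on an annual difference series. The HOME-recommended methods are considerably more sophisticated; this only shows the principle of reporting a break's date and size as metadata:

```r
# Scan for the split that maximizes the normalized shift between the
# two segments, and report the break's position and size.
find_break <- function(x) {
  n <- length(x)
  stat <- sapply(1:(n - 1), function(k) {
    abs(mean(x[1:k]) - mean(x[(k + 1):n])) / sqrt(1 / k + 1 / (n - k))
  })
  k <- which.max(stat)
  list(last_year_before_break = k,
       size = mean(x[(k + 1):n]) - mean(x[1:k]))
}

set.seed(9)
x <- rnorm(60, sd = 0.1)
x[31:60] <- x[31:60] + 0.3   # insert a break after year 30
find_break(x)                # finds position ~30 and size ~0.3
```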

Level 0 - images

If possible, we will also store the images of the raw data records. This enables the user to see if an outlier may be caused by unclear handwriting or whether the observer explicitly wrote that the weather was severe that day.

In case the normal measurements are already digitized, only the parallel one needs to be transcribed. In this case the number of values will be limited and we may be able to do so. Both Bern and Bonn have facilities to digitize climate data.

Level 1 – native format

Even if it will be more work for us, we would like to receive the data in its native format and will convert it ourselves to a common standard format. This will allow the users to see if mistakes were made in the conversion and allows for their correction.

Level 2 – standard format

In the beginning, our standard format will be an ASCII format. Later on we may also use a scientific data format such as NetCDF. The format will be similar to the one of the COST Action HOME. Some changes to the filenames will be needed to account for multiple measurements of the same variable at one station and for multiple indices computed from the same variable.

Level 3 - daily data

We expect that an important use of the dataset will be the study of non-climatic changes in daily data. At this level we will thus gather the daily datasets and convert the sub-daily datasets to daily.

Level 4 – ETCCDI indices

Many people use the indices of the ETCCDI to study changes in extreme weather. Thus we will precompute these indices. Also, in cases where government policies do not allow giving out the daily data, it may still be possible to obtain the indices. The same strategy is used by the ETCCDI in regions where data availability is scarce and/or data accessibility is difficult.

Directory structure

In the main directory there are the sub-directories: data, documentation, software and articles.

In the sub-directory data there are sub-directories for the data sources with names d###, with d for data source and ### a running number of arbitrary length.

In these directories there are up to 5 sub-directories with the levels and one directory with “additional” metadata such as photos and maps that cannot be copied in every level.

In the level 0 and level 1 directories, the climate data, the flag files and the machine-readable metadata are stored directly.

Because one data source can contain more than one station, in the levels 2 and higher there are sub-directories for the various stations. These sub-directories will be called s###; with s for station.

Once we have more data and until we have a real database, we may also provide a directory structure first ordered by the 5 levels.

The filenames will contain information on the station and variable. In the root directory we will provide machine-readable tables detailing which variables can be found in which directories, so that people interested in a certain variable know which directories to read.
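A small R sketch of how such paths could be built and created. The three-digit zero-padding is our assumption for the illustration; as described above, the running numbers may have arbitrary length:

```r
# Build a path of the form data/d<source>/level<l>/s<station>
station_dir <- function(source, level, station) {
  file.path("data", sprintf("d%03d", source),
            sprintf("level%d", level), sprintf("s%03d", station))
}

station_dir(7, 2, 12)   # "data/d007/level2/s012"
dir.create(station_dir(7, 2, 12), recursive = TRUE, showWarnings = FALSE)
```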

For the metadata we are currently considering using XML, which can be read into R. (Are there similar packages for Matlab and FORTRAN?) Suggestions for other options are welcome.
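As a proof of concept, XML metadata is easy to read into R with the XML package; the station fields below are invented purely for illustration:

```r
# Parse a small XML metadata record into an R list
library(XML)

meta_xml <- '<station>
  <name>Basel</name>
  <screen>Stevenson</screen>
  <parallel_with>Wild</parallel_with>
  <start>1890-01-01</start>
</station>'

meta <- xmlToList(xmlParse(meta_xml, asText = TRUE))
meta$screen   # "Stevenson"
```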

What do you think? Is this a workable structure for such a dataset? Suggestions are welcome in the comments or by mail (Victor Venema & Renate Auchmann).

Related reading

A database with daily climate data for more reliable studies of changes in extreme weather
The previous post provides more background on this project.
CHARMe: Sharing knowledge about climate data
An EU project to improve the meta information and therewith make climate data more easily usable.
List of Parallel climate measurements
Our Wiki page listing a large number of resources with parallel data.
Future research in homogenisation of climate data – EMS 2012 in Poland
A discussion on homogenisation at a Side Meeting at EMS2012
What is a change in extreme weather?
Two possible definitions, one for impact studies, one for understanding.
HUME: Homogenisation, Uncertainty Measures and Extreme weather
Proposal for future research in homogenisation of climate network data.
Homogenization of monthly and annual data from surface stations
A short description of the causes of inhomogeneities in climate data (non-climatic variability) and how to remove it using the relative homogenization approach.
New article: Benchmarking homogenization algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.

Thursday, October 4, 2012

Beta version of a new global temperature database released

Today, a first version of the global temperature dataset of the International Surface Temperature Initiative (ISTI) with 39 thousand stations has been released. The aim of the initiative is to provide an open and transparent temperature dataset for climate research.

The database is designed as a climate "sceptic" wet dream: the entire processing of the data will be performed with automatic open software. This includes every processing step, from conversion to standard units, to merging stations into longer series, to quality control, homogenisation, gridding and the computation of regional and global means. There will thus be no opportunity for evil climate scientists to fudge the data and create an artificially strong temperature trend.
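To give a flavour of the later steps, here is a toy R sketch of gridding and an area-weighted global mean from hypothetical station anomalies (the real pipeline is, of course, far more careful):

```r
# Average stations within 5x5 degree grid boxes, then take an
# area-weighted (cosine of latitude) mean over the boxes.
set.seed(5)
stations <- data.frame(lat = runif(500, -90, 90),
                       lon = runif(500, -180, 180),
                       anomaly = rnorm(500))

stations$box_lat <- floor(stations$lat / 5) * 5 + 2.5
stations$box_lon <- floor(stations$lon / 5) * 5 + 2.5
boxes <- aggregate(anomaly ~ box_lat + box_lon, data = stations, FUN = mean)

weights <- cos(boxes$box_lat * pi / 180)
sum(boxes$anomaly * weights) / sum(weights)   # the "global" mean anomaly
```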

It is planned that in many cases you will be able to go back to the digital images of the books or cards on which the observer noted down the temperature measurements. This will not be possible for all data. Many records have been keyed directly in the past, without making digital images. Sometimes the original data is lost, for instance in the case of Austria, where the original daily observations were lost in the Second World War and only the monthly means are still available from annual reports.

The ISTI also has a group devoted to data rescue, which encourages people to go into the archives, image and key in the observations, and upload this information to the database.


Tuesday, September 18, 2012

Future research in homogenisation of climate data – EMS2012 in Poland

By Enric Aguilar and Victor Venema

The future of research and training in homogenisation of climate data was discussed by 21 experts at the European Meteorological Society meeting in Lodz. Homogenisation of monthly temperature data has improved much in recent years, as seen in the results of the COST-HOME project. On the other hand, the homogenization of daily and sub-daily data is still in its infancy, and this data is used frequently to analyse changes in extreme weather. It is expected that inhomogeneities in the tails of the distribution are stronger than in the means. To make such analyses of extremes more reliable, more work on daily homogenisation is urgently needed. This does not mean that homogenisation at the monthly scale is already optimal; much can still be improved.

Parallel measurements

Parallel measurements with multiple measurement set-ups were seen as an important way to study the nature of inhomogeneities in daily and sub-daily data. It would be good to have a large international database with such measurements. The regional climate centres (RCC) could host such a dataset. Numerous groups are working on this topic, but more collaboration is needed. Also more experiments would be valuable.

When gathering parallel measurements the metadata is very important. INSPIRE (an EU Directive) has a standard format for metadata, which could be used.

It may be difficult to produce an open database with parallel measurements, as European national meteorological and hydrological services are often forced to sell their data for profit. (Ironically, in the Land of the Free (markets), climate data is available freely; the public already paid for it with their tax money, after all.) Political pressure to free climate data is needed. Finland is setting a good example and will free its data in 2013.

Thursday, August 2, 2012

Do you want to help with data discovery?

Reposted from the blog of the International Surface Temperature Initiative

As was alluded to in an earlier posting here, NOAA's National Climatic Data Center has recently embarked on an effort to discover and rescue a plethora of international holdings in hard copy in its basement and make them usable by the international science community. The resulting images of the records from the first chunk of these efforts have just been made available online. Sadly, it is not realistic at the present time to key these data, so they remain stuck in a half-way house: available, tantalizingly so, but not yet truly usable.

So, if you want to undertake some climate sleuthing, now is your moment to shine! The data have all been placed at ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/daily/stage0/FDL/. These consist of images at both daily and monthly resolution; don't be fooled by the "daily" in the ftp site address. If you find a monthly resolution data source, you could digitize years' worth of records in an evening.

Whether you wish to start with Angola ...