Saturday, 25 December 2021

New German Sovereign Tech Fund will fund open source digital infrastructure to avert the next log4j

XKCD cartoon of an intricate tower made of blocks, all resting on a tiny block near the bottom, whose removal would topple the building. The top is called All modern digital infrastructure. The tiny block is marked as A project some random person in Nebraska has been thanklessly maintaining since 2003

The famous XKCD cartoon has resulted in an open source digital infrastructure fund. Thank you Randall.

Late in the afternoon, just before a national holiday, is not the best time to get attention, which is probably the main reason that the press has not (yet) written about what Franziska Brantner (the new Green deputy minister for the economy) wrote on Twitter:

We will tackle the Sovereign Tech Fund! Log4j has shown that sustainably secured and reliable open source solutions are the basis for the innovative strength and digital sovereignty of the German economy. We will therefore promote open source enabling technologies from 2022 onwards.

[[Log4j]] is a widely used, 21-year-old Java library. The security vulnerability found in it is easy to exploit and existed for almost a decade before being noticed. As Free and Open Source Software (FOSS) it is used widely and produces a lot of value, despite there not being much funding for producing FOSS. In this way much of the digital economy depends on the dedication of unpaid hobbyists, as XKCD Explained explains well.

The German Sovereign Tech Fund will step into this gap. We will have to see how the government will implement it, but the name comes from a feasibility study by the Open Knowledge Foundation, which proposed a fund to support "the development, scaling and maintenance of digital and foundational technologies. The goal of the fund could be to sustainably strengthen the open source ecosystem, with a focus on security, resilience, technological diversity, and the people behind the code."

Such a fund had not explicitly made it into the coalition agreement of the new government, to the lament of the FOSS community, although it does fit the spirit of the agreement.

Deputy minister Franziska Brantner carbon copied Patrick Beuth, a journalist who recently wrote about log4j in the magazine Der Spiegel and mentioned the Sovereign Tech Fund as a solution. So log4j seems to have been the clincher.

This announcement adds to a period of hope for digital rights. For most of my life they have become worse: more privacy for the powerful, more vulnerability for us. Things that were protected in the analogue world (talking to each other, sending a letter) have been criminalized and subjected to surveillance. The fast creation of abusive monopolies is the official business model in Silicon Valley. Social media monopolies sprouted that do not care how much damage they do to society and our democracy, while Europe was increasingly becoming a digital colony.

However, lately, with the EU privacy law, the rise of the Fediverse, the upcoming EU Digital Services Act and a good coalition agreement in Germany, it is starting to look like it is actually possible for digital rights to improve.

This proposal is for a fund of 10 million euro per year, which is a good start, especially if similar EU proposals also manage to get funded. There is also project funding for new software tools: the Prototype Fund in Germany or the Next Generation Internet (NGI) and NGI-zero initiative in Europe.

What I feel is still missing are stable public institutions where coders can jointly work on large tasks, such as maintaining Firefox or extending what is possible in the Fediverse. If we compare the situation in software to science, we now have the equivalent of project funding by the National Science Foundation and similar agencies, but there are no equivalents yet of the National Institutes of Health, research institutes or universities.

More generally, we need a real solution to invest in goods and services with enormous societal and economic value that do not have much market value (research and development, security, (preventative) healthcare, weather services, justice, software, (digital) infrastructure, governance, media, ...). We are no longer in the 19th century. These kinds of cases are an increasingly large part of the future economy.

Related reading

Patrick Beuth (Der Spiegel): Wie löscht man ein brennendes Internet?

XKCD Explained on the XKCD on software dependencies.

The digitization section of the coalition agreement in English.

Monday the 27th of December there is a session on the Sovereign Tech Fund at the remote Chaos Computer Congress.

Digital Services Act: Greens/EFA successes

Micro-blogging for scientists without nasties and surveillance

Thursday, 6 May 2021

We launched a new group to promote the translation of the scientific literature

Tell your story, tell your journey, they say. Climate Outreach advised: tell about how you came to accept climate change is a problem. Maybe I am too young, but, still not being 50, I already accepted as a kid that climate change was a risk we should care about.

Also otherwise, I do not remember suddenly changing my mind often, so that I could talk about my journey. Where the word "remember" may do a lot of the work. Is it useful not to remember such things to make it easier on you to change your mind? Or do many people work with really narrow uncertainty intervals even when they do not have a clue yet?

But when it comes to translations of scientific articles, I changed a lot. When I was doing cloud research I used to think that knowing English was just one of the skills a scientist needs. Just like logic, statistics, coding, knowing the literature, public speaking, and so on.

Working on historical climate data changed this. I regularly have to communicate with people from weather services from all over the world and many do not speak English (well), while they do work that is crucial for science. Given how hard we make it for them to participate they do an amazing job; I guess the World Meteorological Organization translating all their reports in many languages helps.

The most "journey" moment was at the Data Management Workshop in Peru, where I was the only one not speaking Spanish. A colleague told me that she translated important scientific articles into Spanish and sent them by email to her colleagues. Just like Albert Einstein translated scientific articles into English for those who did not master the language of science at the time.

This got me thinking about a database where such translations could be made available: one where, when you search for an article, you can see which translations are available, or where you can search for translated articles on a specific topic. Such a resource would make producing translations more worthwhile and would thus hopefully stimulate their production.

After gathering literature and bookmarks on this topic and noticing who else was interested in it, I invited a group of people to see if we could collaborate. After a series of pandemic video calls, we decided to launch as a group, somewhat unimaginatively called: "Translate Science". Please find below the part of our launch blog post about why translations are important.

(To be fair to me, and I like being fair to me, for a fundamental science needing expensive instruments, such as cloud studies, it makes more sense to simply do it in English. For sciences that directly impact people, such as climate, health and agriculture, two-way communication within science, with the orbit around science and with society is much more important.

But even in the cloud sciences I should probably have paid more attention to studies in other languages. One of our group members works on turbulence and droplets and found many worthwhile papers in Russian. I had never considered that and might have found some turbulent gems there as well.)

The importance of translated articles

English as a common language has made global communication within science easier. However, this has made communication with non-English communities harder. For English-speakers it is easy to overestimate how many people speak English because we mostly deal with foreigners who do speak English. It is thought that about one billion people speak English. That means that seven billion people do not. For example, at many weather services in the Global South only a few people master English, but they use the translated guidance reports of the World Meteorological Organization (WMO) a lot. For the WMO, as a membership organization of the weather services, where every weather service has one vote, translating all its guidance reports into many languages is a priority.

Non-English or multilingual speakers, in Africa and on every other continent, could participate in science on an equal footing by having a reliable system where scientific work written in non-English languages is accepted and translated into English (or any other language) and vice versa. Language barriers should not waste scientific talent.

Translated scientific articles open science to regular people, science enthusiasts, activists, advisors, trainers, consultants, architects, doctors, journalists, planners, administrators, technicians and scientists. Such a lower barrier to participating in science is especially important on topics such as climate change, environment, agriculture and health. The easier knowledge transfer goes both ways: people benefiting from scientific knowledge and people having knowledge scientists should know. Translations thus help both science and society. They aid innovation and tackling the big global challenges in the fields of climate change, agriculture and health.

Translated scientific articles speed up scientific progress by tapping into more knowledge and avoiding double work. They thus improve the quality and efficiency of science. Translations can improve public disclosure, scientific engagement and science literacy. The production of translated scientific articles also creates a training dataset to improve automatic translations, which for most languages is still lacking.

The full post at the Translate Science blog explains more about who we are, what we would like to do to promote translations and how you can join.

Thursday, 22 April 2021

The confusing politics behind John Stossel asking Are We Doomed?

As a member of Climate Feedback I just reviewed a YouTube video by John Stossel. In that review I could only respond to factual claims, which were the boring age-old denier evergreens. Thus, not surprisingly, the video got a solid "very low" scientific credibility. But it got over 25 million views, so I guess responding was worth it.

The politics of the video were much more "interesting". As in: "May you live in interesting times". Other options would have been: crazy, confusing, weird.

That starts with the title of the video: "Are We Doomed?". Is John Stossel suggesting that damages are irrelevant if they are not world ending? I would be surprised if that were his general threshold for action. "Shall we build a road?" Well, "Are We Doomed?" "Should we fund the police?" Well, "Are We Doomed?" "Shall I eat an American taco?" Well, "Are We Doomed?"

Are we not to invest in a more prosperous future unless we are otherwise doomed? That does not seem to be the normal criterion for rational investments any sane person or corporation would use.

Then there is his stuff about sea level rise:

"Are you telling me that people in Miami are so dumb that they are just going to sit there and drown?”

That reminds me of a similar dumb statement by public intellectual Ben Shapiro (I hope people hear the sarcasm, in the US you can never be sure) and the wonderful response to it by H Bomber Guy:

Bomber also concludes that this, this, ... whatever it is, has nothing to do with science:

"How have things reached a point, where someone thinks they can get away with saying something this ridiculous in front of an audience of people? And how have things reached the point where some people in that audience won't recognize it for the obvious ignorant bullshit that it is?
This led me down a particular hole of discovery. I realized that climate deniers aren't just wrong, they're obviously wrong. In very clear ways, and that makes the whole thing so much more interesting. How does this work if it's so paper thin?"

Politically interesting is that Stossel wants Floridians to get lost and Dutch people to pay an enormous price, in this video, while the next Stossel video Facebook suggests has the tagline: "Get off my property". And Wikipedia claims that Stossel is a "Libertarian pundit".

So do we have to accept any damages Stossel wants us to suffer? Do we have to leave our house behind? Does Stossel get to destroy our community and our family networks? Is Stossel selling authoritarianism where he gets to decide who suffers? Or is Stossel selling markets with free voluntary transactions and property rights?

In America, lacking a diversity of parties, both ideologies are within the same (Republican) party, but these are two fundamentally different ideas. Either you are a Conservative and believe in property rights or you are an Authoritarian and think you can destroy other people's property when you have the power.

You can reconcile these two ideas with the third ideological current in the Republican party: childish Libertarianism, where you get to pretend that the actions of person X never affect person Y. An ideology for teenagers and a lived reality for the donor class that funds US politics and media, who never suffer consequences for their terrible behavior.

But in this video Stossel rejects this childish idea and accepts that Florida suffers damages:

"Are you telling me that people in Miami are so dumb that they are just going to sit there and drown?”

So, John Stossel, do you believe in property rights or don't you?

Friday, 16 April 2021

Antigen rapid tests much less effective for screening than previously thought according to top German virologist Drosten

Hidden in a long German language podcast on the pandemic Prof. Dr. Christian Drosten talked about an observation that has serious policy implications.

At the moment this is not yet based on any peer-reviewed studies, but mostly on his observations and those of his colleagues running large diagnostic labs. So it is important to note that he is a top diagnostic virologist from Germany who specialized in emerging viruses and coronaviruses and made the first SARS-CoV-2 PCR test.

In the Anglo-American news Drosten is often introduced as the German Fauci. This fits in that he is one of the most trusted national sources of information. But Drosten has much more expertise: both coronaviruses and diagnostic testing are his beat.

Tim Lohn wrote an article about this in Bloomberg: "Rapid Covid Tests Are Missing Early Infections, Virologist Says." And found two experts making similar claims.

Let me give a longer and more technical explanation than Tim Lohn of what Prof. Dr. Christian Drosten claims. Especially because there is no peer reviewed study yet, I feel the explanation is important.

If you have COVID symptoms (day 0), sleep on it and test the next day, the antigen tests are very reliable. But on day zero itself, and especially on the one or two days before, when you are already infectious, they are not as reliable. So they are good for (self-)diagnosis, but less good for screening, for catching those first days of infectiousness. The PCR tests are sensitive enough for those pre-symptomatic cases, if only people would test with PCR that early and would immediately get the result.

Figure from Jitka Polechová et al.

In those pre-symptomatic days there is already a high viral load, but this is mostly active virus. The antigen test detects the presence of the capsid of the virus, the protective shell of the virus. The PCR test detects virus RNA. When infecting a cell, the capsid proteins are produced first, before the RNA is produced. So in that respect one might expect the rapid tests to be able to find virus a few hours earlier.

But here we are talking about a few days. The antigen test can best detect capsids in a sample when epithelial cells die and mix with the mucus, which takes a few days. So the difference between the days before and after symptoms is the amount of dead virus material, which the rapid tests can detect to get reliable results. That is the reason why in the time after symptom onset the antigen tests predict infectiousness well. But in those early days possibly not.

This was not detected before because the samples used to study how well the tests work were mostly from symptomatic people; it is hard to get positive samples from people who are infectious before they are symptomatic. Because pre-symptomatic cases with both a PCR and an antigen test are rare, Drosten's observations are also based on just a few cases. He strongly encouraged systematic studies to be made and published, but this will take a few months.

In the Bloomberg article Tim Lohn quotes Rebecca Smith who found something similar:

In a paper published in March -- not yet peer reviewed -- researchers led by Rebecca L. Smith at the University of Illinois at Urbana-Champaign found that, among other things, PCR tests were indeed better at detecting infections early on than a Quidel rapid antigen test. But the difference narrowed after a few days, along with when the different tests were repeatedly used on people.

The article also quotes Jitka Polechová of the University of Vienna, who wrote a review comparing PCR tests to antigen tests:

“Given that PCR tests results are usually not returned within a day, both testing methods are similarly effective in preventing spread if used correctly and frequently.”

This is a valid argument for comparing the tests when they are used for diagnostics or as additional precautions for dangerous activities that have to take place.

However, at least in Germany, rapid tests are also used as part of opening up the economy. Here people can, for example, go into the theatre or a restaurant after having been tested. This is something one would not use a PCR for, because it would not be fast enough. These people at theatres and restaurants may think they are nearly 100% safe, but actually 3 of the on average 8 infectious days would not be detected. If, in addition, people behave more dangerously, thinking they are safe, opening a restaurant this way may not be much less dangerous than opening a restaurant without any testing.
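
The arithmetic behind "3 of the on average 8 infectious days" can be written out as a small sketch. The function name is made up for illustration, and the calculation assumes that every infectious day counts equally, which the podcast did not claim; it only shows what share of the infectious period would slip through.

```python
def undetected_fraction(missed_days: float, infectious_days: float) -> float:
    """Share of the infectious period in which a rapid test would not flag the person.

    Simplifying assumption (not from the source): all infectious days count equally.
    """
    if infectious_days <= 0:
        raise ValueError("infectious_days must be positive")
    return missed_days / infectious_days


# Numbers from the text: 3 undetected days out of 8 infectious days on average.
share = undetected_fraction(3, 8)
print(f"{share:.1%} of infectious days would go undetected")  # prints 37.5% ...
```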

So we have to rethink this way of opening up activities inside and rather try to meet people outside.

Related reading

Original source: Das Coronavirus-Update von NDR Info, edition 84: "(84) Nicht auf Tests und Impfungen verlassen". Time stamp: "00:48:09 Diagnostik-Lücke bei Schnelltests"

Northern German public media (NDR) article: 'Drosten: "Schnelltests sind wohl weniger zuverlässig als gedacht."' Translated: Drosten: "Rapid tests are probably less reliable than expected"

Tim Lohn in Bloomberg: "Rapid Covid Tests Are Missing Early Infections, Virologist Says."

Jitka Polechová, Kory D. Johnson, Pavel Payne, Alex Crozier, Mathias Beiglböck, Pavel Plevka, Eva Schernhammer. Rapid antigen tests: their sensitivity, benefits for epidemic control, and use in Austrian schools. Not reviewed preprint.

Friday, 22 January 2021

New paper: Spanish and German climatologists on how to remove errors from observed climate trends

This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouri (French) screen, in use in Spain and many European countries in the late 19th century and early 20th century. Leftmost, Stevenson screen equipped with conventional meteorological instruments, a set-up used globally for most of the 20th century. In the middle, Stevenson screen equipped with automatic sensors. The Montsouri screen is better ventilated, but because some solar radiation can get onto the thermometer it registers somewhat higher temperatures than a Stevenson screen. Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.

The instrumental climate record is human cultural heritage, the product of the diligent work of many generations of people all over the world. But changes in the way temperature was measured and in the surrounding of weather stations can produce spurious trends. An international team, with participation of the University Rovira i Virgili (Spain), State Meteorological Agency (AEMET, Spain) and University of Bonn (Germany), has made a great endeavour to provide reliable tests for the methods used to computationally eliminate such spurious trends. These so-called “homogenization methods” are a key step to turn the enormous effort of the observers into accurate climate change data products. The results have been published in the prestigious Journal of Climate of the American Meteorological Society. The research was funded by the Spanish Ministry of Economy and Competitiveness.

Climate observations often go back more than a century, to times before we had electricity or cars. Such long time spans make it virtually impossible to keep the measurement conditions the same across time. The best-known problem is the growth of cities around urban weather stations. Cities tend to be warmer, for example due to reduced evaporation by plants or because high buildings block cooling. This can be seen comparing urban stations with surrounding rural stations. It is less talked about, but there are similar problems due to the spread of irrigation.

The most common reason for jumps in the observed data are relocations of weather stations. Volunteer observers tend to make observations near their homes; when they retire and a new volunteer takes over the tasks, this can produce temperature jumps. Even for professional observations, keeping the locations the same over centuries can be a challenge, either due to urban growth effects making sites unsuitable or organizational changes leading to new premises. Climatologist Dr. Victor Venema from Bonn, one of the authors: “A quite typical organizational change is that weather offices that used to be in cities were transferred to newly built airports needing observations and predictions. The weather station in Bonn used to be on a field in the village of Poppelsdorf, which is now a quarter of Bonn, and after several relocations the station is currently at the airport Cologne-Bonn.”

For global trends, the most important changes are technological changes of the same kinds and with similar effects all over the world. Now we are, for instance, in a period with widespread automation of the observational networks.

Appropriate computer programs for the automatic homogenization of climatic time series are the result of several years of development work. They work by comparing nearby stations with each other and looking for changes that only happen in one of them, as opposed to climatic changes that influence all stations.
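
To make the idea of comparing nearby stations concrete, here is a minimal sketch with one candidate station and one neighbour. It is not the algorithm of any of the tested homogenization packages: the function `detect_break`, the synthetic series and the single-break assumption are all illustrative. Because both stations see the same regional climate signal, that signal cancels in the difference series, and a jump that happens only at the candidate station stands out.

```python
import math


def detect_break(candidate, neighbour, min_segment=5):
    """Find the single most likely break in the candidate-minus-neighbour series.

    Returns (index, jump): the first value after the break and the change in the
    mean difference across it. Illustrative only; real methods handle multiple
    breaks, many neighbours and statistical significance.
    """
    diff = [c - n for c, n in zip(candidate, neighbour)]
    best_index, best_jump = None, 0.0
    for k in range(min_segment, len(diff) - min_segment + 1):
        mean_before = sum(diff[:k]) / k
        mean_after = sum(diff[k:]) / (len(diff) - k)
        jump = mean_after - mean_before
        if abs(jump) > abs(best_jump):
            best_index, best_jump = k, jump
    return best_index, best_jump


# Synthetic data (assumed values): a shared regional signal plus small local
# wiggles, and a spurious 1.0 degree jump at the candidate from year 35 on,
# for example after a relocation.
years = range(60)
climate = [0.02 * y + 0.5 * math.sin(y) for y in years]
neighbour = [c + 0.05 * math.cos(3 * y) for y, c in zip(years, climate)]
candidate = [
    c + 0.05 * math.sin(5 * y) + (1.0 if y >= 35 else 0.0)
    for y, c in zip(years, climate)
]

index, jump = detect_break(candidate, neighbour)
print(index, round(jump, 2))
```

Once the break position and size are estimated, the earlier part of the candidate series can be shifted by the jump so that it matches the current measurement conditions.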

To scrutinize these homogenization methods the research team created a dataset that closely mimics observed climate datasets including the mentioned spurious changes. In this way, the spurious changes are known and one can study how well they are removed by homogenization. Compared to previous studies, the testing datasets showed much more diversity; real station networks also show a lot of diversity due to differences in their management. The researchers especially took care to produce networks with widely varying station densities; in a dense network it is easier to see a small spurious change in a station. The test dataset was larger than ever containing 1900 station networks, which allowed the scientists to accurately determine the differences between the top automatic homogenization methods that have been developed by research groups from Europe and the Americas. Because of the large size of the testing dataset, only automatic homogenization methods could be tested.

The international author group found that it is much more difficult to improve the network-mean average climate signals than to improve the accuracy of station time series.

The Spanish homogenization methods excelled. The method developed at the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain, by Hungarian climatologist Dr. Peter Domonkos was found to be the best at homogenizing both individual station series and regional network mean series. The method of the State Meteorological Agency (AEMET), Unit of Islas Baleares, Palma, Spain, developed by Dr. José A. Guijarro was a close second.

When it comes to removing systematic trend errors from many networks, and especially from networks where similar spurious changes happen in many stations at similar dates, the homogenization method of the American National Oceanic and Atmospheric Administration (NOAA) performed best. This is a method that was designed to homogenize station datasets at the global scale, where the main concern is the reliable estimation of global trends.

The Open Screen formerly used at station Uccle in Belgium, with two modern closed Stevenson screens with double-louvred walls in the background.

Quotes from participating researchers

Dr. Peter Domonkos, who earlier was a weather observer and is now writing a book about time series homogenization: “This study has shown the value of large testing datasets and demonstrates another reason why automatic homogenization methods are important: they can be tested much better, which aids their development.”

Prof. Dr. Manola Brunet, who is the director of the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain, Visiting Fellow at the Climatic Research Unit, University of East Anglia, Norwich, UK and Vice-President of the World Meteorological Services Technical Commission said: “The study showed how important dense station networks are to make homogenization methods powerful and thus to compute accurate observed trends. Unfortunately, still a lot of climate data needs to be digitized to contribute to an even better homogenization and quality control.”

Dr. Javier Sigró from the Centre for Climate Change, Univ. Rovira i Virgili, Vila-seca, Spain: “Homogenization is often a first step that allows us to go into the archives and find out what happened to the observations that produced the spurious jumps. Better homogenization methods mean that we can do this in a much more targeted way.”

Dr. José A. Guijarro: “Not only may the results of the project help users to choose the method most suited to their needs; it also helped developers to improve their software by showing its strengths and weaknesses, and will allow further improvements in the future.”

Dr. Victor Venema: “In a previous similar study we found that homogenization methods that were designed to handle difficult cases where a station has multiple spurious jumps were clearly better. Interestingly, this study did not find this. It may be that it is more a matter of methods being carefully fine-tuned and tested.”

Dr. Peter Domonkos: “The accuracy of homogenization methods will likely improve further; however, we should never forget that spatially dense and high-quality climate observations are the most important pillar of our knowledge about climate change and climate variability.”

Press releases

Spanish weather service, AEMET: Un equipo internacional de climatólogos estudia cómo minimizar errores en las tendencias climáticas observadas

URV university in Tarragona, Catalan: Un equip internacional de climatòlegs estudia com es poden minimitzar errades en les tendències climàtiques observades

URV university, Spanish: Un equipo internacional de climatólogos estudia cómo se pueden minimizar errores en las tendencias climáticas observadas

URV university, English: An international team of climatologists is studying how to minimise errors in observed climate trends


Tarragona 21: Climatòlegs de la URV estudien com es poden minimitzar errades en les tendències climàtiques observades

Genius Science, French: Une équipe de climatologues étudie comment minimiser les erreurs dans la tendance climatique observée (A team of climatologists is studying how to minimize errors in the observed climate trend)


Friday, 20 November 2020

Yes, it makes sense not to have dinner parties while the schools are still open. Think of it as a Corona contact budget.


Can the kids go to school in restaurants

Jessica Winter, editor New Yorker


Analogies can be enlightening. Bad faith actors will always find something to nitpick, but for those interested in understanding, analogies can help to open a toolbox of existing ideas and argumentative structures.

I wondered whether it may be useful to talk about Corona contacts as a budget.

It would avoid arguments like "if churches can be open, why can't we have concerts under similar conditions". "If you cannot meet indoors with more than 15 people, then why are schools open? Math!" 

One would never argue "if we just bought this flat, why can't we buy a summer house?" Maybe you have the budget to buy a summer house, but buying a flat does not mean you can also afford the summer house.

Similarly in the political realm: "if we can have social security, why can't we have a basic income (social security for all)?" For me a basic income is freedom, fulfilment of human potential and prosperity, but you will have to find the money. "If we can spend 10% of our GDP on healthcare as an average OECD country, why can't we spend 20%?" You can, and America does, but it will still be hard to find the funding for the additional 10% if countries with universal health care wanted to destroy their system and adopt the American partial system.

When it comes to budgets it is immediately clear that you have to set priorities and invest wisely.

The reproduction number of the SARS-CoV-2 virus is between two and three. Let's assume for this article that it is two to get easier numbers. This means that one infected person on average infects two other people. If we reduce the number of infectious contacts by more than half, the virus will decline.
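
As a minimal sketch of this budget, assume that infections scale in proportion to the number of infectious contacts (a simplification, and the function names are made up for this example). With a reproduction number of two, cutting contacts by exactly half holds the number of cases constant; anything more makes each generation of infections smaller than the one before.

```python
def effective_r(r0: float, contact_reduction: float) -> float:
    """Reproduction number after cutting infectious contacts by a given fraction.

    Assumes infections are proportional to contacts (a simplification).
    """
    return r0 * (1.0 - contact_reduction)


def cases_after(generations: int, r: float, initial: float = 1.0) -> float:
    """Number of new cases in a generation, starting from `initial` cases."""
    return initial * r ** generations


for reduction in (0.0, 0.5, 0.6):
    r = effective_r(2.0, reduction)
    cases = cases_after(5, r, initial=1000)
    print(f"contacts -{reduction:.0%}: R = {r:.1f}, "
          f"cases after 5 generations: {cases:.0f}")
```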

The "on average" does a lot of work. How many people one person infects varies widely. As a rule of thumb for SARS2: Four of five infected people only infect one other person or none, while one in five infects many people. It is only two on average.
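
The rule of thumb pins down what "many" has to mean. A small worked calculation, using only the numbers above (the function name is made up for this example): five infected people must together infect ten others to reach an average of two, so if four of them infect at most one person each, the fifth has to account for the rest.

```python
def infections_by_fifth_person(mean_per_person=2, group_size=5,
                               low_spreaders=4, low_mean=1):
    """Average number of infections the remaining high spreaders must cause.

    Total infections = mean_per_person * group_size; subtract what the
    low spreaders cause and divide over the remaining people.
    """
    total = mean_per_person * group_size
    high_spreaders = group_size - low_spreaders
    return (total - low_spreaders * low_mean) / high_spreaders


# If the four low spreaders each infect one person, the fifth infects 6.
print(infections_by_fifth_person(low_mean=1))  # 6.0
# If the four low spreaders infect nobody, the fifth infects 10.
print(infections_by_fifth_person(low_mean=0))  # 10.0
```

So the one in five infects between six and ten people on average, which is why avoiding the settings where such spreading happens saves a large part of the budget.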

And you have to average over a population that is in contact with each other. When in France no one has any contacts, while in Germany life continues as normal, the virus will spread like wildfire in Germany. But if inside the city of Bonn half of the people disappear, the remaining people have fewer contacts than before. For the analogy to hold, the remaining half should not especially seek each other out.

How does this analogy help? If we look at the budget of a country like Germany, it makes clear that we should look for reductions where we spend a lot. Work, school, free time. I am as annoyed by the anti-Corona protests as many complaining about them, but compared to 80 million inhabitants that see each other at work and school (indoors) every day, these protests, even if they were really big, are a completely insignificant number of contacts. And the right to protest is a foundation of our societies and should thus have a high priority. I think it is fine to mandate masks at protests and if you do so you should uphold the rule of law.

Less than 20% of Germany is younger than 20. So we could afford to spend our contacts there and ask the other 80% to do more. People often argue that children not going to school is disruptive for the economy. I would also argue that a pandemic lasting one year is a large part of their lives, while additionally young people mostly do this to protect others. There is naturally no need to squander our budget: we could require older kids to wear masks to reduce the effective number of contacts, install air filters or far UV-C lights in classrooms, or reduce the number of days children go to school.

Some feel we should close the schools to protect teachers, but the main reason to care about avoiding contacts is, even now, not about the people being infected today, but about spreading the virus and all the people who will die because of that.

If we live above our contact budget, most of the dying happens after several links in the chain of infection and no longer close to the school: The teacher or student infects 2 others, they infect 4 others, 8, 16, 32, 64, 128, ... Those 128 will reside all over the city/county, if not state, and have many different professions. If we lived within our Corona budget and the level of infection were and stayed low, the entire community, including teachers, would be safe.
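The chain of infection can be written out as a small sketch. It uses the simplified reproduction number of two from this article; the 60% contact reduction is just an example value, and the function name is hypothetical.

```python
def cases_per_generation(reproduction_number, generations, initial_cases=1):
    """Expected number of new cases in each link of the infection chain."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * reproduction_number)
    return cases


# Living above the budget: every case infects two others.
unchecked = cases_per_generation(2.0, 7)

# Example value: contacts cut by 60%, so each case infects 2 * 0.4 = 0.8 others.
reduced = cases_per_generation(2.0 * (1 - 0.6), 7)

print(unchecked)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
```

With more than half of the infectious contacts removed, every link in the chain is smaller than the one before it and the outbreak fades.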

The exponential growth of a virus also nicely fits to the exponential growth of money in your [[savings account]]. I added a link for young people. A savings account used to be a place where you would keep your money and the bank would give you a percentage of the amount as a thank you, which they called "interest". People who are into money and budgets likely still remember this and how it was normal to "invest" money to have more money later.

When the press talks about exponential growth, I tend to worry they simply mean fast growth. Economic growth is much slower than the pandemic, but when it comes to money people get glowing eyes and talk enthusiastically about compound interest and putting something aside for later.

Similarly, when a society invests in fewer contacts, we can have more freedom later. Even more so because once the number of infections is low enough, track and trace becomes much more efficient and you get double returns on investment. Like an investment banker who has to pay less taxes because ... reasons.

At least the financial press should know the famous example of exponential growth: the craftsman who "only" asks the king for rice as payment for his chessboard: one grain on the first square, two on the second, four on the third square and so on. 
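For readers who want to check the chessboard story, the arithmetic is a two-liner. Nothing here is assumed beyond the 64 squares and the doubling rule.

```python
# One grain on the first square, doubling on each of the 64 squares.
grains_on_last_square = 2 ** 63
total_grains = sum(2 ** square for square in range(64))

print(total_grains)  # 18446744073709551615
```

The total is 2^64 - 1, and the last square alone holds one grain more than all the other squares together.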


What is true for infections is also true for hospital beds and ICU beds. Once half of your patients are COVID-19 patients, it is only a matter of one more doubling time and the capacity is filled. Exponential growth is not just fast, it overwhelms linear systems like hospitals where you cannot keep on doubling the number of beds.
If we let it get this far we are forcing doctors to choose who lives. Who is in the ICU too long and would likely stay there a long time while this capacity could be used for multiple new patients. Who is removed from the ICU to die. A healthy society does not put doctors in such a position.
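How little time is left can be sketched as follows. The doubling time of seven days is an assumed example value, not a number from this article, and the function name is made up.

```python
import math


def days_until_full(occupied_fraction, doubling_time_days):
    """Days until capacity is reached if occupancy keeps doubling."""
    if not 0 < occupied_fraction <= 1:
        raise ValueError("occupied_fraction must be in (0, 1]")
    doublings_left = math.log2(1 / occupied_fraction)
    return doubling_time_days * doublings_left


# Assumed example: occupancy doubles every 7 days.
print(days_until_full(0.5, 7))    # 7.0
print(days_until_full(0.125, 7))  # 21.0
```

Going from one eighth full to half full takes twice as long as going from half full to full, which is why a system that still looks comfortable can be overwhelmed shortly after.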

With good care around 1 percent of people die in the West (in young societies in Africa less). Supporters of the virus tend to use this number or even much lower fantasy numbers. However, if we let it get out of control like this, ignore the exponential growth and the delay between infections and deaths, the hospital care would collapse and a few percent would die.

Many more people need to go to the hospital. In Germany this is 17%. A recent French study reported that after 110 days most patients are still tired and have trouble breathing; many did not yet work again.
At the latest when the hospitals collapse people will reduce contacts, even if not mandated. It is much smarter to make an investment earlier, to reduce our number of infectious contacts earlier.
A well-known American president said it is smart to go bankrupt. It is smarter to make money.
Investing early pays off even more because then more subtle measures are still possible, while in an emergency a much more invasive lockdown will be necessary and, for those that only care about money, more damage to the economy will be done.

(As many of my readers are interested in climate change, let me add that I find it weird that when it comes to protecting the climate people often talk about it as a cost and not as an investment that will pay good dividends in the future, just like any other investment. If you mind that our kids will thus have it better than we have it, you can finance the investments with loans, like any business would.)

Monday, 9 November 2020

Science Feedback on Steroids

Climate Feedback is a group of climate scientists reviewing press articles on climate change. By networking this valuable work with science-interested citizens we could put this initiative on steroids.

Disclosure: I am a member of Climate Feedback.

How Climate Feedback works

Climate Feedback works as follows. A science journalist monitors which stories on climate change are widely shared on social media and invites publishing climate scientists with relevant expertise to review the factual claims being made. The scientists make detailed reviews on concrete claims, ideally using web annotations (see example below), sometimes by email.



They also write a short summary of the article and grade its scientific credibility. These comments, summaries and grades are then summarized in a graphic and an article written by the science journalist. 

Climate Feedback takes care of spreading the reviews to the public and to the publication that was reviewed. Climate Feedback is also part of a network of fact checking organizations giving them more credibility and they add metadata to the review pages that social media and search engines can show their users.



For scientists this is a very efficient fact checking operation. The participants only have to respond to the claims they have expertise on. If there are many claims outside my expertise I can wait until my colleagues added their web annotations before I write my summary and determine my grade. Especially compared to writing a blog post Climate Feedback is very effective.

The initiative recently branched out to reviewing health claims with a new Health Feedback group. The umbrella is now called Science Feedback.

The impact

But there is only so much a group of scientists can do and by the time the reviews are in and summarized the article is mostly old news. Only a small fraction of readers would see any notifications social media systems could put on posts spreading them.

This is still important information for people who closely follow the topic, helps them to see how such reviews are done, assess which publications are reliable and helps to see which groups are credible.

The reviews may be most important for the journalists and the publications involved. Journalists doing high quality work can now demonstrate this to editors who will mostly not be able to assess this themselves. Some journalists have even asked for reviews of important pieces to showcase the quality of their work. Conversely, editors can seek out good journalists and cut ties with journalists who regularly hurt their reputation. The latter naturally only helps publications that care about quality.

The Steroids

With a larger group we could review more articles and have results while people are still reading them. There are not enough (climate) scientists to do this.

For Climate Feedback I only review articles on topics where I have expertise. But I think I would still do a decent job outside of my expertise. It is hard to determine how good a good article is, but the ones that are clearly bad are easy to identify and this does not require much expertise. At least in the climate branch of the US culture war the same tropes are used over and over again, the same "thinking" errors are made over and over again. 

Many people who are interested in climate change and in scientific detail, but are not scientists, would probably do a good job identifying these bad articles. Maybe even better. They say that magicians were better at debunking paranormal claims than scientists. We live in a bubble where most argue in good faith and science-interested normal citizens may well have a better BS detector.

However, how do we know who is good at this? Clearly not everyone, otherwise such a service would not be needed. We would have the data from Climate Feedback and Health Feedback to determine which citizen scientists' assessments predict the assessments of the scientists well. We could also ask people to classify the topic of the article. I would be best at observational climatology, decent in physical climatology and likely only average when it comes to many climate change impacts and economic questions. We could also ask people how confident they are in their assessments.
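One simple way to find out who is good at this could look as follows. This is only a sketch of the idea, not an existing Science Feedback tool: the function name, the data layout and the rating scale from -2 (very low credibility) to +2 (very high) are all assumptions for illustration.

```python
def reviewer_errors(citizen_ratings, scientist_grades):
    """Mean absolute difference between each citizen's ratings and the
    scientists' grades, over the articles both have assessed."""
    errors = {}
    for reviewer, ratings in citizen_ratings.items():
        shared = [article for article in ratings if article in scientist_grades]
        if not shared:
            continue  # no overlap with official reviews, so no track record yet
        total = sum(abs(ratings[a] - scientist_grades[a]) for a in shared)
        errors[reviewer] = total / len(shared)
    return errors


# Made-up example data on an assumed -2 ... +2 credibility scale.
scientist_grades = {"article_x": -2, "article_y": 1, "article_z": 2}
citizen_ratings = {
    "reviewer_a": {"article_x": -2, "article_y": 1},
    "reviewer_b": {"article_x": 1, "article_z": -1},
    "reviewer_c": {"article_w": 0},
}
errors = reviewer_errors(citizen_ratings, scientist_grades)
```

Here reviewer_a matches the scientists exactly, reviewer_b is off by three grades on average, and reviewer_c cannot be scored yet. Topic classifications and self-reported confidence could enter as additional keys in the same kind of table.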

In the end it would be great to ingest ratings in a user friendly way with 1) a browser add-on on the article homepage itself, 2) replying to posts mentioning the article on social media, like replying to a tweet adding the handle of the PubPeerBot automatically submits the tweet to PubPeer.

A server would compute the ratings and as soon as there is enough data create a review homepage with the ratings as metadata to be used by search engines and social media sites. We will have to see if they are willing to use such a statistical product. Also an application programming interface (API) and ActivityPub can be used to spread the information to interested parties.

I would be happy to use this information on the micro-blogging system for scientists Frank Sonntag and I have set up. I presume more Open Social Media communities would be grateful for the information to make their place more reality-friendly. A browser add-on could also display the feedback on the article's homepage itself and on posts linking to it.

How to start?

Before creating such a huge system I would propose a much smaller feasibility study. Here people would be informed about articles Climate or Health Feedback are working on and they can return their assessments until the Climate Feedback review is published. This could be a simple email distribution list to distribute the articles and a cloud-based spreadsheet or web form to return the results.

This system should be enough to study whether citizens can distinguish fact from fiction well enough (I expect so, but knowing for sure is valuable) and develop statistical methods to estimate how well people are doing, how to compute an overall score and how many reviews are needed to do so.
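As a starting point for such statistical methods, an overall score could be a weighted mean in which reviewers with a better track record count more. The weighting formula, the `softening` parameter and the function name are assumptions made up for this sketch; finding a well-founded weighting is exactly what the feasibility study would have to do.

```python
def overall_score(ratings, track_record_errors, softening=0.5):
    """Weighted mean rating of one article.

    ratings: {reviewer: rating of this article}
    track_record_errors: {reviewer: mean absolute error on earlier articles}
    Reviewers without a track record are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for reviewer, rating in ratings.items():
        if reviewer not in track_record_errors:
            continue
        # Assumed weighting: smaller past error gives a larger weight.
        weight = 1.0 / (softening + track_record_errors[reviewer])
        weighted_sum += weight * rating
        weight_total += weight
    if weight_total == 0.0:
        return None  # not enough data yet
    return weighted_sum / weight_total


score = overall_score(
    {"reviewer_a": -2, "reviewer_b": 2, "reviewer_c": 2},
    {"reviewer_a": 0.0, "reviewer_b": 1.5},
)
```

In this made-up example the reliable reviewer_a dominates and the score comes out at about -1.2, even though two of the three ratings were positive.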

This set-up points to two complications the full system would have. Firstly, only citizens' assessments that are made before the official feedback can be used. This should not be too much of a problem as most readers will read the article before the official feedback is published.

Secondly, as the number of official feedbacks will be small many volunteers will likely not review any of these articles themselves or just a few. Thus the assessment of how accurate the predictions of person A of articles X, Y and Z are may have to be assessed comparing their assessments with those of B, C and D who review X, Y or Z as well as one of the articles Climate Feedback reviewed. This makes the computation more complicated and uncertain, but if B, C and D are good enough, this should be doable. Alternatively, we would have to keep on informing our volunteers of the articles being reviewed by the scientists themselves.

This new system could be part of Science Feedback or an independent initiative. I feel it would at least be good to have a separate homepage as the two systems are quite different and the public should not mix them up. A reason to keep it separate is that this system could also be used in combination with other fact checkers, but we could also make that organizational change when it comes to that.

Another organizational question is whether we would like Google and Facebook to have access to this information or prefer a license that excludes them. Short term it is naturally best when they also use it to inform as many people as possible. Long-term it would also be valuable to break the monopolies of Google and Facebook. Having alternative services that can deliver better quality due to our assessments could contribute to that. They have money, we have people.

I asked on Twitter and Mastodon whether people would be interested in contributing to such a system. Fitting to my prejudice people on Twitter were more willing to review (I do more science on Twitter) and people on Mastodon were more willing to build software (Mastodon started with many coders).

What do you think? Could such a system work? Would enough people be willing to contribute? Is it technologically and statistically feasible? Any ideas to make the system or the feasibility study better?

Related reading

Climate Feedback explainer from 2016: Climate scientists are now grading climate journalism
Discussion of a controversial Climate Feedback and the grading system used: Is nitpicking a climate doomsday warning allowed?

Monday, 12 October 2020

The deleted chapter of the WMO Guidance on the homogenisation of climate station data

The Task Team on Homogenization (TT-HOM) of the Open Panel of CCl Experts on Climate Monitoring and Assessment (OPACE-2) of the Commission on Climatology (CCl) of the World Meteorological Organization (WMO) has published their Guidance on the homogenisation of climate station data.

The guidance report was a bit longish, so at the end we decided that the last chapter on "Future research & collaboration needs" was best deleted. As chair of the task team and as someone who likes to dream about what others could do in a comfy chair, I wrote most of this chapter and thus we decided to simply make it a blog post for this blog. Enjoy.


This guidance is based on our current best understanding of inhomogeneities and homogenisation. However, writing it also makes clear there is a need for a better understanding of the problems.

A better mathematical understanding of statistical homogenisation is important because that is what most of our work is based on. A stronger mathematical basis is a prerequisite for future methodological improvements.

A stronger focus on a (physical) understanding of inhomogeneities would complement and strengthen the statistical work. This kind of work is often performed at the station or network level, but also needed at larger spatial scales. Much of this work is performed using parallel measurements, but they are typically not internationally shared.

In an observational science the strength of the outcomes depends on a consilience of evidence. Thus having evidence on inhomogeneities from both statistical homogenisation and physical studies strengthens the science.

This chapter will discuss the needs for future research on homogenisation grouped in five kinds of problems. In the first section we will discuss research on improving our physical understanding and physics-based corrections. The next section is about break detection, especially about two fundamental problems in statistical homogenisation: the inhomogeneous-reference problem and the multiple-breakpoint problem.

Next we write about computing uncertainties in trends and long-term variability estimates from homogenised data due to remaining inhomogeneities. It may be possible to improve correction methods by treating it as a statistical model selection problem. The last section discusses whether inhomogeneities are stochastic or deterministic and how that may affect homogenisation and especially correction methods for the variability around the long-term mean.

For all the research ideas mentioned below, it is understood that in future we should study more meteorological variables than temperature. In addition, more studies on inhomogeneities across variables could be helpful to understand the causes of inhomogeneities and increase the signal to noise ratio. Homogenisation by national offices has advantages because here all climate elements from one station are stored together. This helps in understanding and identifying breaks. It would help homogenisation science and climate analysis to have a global database for all climate elements, like iCOADS for marine data. A Copernicus project has started working on this for land station data, which is an encouraging development.

Physical understanding

It is a good scientific practice to perform parallel measurements in order to manage unavoidable changes and to compare the results of statistical homogenisation to the expectations given the cause of the inhomogeneity according to the metadata. This information should also be analysed on continental and global scales to get a better understanding of when historical transitions took place and to guide homogenisation of large-scale (global) datasets. This requires more international sharing of parallel data and standards on the reporting of the size of breaks confirmed by metadata.

The Dutch weather service KNMI published a protocol on how to manage possible future changes of the network: who decides what needs to be done in which situation, what kind of studies should be made, where the studies should be published and that the parallel data should be stored in their central database as experimental data. A translation of this report will soon be published by the WMO (Brandsma et al., 2019) and will hopefully inspire other weather services to formalise their network change management.

Next to statistical homogenisation, making and studying parallel measurements, and other physical estimates, can provide a second line of evidence on the magnitude of inhomogeneities. Having multiple lines of evidence provides robustness to observational sciences. Parallel data is especially important for the large historical transitions that are most likely to produce biases in network-wide to global climate datasets. It can validate the results of statistical homogenisation and be used to estimate possibly needed additional adjustments. The Parallel Observations Science Team of the International Surface Temperature Initiative (ISTI-POST) is working on building such a global dataset with parallel measurements.

Parallel data is especially suited to improve our physical understanding of the causes of inhomogeneities by studying how the magnitude of the inhomogeneity depends on the weather and on instrumental design characteristics. This understanding is important for more accurate corrections of the distribution, for realistic benchmarking datasets to test our homogenisation methods and to determine which additional parallel experiments are especially useful.

Detailed physical models of the measurement, for example, the flow through the screens, radiative transfer and heat flows, can also help gain a better understanding of the measurement and its error sources. This aids in understanding historical instruments and in designing better future instruments. Physical models will also be paramount for understanding the impact of the surroundings on the measurement, from nearby obstacles and surfaces influencing error sources and air flow to changes in the measurand, such as urbanisation/deforestation or the introduction of irrigation. Land-use changes, especially urbanisation, should be studied together with relocations they may provoke.

Break detection

Longer climate series typically contain more than one break. This so-called multiple-breakpoint problem is currently an important research topic. A complication of relative homogenisation is that also the reference stations can have inhomogeneities. This so-called inhomogeneous-reference problem is not optimally solved yet. It is also not clear what temporal resolution is best for detection and what the optimal way is to handle the seasonal cycle in the statistical properties of climate data and of many inhomogeneities.

For temperature time series about one break per 15 to 20 years is typical and multiple breaks are thus common. Unfortunately, most statistical detection methods have been developed for one break and for the null hypothesis of white (sometimes red) noise. In case of multiple breaks the statistical test should not only take the noise variance into account, but also the break variance from breaks at other positions. For low signal to noise ratios, the additional break variance can lead to spurious detections and inaccuracies in the break position (Lindau and Venema, 2018a).

To apply single-breakpoint tests on series with multiple breaks, one ad-hoc solution is to first split the series at the most significant break (found with, for example, the standard normal homogeneity test, SNHT) and then investigate the subseries. Such a greedy algorithm does not always find the optimal solution. Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, which reduces power of detection considerably. This method is not used much nowadays.
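The greedy splitting approach can be sketched in a few lines. This is a simplified illustration, not the implementation of any of the packages mentioned in this chapter: the SNHT statistic is computed on an already prepared difference series, and `threshold` and `min_length` are placeholder assumptions rather than tabulated critical values.

```python
import numpy as np


def snht_statistic(series):
    """Return (T_max, k): the SNHT statistic and the most likely break
    position, with the break between series[k - 1] and series[k]."""
    x = np.asarray(series, dtype=float)
    n = x.size
    sd = x.std(ddof=1)
    if sd == 0.0:
        return 0.0, 0  # constant series: no evidence for a break
    z = (x - x.mean()) / sd
    csum = np.cumsum(z)
    k = np.arange(1, n)
    mean_before = csum[:-1] / k
    mean_after = (csum[-1] - csum[:-1]) / (n - k)
    t = k * mean_before**2 + (n - k) * mean_after**2
    best = int(np.argmax(t))
    return float(t[best]), int(k[best])


def greedy_breaks(series, threshold=10.0, min_length=10, offset=0):
    """Split at the most significant break, then recurse into both parts.

    threshold is a placeholder; a real application would use a critical
    value that depends on the length of the (sub)series.
    """
    x = np.asarray(series, dtype=float)
    if x.size < 2 * min_length:
        return []
    t_max, k = snht_statistic(x)
    if t_max < threshold or k < min_length or x.size - k < min_length:
        return []
    left = greedy_breaks(x[:k], threshold, min_length, offset)
    right = greedy_breaks(x[k:], threshold, min_length, offset + k)
    return left + [offset + k] + right


# Synthetic difference series: jumps of 2 units at positions 40 and 80,
# plus a small deterministic wiggle standing in for noise.
levels = np.repeat([0.0, 2.0, 4.0], 40)
series = levels + 0.1 * (-1.0) ** np.arange(120)
print(greedy_breaks(series))
```

On this clean example the greedy search recovers both jumps. The difficulty described above shows up when the breaks are small compared to the noise: the first split is then placed under the influence of the other breaks and cannot be revised later.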

Multiple breakpoint methods can find an optimal solution and are nowadays numerically feasible. This can be done in a hypothesis testing (MASH) or in a statistical model selection framework. For a certain number of breaks these methods find the break combination that minimizes the internal variance, that is, the variance of the homogeneous subperiods (or, equivalently, the break combination that maximizes the variance of the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) or ACMANT (based on PRODIGE; Domonkos, 2011b). In a similar line of research Lu et al. (2010) solved the multiple breakpoint problem using a minimum description length (MDL) based information criterion as penalty function.

The penalty function of PRODIGE was found to be suboptimal (Lindau and Venema, 2013). It was found that the penalty should be a function of the number of breaks, not fixed per break and that the relation with the length of the series should be reversed. It is not clear yet how sensitively homogenisation methods respond to this, but increasing the penalty per break in case of low SNR to reduce the number of breaks does not make the estimated break signal more accurate (Lindau and Venema, 2018a).
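The principle of such a method can be sketched with dynamic programming. Note the assumptions: the penalty used here is a simple fixed amount per break chosen for illustration, not the criterion of PRODIGE or the MDL penalty of Lu et al., and the function name and parameters are hypothetical.

```python
import numpy as np


def optimal_breaks(series, max_breaks, penalty_per_break):
    """Break positions minimising internal sum of squares plus penalty.

    A break position b means a new segment starts at series[b].
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x**2)))

    def sse(i, j):
        """Sum of squared deviations from the mean of x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[m, j]: minimal internal sum of squares of x[:j] with m breaks
    cost = np.full((max_breaks + 1, n + 1), np.inf)
    last = np.zeros((max_breaks + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        cost[0, j] = sse(0, j)
    for m in range(1, max_breaks + 1):
        for j in range(m + 1, n + 1):
            for i in range(m, j):  # i is the position of the last break
                c = cost[m - 1, i] + sse(i, j)
                if c < cost[m, j]:
                    cost[m, j] = c
                    last[m, j] = i

    # Model selection: the penalty decides how many breaks are kept.
    scores = [cost[m, n] + penalty_per_break * m for m in range(max_breaks + 1)]
    best_m = int(np.argmin(scores))

    breaks = []
    j = n
    for m in range(best_m, 0, -1):  # trace back the optimal positions
        j = int(last[m, j])
        breaks.append(j)
    return sorted(breaks)


levels = np.repeat([0.0, 2.0, 4.0], 40)
series = levels + 0.1 * (-1.0) ** np.arange(120)
print(optimal_breaks(series, max_breaks=5, penalty_per_break=5.0))
```

Unlike greedy splitting, all break positions are optimised jointly for every candidate number of breaks. The price is a computation that grows with the square of the series length, which is no longer a problem on current hardware.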

Not only the candidate station, also the reference stations will have inhomogeneities, which complicates homogenisation. Such inhomogeneities can be climatologically especially important when they are due to network-wide technological transitions. An example of such a transition is the current replacement of temperature observations using Stevenson screens by automatic weather stations. Such transitions are important periods as they may cause biases in the network and global average trends and they produce many breaks over a short period.

A related problem is that sometimes all stations in a network have a break at the same date, for example, when a weather service changes the time of observation. Nationally such breaks are corrected using metadata. If this change is unknown in global datasets one can still detect and correct such inhomogeneities statistically by comparison with other nearby networks. That would require an algorithm that additionally knows which stations belong to which network and prioritizes correcting breaks found between stations in different networks. Such algorithms do not exist yet and information on which station belongs to which network for which period is typically not internationally shared.

The influence of inhomogeneities in the reference can be reduced by computing composite references over many stations, removing reference stations with breaks and by performing homogenisation iteratively.
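A composite reference and the resulting difference series could be computed along the following lines. Weighting the neighbours by their squared correlation with the candidate is one common choice and is used here as an assumption; the function and parameter names are hypothetical.

```python
import numpy as np


def difference_series(candidate, neighbours, exclude=()):
    """Candidate anomalies minus a correlation-weighted composite reference.

    neighbours has shape (stations, time); exclude lists the row indices of
    reference stations that are dropped, e.g. because they contain breaks.
    """
    cand = np.asarray(candidate, dtype=float)
    neigh = np.asarray(neighbours, dtype=float)
    dropped = set(exclude)
    keep = [i for i in range(neigh.shape[0]) if i not in dropped]
    neigh = neigh[keep]

    cand_anom = cand - cand.mean()
    neigh_anom = neigh - neigh.mean(axis=1, keepdims=True)

    # Assumed weighting: squared correlation, negative correlations get zero.
    weights = np.array(
        [max(np.corrcoef(cand_anom, row)[0, 1], 0.0) ** 2 for row in neigh_anom]
    )
    reference = weights @ neigh_anom / weights.sum()
    return cand_anom - reference


# Made-up example: a shared regional signal and a 0.5 degree break
# in the candidate halfway through the series.
time = np.arange(100)
climate = np.sin(time / 5.0)
candidate = climate + np.where(time >= 50, 0.5, 0.0)
neighbours = np.vstack([climate + 1.0, climate - 0.5, climate + 0.2])
diff = difference_series(candidate, neighbours)
```

In this idealised case the regional signal cancels completely and the difference series shows only the break. Iterating would mean homogenising the neighbours first and then recomputing the reference.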

A direct approach to solving this problem would be to simultaneously homogenise multiple stations, also called joint detection. A step in this direction are pairwise homogenisation methods where breaks are detected in the pairs. This requires an additional attribution step, which attributes the breaks to a specific station. Currently this is done by hand (for PRODIGE; Caussinus and Mestre, 2004; Rustemeier et al., 2017) or with ad-hoc rules (by the Pairwise homogenisation algorithm of NOAA; Menne and Williams, 2009).

In the homogenisation method HOMER (Mestre et al., 2013) a first attempt is made to homogenise all pairs simultaneously using a joint detection method from bio-statistics. Feedback from first users suggests that this method should not be used automatically. It should be studied how well this method works and where the problems come from.

Multiple breakpoint methods are more accurate than single breakpoint methods. This expected higher accuracy is founded on theory (Hawkins, 1972). In addition, in the HOME benchmarking study it was numerically found that modern homogenisation methods, which take the multiple breakpoint and the inhomogeneous reference problems into account, are about a factor two more accurate than traditional methods (Venema et al., 2012).

However, the current version of CLIMATOL applies single-breakpoint detection tests, first SNHT detection on a window then splitting, to achieve results comparable to modern multiple-breakpoint methods with respect to break detection and homogeneity of the data (Killick, 2016). This suggests either that the multiple-breakpoint detection principle may not be as important as previously thought, which warrants deeper study, or that the accuracy of CLIMATOL is partly due to an unknown unknown.

The signal to noise ratio is paramount for the reliable detection of breaks. It would thus be valuable to develop statistical methods that explain part of the variance of a difference time series and remove this to see breaks more clearly. Data from (regional) reanalysis could be useful predictors for this.

First methods have been published to detect breaks for daily data (Toreti et al., 2012; Rienzner and Gandolfi, 2013). It has not been studied yet what the optimal resolution for break detection is (daily, monthly, annual), nor what the optimal way is to handle the seasonal cycle in the climate data and exploit the seasonal cycle of inhomogeneities. In the daily temperature benchmarking study of Killick (2016) most non-specialised detection methods performed better than the daily detection method MAC-D (Rienzner and Gandolfi, 2013).

The selection of appropriate reference stations is a necessary step for accurate detection and correction. Many different methods and metrics are used for the station selection, but studies on the optimal method are missing. The knowledge of local climatologists about which stations have a similar regional climate needs to be made objective so that it can be applied automatically (at larger scales).

For detection a high signal to noise ratio is most important, while for correction it is paramount that all stations are in the same climatic region. Typically the same networks are used for both detection and correction, but it should be investigated whether a smaller network for correction would be beneficial. Also in general, we need more research on understanding the performance of (monthly and daily) correction methods.

Computing uncertainties

Also after homogenisation uncertainties remain in the data due to various problems:

  • Not all breaks in the candidate station have been and can be detected.

  • False alarms are an unavoidable trade-off for detecting many real breaks.

  • Uncertainty in the estimation of correction parameters due to limited data.

  • Uncertainties in the corrections due to limited information on the break positions.

From validation and benchmarking studies we have a reasonable idea about the remaining uncertainties that one can expect in the homogenised data, at least with respect to changes in the long-term mean temperature. For many other variables and changes in the distribution of (sub-)daily temperature data individual developers have validated their methods, but systematic validation and comparison studies are still missing.

Furthermore, such studies only provide a general uncertainty level, whereas more detailed information for every single station/region and period would be valuable. The uncertainties will strongly depend on the signal to noise ratios, on the statistical properties of the inhomogeneities of the raw data and on the quality and cross-correlations of the reference stations. All of which vary strongly per station, region and period.

Communicating such a complicated error structure, which is mainly temporal, but also partially spatial, is a problem in itself. Furthermore, not only the uncertainty in the means should be considered, but, especially for daily data, uncertainties in the complete probability density function need to be estimated and communicated. This could be communicated with an ensemble of possible realisations, similar to Brohan et al. (2006).

An analytic understanding of the uncertainties is important, but is often limited to idealised cases. Thus also numerical validation studies, such as the past HOME and upcoming ISTI studies, are important for an assessment of homogenisation algorithms under realistic conditions.

Creating validation datasets also helps to see the limits of our understanding of the statistical properties of the break signal. This is especially the case for variables other than temperature and for daily and (sub-)daily data. Information is needed on the real break frequencies and size distributions, but also their auto-correlations and cross-correlations, as well as, as explained in the next section, the stochastic nature of breaks in the variability around the mean.

Validation studies focussed on difficult cases would be valuable for a better understanding. For example, sparse networks, isolated island networks, large spatial trend gradients and strong decadal variability in the difference series of nearby stations (for example, due to El Nino in complex mountainous regions).

The advantage of simulated data is that it can create a large number of quite realistic complete networks. For daily data it will remain hard for the years to come to determine how to generate a realistic validation dataset. Thus even if using parallel measurements is mostly limited to one break per test, it does provide the highest degree of realism for this one break.

Deterministic or stochastic corrections?

Annual and monthly data is normally used to study trends and variability in the mean state of the atmosphere. Consequently, typically only the mean is adjusted by homogenisation. Daily data, on the other hand, is used to study climatic changes in weather variability, severe weather and extremes. Consequently, not only the mean should be corrected, but the full probability distribution describing the variability of the weather.

The physics of the problem suggests that many inhomogeneities are caused by stochastic processes. An example affecting many instruments is the difference in the response time of instruments, which can lead to differences determined by turbulence. A fast thermometer will on average read higher maximum temperatures than a slow one, but this difference will be variable and sometimes be much higher than the average. In case of errors due to insolation the radiation error will be modulated by clouds. An insufficiently shielded thermometer will need larger corrections on warm days, which will typically be more sunny, but some warm days will be cloudy and not need much correction, while other warm days are sunny and calm and have a dry hot surface. The adjustment of daily data for studies on changes in the variability is thus a distribution problem and not only a regression bias-correction problem. For data assimilation (numerical weather prediction) accurate bias correction (with regression methods) is probably the main concern.

Seen as a variability problem, the correction of daily data is similar to statistical downscaling in many ways. Both methodologies aim to produce bias-corrected data with the right variability, taking into account the local climate and large-scale circulation. One lesson from statistical downscaling is that increasing the variance of a time series deterministically by multiplication with a factor, called inflation, is the wrong approach and that the variance that could not be explained by regression using predictors should be added stochastically as noise instead (Von Storch, 1999). Maraun (2013) demonstrated that the inflation problem also exists for the deterministic Quantile Matching method, which is also used in daily homogenisation. Current statistical correction methods deterministically change the daily temperature distribution and do not stochastically add noise.
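The difference between inflation and adding noise can be made concrete with a toy example. Everything in this sketch is synthetic and assumed for illustration: a single predictor, white Gaussian noise and made-up function names. It is not one of the published correction methods.

```python
import numpy as np


def inflate(predicted, target_variance):
    """Deterministic inflation: scale the anomalies to the target variance."""
    anomalies = predicted - predicted.mean()
    factor = np.sqrt(target_variance / anomalies.var())
    return predicted.mean() + factor * anomalies


def randomise(predicted, target_variance, rng):
    """Stochastic alternative: add white noise with the unexplained variance."""
    missing = max(target_variance - predicted.var(), 0.0)
    noise = rng.normal(0.0, np.sqrt(missing), size=predicted.size)
    return predicted + noise


# Synthetic data: the predictor explains only part of the observed variance.
rng = np.random.default_rng(0)
predictor = rng.normal(size=5000)
observed = 0.6 * predictor + rng.normal(scale=0.8, size=5000)

slope, intercept = np.polyfit(predictor, observed, 1)
predicted = slope * predictor + intercept

inflated = inflate(predicted, observed.var())
randomised = randomise(predicted, observed.var(), rng)

print(np.corrcoef(inflated, predictor)[0, 1])
print(np.corrcoef(randomised, predictor)[0, 1])
print(np.corrcoef(observed, predictor)[0, 1])
```

Both corrected series have about the right variance, but the inflated series stays perfectly correlated with the predictor, which attributes all variability to it. The randomised series has a correlation with the predictor similar to that of the observations. In a real application the added noise would also need realistic temporal and spatial correlations.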

Transferring ideas from downscaling to daily homogenisation is likely fruitful to develop such stochastic variability correction methods. For example, predictor selection methods from downscaling could be useful. Both fields require powerful and robust (time invariant) predictors. Multi-site statistical downscaling techniques aim at reproducing the auto- and cross-correlations between stations (Maraun et al., 2010), which may be interesting for homogenisation as well.

The daily temperature benchmarking study of Rachel Killick (2016) suggests that current daily correction methods are not able to improve the distribution much. There is a pressing need for more research on this topic. However, these methods likely also performed less well because they were used together with detection methods with a much lower hit rate than the comparison methods.

The deterministic correction methods may not lead to severe errors in homogenisation (this should still be studied), but stochastic methods that implement the corrections by adding noise would at least theoretically fit the problem better. Such stochastic corrections are not trivial and should have the right variability on all temporal and spatial scales.

It should be studied whether it may be better to only detect the dates of break inhomogeneities and perform the analysis on the homogeneous subperiods (HSPs), removing the need for corrections. The disadvantage of this approach is that most of the trend variance is in the difference in the mean of the HSPs and only a small part is in the trend within the HSPs. In case of trend analysis, this would be similar to the work of the Berkeley Earth Surface Temperature group on the mean temperature signal. Periods with gradual inhomogeneities, e.g., due to urbanisation, would have to be detected and excluded from such an analysis.

An outstanding problem is that current variability correction methods have only been developed for break inhomogeneities; methods for gradual ones are still missing. In homogenisation of the mean of annual and monthly data, gradual inhomogeneities are successfully removed by implementing multiple small breaks in the same direction. However, as daily data is used to study changes in the distribution, this may not be appropriate for daily data as it could produce larger deviations near the breaks. Furthermore, changing the variance in data with a trend can be problematic (Von Storch, 1999).

At the moment most daily correction methods correct the breaks one after another. In monthly homogenisation it is found that correcting all breaks simultaneously (Caussinus and Mestre, 2004) is more accurate (Domonkos et al., 2013). It is thus likely worthwhile to develop multiple breakpoint correction methods for daily data as well.

Finally, current daily correction methods rely on previously detected breaks and assume that the homogeneous subperiods (HSPs), i.e., the segments between breakpoints, are homogeneous. However, these HSPs are currently based on detection of breaks in the mean only. Breaks in higher moments may thus still be present in the "homogeneous" subperiods and affect the corrections. If only for this reason, we should also work on detection of breaks in the distribution.

Correction as model selection problem

The number of degrees of freedom (DOF) of the various correction methods varies widely: from just one degree of freedom for annual corrections of the means, to 12 for monthly corrections of the means, to 40 for decile corrections applied to every season (10 decile bins times 4 seasons), to a large number of DOF for quantile or percentile matching.

A study using PRODIGE on the HOME benchmark suggested that for typical European networks monthly adjustments are best for temperature; annual corrections are probably less accurate because they fail to account for changes in the seasonal cycle due to inhomogeneities. For precipitation, annual corrections were most accurate; monthly corrections were likely less accurate because the data was too noisy to estimate the 12 correction constants/degrees of freedom.

Which correction method is best depends on the characteristics of the inhomogeneity. For a calibration problem just the annual mean could be sufficient; for a serious exposure problem (e.g., insolation of the instrument) a seasonal cycle in the monthly corrections may be expected and the full distribution of the daily temperatures may need to be adjusted. The best correction method also depends on the reference. Whether the parameters of a certain correction model can be reliably estimated depends on how well-correlated the neighbouring reference stations are.

An entire regional network is typically homogenised with the same correction method, while the optimal correction method will depend on the characteristics of each individual break and on the quality of the reference. These will vary from station to station, from break to break and from period to period. Work on correction methods that objectively select the optimal correction method, e.g., using an information criterion, would be valuable.
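How such an objective selection could work can be sketched with the Bayesian information criterion (BIC), here choosing between an annual correction (1 degree of freedom) and monthly corrections (12 degrees of freedom) for a single break. This is a minimal sketch, not an existing homogenisation method. The choice of BIC, the function names, and the synthetic difference series (candidate minus reference after the break, with an assumed jump of 0.5 plus a seasonal cycle of amplitude 0.3) are all assumptions made for the illustration.

```python
import math
import random


def bic(rss, n, k):
    """BIC for a Gaussian model with k fitted means (up to a constant)."""
    return n * math.log(rss / n) + k * math.log(n)


def select_correction(diff, months):
    """Pick the annual or the monthly correction model for one break.

    diff   -- difference series after the break, relative to before it
    months -- month index (0..11) of every value; all 12 must occur
    """
    n = len(diff)

    # Annual model: one constant correction.
    annual_mean = sum(diff) / n
    rss_annual = sum((d - annual_mean) ** 2 for d in diff)

    # Monthly model: one correction per calendar month.
    sums, counts = [0.0] * 12, [0] * 12
    for d, m in zip(diff, months):
        sums[m] += d
        counts[m] += 1
    monthly_mean = [s / c for s, c in zip(sums, counts)]
    rss_monthly = sum((d - monthly_mean[m]) ** 2
                      for d, m in zip(diff, months))

    scores = {"annual": bic(rss_annual, n, 1),
              "monthly": bic(rss_monthly, n, 12)}
    return min(scores, key=scores.get), scores


def synthetic_jump(noise_sd, years=20, seed=0):
    """Assumed break: mean jump of 0.5 with a seasonal cycle of 0.3."""
    rng = random.Random(seed)
    months = [m for _ in range(years) for m in range(12)]
    diff = [0.5 + 0.3 * math.cos(2 * math.pi * m / 12)
            + rng.gauss(0, noise_sd) for m in months]
    return diff, months


# Well-correlated reference (little noise) versus a noisy one.
print(select_correction(*synthetic_jump(noise_sd=0.2))[0])
print(select_correction(*synthetic_jump(noise_sd=2.0))[0])
```

The inhomogeneity is the same in both cases, but with a noisy difference series the 12 monthly constants cannot be estimated reliably and the criterion falls back on the annual correction. The same machinery could in principle compare more than two candidate models.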

In case of (sub-)daily data, the range of options to select from becomes even larger. Daily data can be corrected just for inhomogeneities in the mean (e.g., Vincent et al., 2002, where daily temperatures are corrected by incorporating a linear interpolation scheme that preserves the previously defined monthly corrections) or also for the variability around the mean. In between are methods that adjust for the distribution including the seasonal cycle, which dominates the variability and is thus effectively similar to mean adjustments with a seasonal cycle. Correction methods of intermediate complexity with more than one, but fewer than 10 degrees of freedom would fill a gap and allow for more flexibility in selecting the optimal correction model.

When applying these methods (Della-Marta and Wanner, 2006; Wang et al., 2010; Mestre et al., 2011; Trewin, 2013) the number of quantile bins (categories) needs to be selected, as well as whether to use physical weather-dependent predictors and the functional form in which they are used (Auchmann and Brönnimann, 2012). Objective optimal methods for these selections would be valuable.
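To make the role of the number of quantile bins concrete, here is a strongly simplified sketch of a deterministic bin-wise correction. It is not one of the cited methods: it uses no reference stations, assumes that the climate before and after the break is the same, and all function names and the synthetic data are invented for the illustration. With one bin the correction reduces to a shift of the mean; with more bins the shape of the distribution is adjusted as well, at the cost of more degrees of freedom.

```python
import random
import statistics


def bin_of_rank(rank, n, n_bins):
    """Equal-probability bin (0 .. n_bins - 1) of a value with this rank."""
    return rank * n_bins // n


def bin_means(values, n_bins):
    """Mean of the sorted values in every quantile bin (needs n >= n_bins)."""
    ordered = sorted(values)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for rank, value in enumerate(ordered):
        b = bin_of_rank(rank, len(ordered), n_bins)
        sums[b] += value
        counts[b] += 1
    return [s / c for s, c in zip(sums, counts)]


def quantile_bin_corrections(before, after, n_bins):
    """One additive correction per bin: after-break minus before-break."""
    return [a - b for a, b in zip(bin_means(after, n_bins),
                                  bin_means(before, n_bins))]


def apply_corrections(before, corrections):
    """Add to every value the correction of the bin its rank falls in."""
    n, n_bins = len(before), len(corrections)
    order = sorted(range(n), key=lambda i: before[i])
    adjusted = list(before)
    for rank, i in enumerate(order):
        adjusted[i] += corrections[bin_of_rank(rank, n, n_bins)]
    return adjusted


# Assumed synthetic break that changed both the mean and the variance.
rng = random.Random(1)
before = [rng.gauss(10.0, 3.0) for _ in range(3000)]
after = [rng.gauss(10.5, 4.0) for _ in range(3000)]

for n_bins in (1, 4, 10):
    corrections = quantile_bin_corrections(before, after, n_bins)
    adjusted = apply_corrections(before, corrections)
    print(n_bins, round(statistics.fmean(adjusted), 2),
          round(statistics.pstdev(adjusted), 2))
```

Note that this correction is fully deterministic: every value is moved by the correction of its bin and no noise is added.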

Related information

WMO Guidelines on Homogenization (English, French, Spanish) 

WMO guidance report: Challenges in the Transition from Conventional to Automatic Meteorological Observing Networks for Long-term Climate Records