Wednesday 15 October 2014

Scientific meetings. The freedom to tweet and the freedom not to be tweeted



Some tweets from a meeting on Arctic sea ice reduction organised by the Royal Society recently caused a stir when the speaker cried "defamation" and wrote letters to the tweeters' employers. Stoat and Paul Matthews have the story.

The speaker's reaction was much too strong, in my opinion: most tweets were professional, and respectful critique should be allowed. I have seen only one tweet that should not have been written ("now back to science").

I do understand that the speaker feels as if people are talking behind his back. He is not on Twitter, and even if he were, you cannot speak and tweet simultaneously. Yes, people do the same on the conference floor and in bars, but then you at least do not notice it. For balance it should be noted that plenty of critique was also given after the talk; that people were not convinced was thus not said only behind his back.

Related to this (a blog post is just a long tweet), Paige Brown Jarreau asks:

Almost all scientists use both papers and meetings for communication. Tweets and blogs do not have that status; they could complement the informal discussions at meetings, but do differ in that everyone can read them, for all time. Social media will never be and should never be a substitute for the scientific literature.

Imagine that I had some preliminary evidence that the temperature increase since 1900 is nearly zero, or that we may already have passed the two degree limit. I would love to discuss such evidence with my colleagues: to see if they notice any problems with the argument, if I have overlooked something, or if there are better methods or data that would make the evidence stronger. I certainly would not like to see such preliminary ideas as a headline in the New York Times before I had gathered and evaluated all the evidence.

The problem with social media is that the boundaries between public and private are blurring. After talking about such a work at a conference, someone may tweet about it and before you know it the New York Times is on the telephone.

Furthermore, you always communicate with a certain person or audience and tailor your message to the receiver. When I write on my blog, I explain much more than when I talk to a colleague. Conversely, if someone hears or reads my conversation with a colleague, the lack of explanation may be confusing and give a wrong impression. In person at a conference a sarcastic remark is easily detected; on the written internet sarcasm does not work, especially in the climate "debate", where no opinion is too exotic.

This is not an imaginary concern. The OPERA team, which measured neutrinos from CERN apparently arriving faster than light, got into trouble this way. The team was forced to inform the press prematurely because blogs had started writing about the finding. It made very clear that the result was still very likely a measurement error: "If this measurement is confirmed, it might change our view of physics, but we need to be sure that there are no other, more mundane, explanations. That will require independent measurements." But a few months after the error was found (a stupid loose cable), the spokesperson and physics coordinator of OPERA had to resign. I doubt that would have happened without all the premature publicity.

If I were to report that the two degree limit has already been reached, or that the raw temperature data have a severe cooling bias, an unparalleled multimedia smear campaign would start. Then I had better have the evidence in my pocket. The OPERA example shows that even if you do not overstate your case, your job is in jeopardy. Furthermore, such a campaign would make further work extremely difficult, even in a country like Germany, which has freedom of research in its constitution to prevent political interference with science:
Arts and sciences, research and teaching shall be free.
(Kunst und Wissenschaft, Forschung und Lehre sind frei)
This fortunate fact, for example, disallows FOIA harassment of scientists.

That openness is not necessary in the preliminary stages fits with the pivotal role of the scientific literature in science. In an article a scientist describes the findings in all the detail necessary for others to replicate and build on them. That is the moment everything comes into the open. If the article is well written, that is all one should need.

I hope that one day all scientific articles will be open access so that everyone can read them. I personally prefer to publish my data and code, where relevant, and would encourage all scientists to do so. However, how such a scientific article came into existence is nobody's business.

All the trivial and insightful mistakes that were made along the way are nobody's business either. And to get ahead in science we need a culture in which people are allowed to make mistakes. As the saying goes: if you are not wrong half of the time, you are not pushing yourself hard enough towards the edge of our understanding. Putting preliminary ideas in the limelight too soon stifles experimentation and exploration.

At the beginning of a project I often request a poster rather than a talk, to be able to discuss the work with my most direct colleagues instead of broadcasting the ideas to a much broader audience. (A secondary reason is that a well-organised poster session also provides much more feedback.) Once the ideas have matured, a talk is a great way to tell everyone about them.

If a scientist chooses to show preliminary work before publication, that is naturally fine. For certain projects the additional feedback may be valuable or even necessary, as in the case of collaborations with citizen scientists. And normally the New York Times will not be interested. However, we should not force people to work that way. It may not be ideal for every scientific question or person.

Opening up scientific meetings with social media and webcasts may intimidate (young) researchers and in this way limit discussion. Even at an internal seminar, students are often too shy to ask questions. On the days the professor cannot attend, there are often many more questions. External workshops are more intimidating, large conferences are worse still, and having to talk to a global audience because of social media is the worst of all.

More openness does not automatically mean more or better debate. It can stifle debate and also move it to smaller closed circles, which would be counterproductive.

Personally I do not care much who is listening; as long as the topic is science I feel perfectly comfortable. The self-selected group of scientists that blogs and tweets probably feels the same. However, not everyone is like that. Some people who are much smarter than I am would like to first sharpen their pencils and think a while before they comment. I know from feedback by mail and at conferences that many more of my colleagues read this blog than I had expected, because they hardly ever write comments. Writing something for eternity without first thinking about it for a few days, weeks or months is not everyone's thing. This is something we should take into account before we open informal communication up too much.

In spring I asked the organisers of a meeting how we should handle social media:
A question we may want to discuss during the introduction on Monday morning: Do people mind the use of social media during the meeting? Twitter and blogs, for example. What we discuss is also interesting for people unable to attend the meeting, but we should also not make informal discussions harder by opening up to the public too much.
I was thinking about people saying in advance if they do not want their talk to be public, and maybe we should also keep the discussions after the talks private, so that people do not have to think twice about the correctness of every single sentence.
The organisers kindly asked me to refrain from tweeting. Maybe that was the reply because they were busy and had never considered the topic, but it was fine by me. How appropriate social media are depends on the context, and this was a small meeting, where opening it up to the world would have been a large change in atmosphere.

I guess social media are less of a problem at the general assembly of the European Geosciences Union (EGU), where you know that there is much press around, especially in some of the larger sessions, where there can be hundreds of scientists and some journalists in the audience. You would not use such large audiences to bounce new ideas around, but to explain the current state of the art.

Even at EMS and EGU the organisers provide some privacy: it is officially not allowed to take photos of the posters. I would personally prefer that every scientist can indicate whether this is okay for his or her poster (and if you make rules, you should also enforce them).

Another argument against tweeting is that it distracts the tweeter. At last week's EMS2014 there was no free Wi-Fi in the conference rooms (only in a separate working room). I thought that was a good thing. People were listening to the talks again, like in the past, and not tweeting, surfing or checking their email.

[UPDATE. Doug McNeall, the Met Office guy who convinced me to start tweeting, has written a response on his blog.]

Related Reading

Letter to Science by Germán Orizaola and Ana Elisa Valdés: Free the tweet at scientific conferences

Kathleen Fitzpatrick (Director of Scholarly Communication) gives some sensible Advice on Academic Blogging, Tweeting, Whatever. For example: “If somebody says they’d prefer not to be tweeted or blogged, respect that” and “Do not let dust-ups such as these stop you from blogging/tweeting/whatever”.

I previously wrote about: The value of peer review for science and the press. It would be nice if the press would at least wait until a study is published. Even better would be to wait until several studies have been made. But that is something we, as scientists, cannot control.

* Photo by Juan Emilio used with a Creative Commons CC BY-SA 2.0 licence.

Wednesday 8 October 2014

A framework for benchmarking of homogenisation algorithm performance on the global scale - Paper now published

By Kate Willett, reposted from the Surface Temperatures blog of the International Surface Temperature Initiative (ISTI).

The ISTI benchmarking working group have just had their first benchmarking paper accepted at Geoscientific Instrumentation, Methods and Data Systems:

Willett, K., Williams, C., Jolliffe, I. T., Lund, R., Alexander, L. V., Brönnimann, S., Vincent, L. A., Easterbrook, S., Venema, V. K. C., Berry, D., Warren, R. E., Lopardo, G., Auchmann, R., Aguilar, E., Menne, M. J., Gallagher, C., Hausfather, Z., Thorarinsdottir, T., and Thorne, P. W.: A framework for benchmarking of homogenisation algorithm performance on the global scale, Geosci. Instrum. Method. Data Syst., 3, 187-200, doi:10.5194/gi-3-187-2014, 2014.

Benchmarking, in this context, is the assessment of homogenisation algorithm performance against a set of realistic synthetic worlds of station data in which the locations and sizes/shapes of the inhomogeneities are known a priori. Crucially, these inhomogeneities are not known to those performing the homogenisation, only to those performing the assessment. Assessing both the ability of algorithms to find changepoints and to accurately return the synthetic data to its clean form (prior to the addition of inhomogeneity) has three main purposes (a toy illustration of the first follows the list):

1) quantification of uncertainty remaining in the data due to inhomogeneity
2) inter-comparison of climate data products in terms of fitness for a specified purpose
3) providing a tool for further improvement in homogenisation algorithms
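
To make purpose 1 concrete, here is a minimal sketch, assuming simple monthly anomaly arrays; the function name and all numbers are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

# Toy illustration of purpose 1: how much of the added inhomogeneity
# survives homogenisation? Arrays hold monthly temperature anomalies
# for one synthetic station.
def remaining_error(clean, inhomogeneous, homogenised):
    """RMSE relative to the clean truth, before and after homogenisation."""
    rmse_before = np.sqrt(np.mean((inhomogeneous - clean) ** 2))
    rmse_after = np.sqrt(np.mean((homogenised - clean) ** 2))
    return rmse_before, rmse_after

# Example: a 0.5 K break halfway through a noise-only series,
# removed by an imperfect 0.45 K adjustment.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.5, size=1200)          # 100 years of monthly values
inhom = clean + 0.5 * (np.arange(1200) >= 600)
homog = inhom - 0.45 * (np.arange(1200) >= 600)
print(remaining_error(clean, inhom, homog))      # "after" is much smaller
```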

Here we describe what we believe would be a good approach to a comprehensive homogenisation algorithm benchmarking system. This includes an overarching cycle of benchmark development; release of formal benchmarks; assessment of homogenised benchmarks; and an overview of where we can improve next time around (Figure 1).

Figure 1: Overview of the ISTI comprehensive benchmarking system for assessing the performance of homogenisation algorithms. (Fig. 3 of Willett et al., 2014)

There are four components to creating this benchmarking system.

Creation of realistic clean synthetic station data
Firstly, we must be able to synthetically recreate the 30,000+ ISTI stations such that they have the same variability, auto-correlation and inter-station cross-correlations as the real data, but are free from systematic error. In other words, they must contain a realistic seasonal cycle and features of natural variability (e.g., ENSO, volcanic eruptions). There must be realistic month-to-month persistence at each station and geographically across nearby stations.
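
As a rough picture of what these requirements mean for a single station, here is a toy generator, assuming a sinusoidal seasonal cycle and AR(1) persistence; it sketches the idea only and is not the ISTI generation method:

```python
import numpy as np

# Toy single-station generator (NOT the ISTI method): a seasonal
# cycle plus AR(1) noise giving month-to-month persistence. The real
# benchmarks additionally need inter-station cross-correlations and
# modes of natural variability such as ENSO.
def toy_clean_station(n_months=1200, phi=0.6, sigma=0.8, seed=42):
    rng = np.random.default_rng(seed)
    t = np.arange(n_months)
    seasonal = 10.0 * np.sin(2.0 * np.pi * t / 12.0)  # annual cycle in degC
    anomaly = np.zeros(n_months)
    for i in range(1, n_months):
        # AR(1): this month's anomaly remembers a fraction phi of last month's
        anomaly[i] = phi * anomaly[i - 1] + rng.normal(0.0, sigma)
    return seasonal + anomaly

clean = toy_clean_station()
```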

Creation of realistic error models to add to the clean station data
The added inhomogeneities should cover all known types of inhomogeneity in terms of their frequency, magnitude and seasonal behaviour. For example, inhomogeneities could be any one, or a combination, of the following (a toy sketch follows the list):

- geographically or temporally clustered due to events which affect entire networks or regions (e.g. change in observation time);
- close to end points of time series;
- gradual or sudden;
- variance-altering;
- combined with the presence of a long-term background trend;
- small or large;
- frequent;
- seasonally or diurnally varying.
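
A toy error model might superimpose one sudden break and one gradual drift on a clean series, recording the truth for the assessors; all dates and magnitudes below are placeholders, not values from the paper:

```python
import numpy as np

# Toy error model. The truth dictionary is what is "known a priori"
# to the assessors but hidden from those doing the homogenisation.
def add_inhomogeneities(clean, break_month=600, break_size=0.8,
                        drift_start=900, drift_per_month=0.002):
    t = np.arange(len(clean))
    sudden = break_size * (t >= break_month)                       # e.g. a relocation
    gradual = drift_per_month * np.clip(t - drift_start, 0, None)  # e.g. urbanisation
    truth = {"sudden_break": break_month, "gradual_start": drift_start}
    return clean + sudden + gradual, truth
```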

Design of an assessment system
Assessment of the homogenised benchmarks should be designed with the three purposes of benchmarking in mind. Both the ability to correctly locate changepoints and the ability to adjust the data back to their homogeneous state are important. The assessment can be split into four levels (a toy metric for Level 2 is sketched after the list):

- Level 1: The ability of the algorithm to restore an inhomogeneous world to its clean world state in terms of climatology, variance and trends.

- Level 2: The ability of the algorithm to accurately locate changepoints and detect their size/shape.

- Level 3: The strengths and weaknesses of an algorithm against specific types of inhomogeneity and observing system issues.

- Level 4: A comparison of the benchmarks with the real world in terms of detected inhomogeneity both to measure algorithm performance in the real world and to enable future improvement to the benchmarks.
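
As a minimal sketch of a Level 2 location metric, one could count a detection as a hit when it falls within a tolerance window of a true break; the window length and the two scores below are illustrative choices, not the paper's assessment code:

```python
# Toy Level 2 location score. The real assessment also evaluates the
# size and shape of each detected break; this only checks locations.
def changepoint_skill(true_cps, detected_cps, tolerance=12):
    unmatched = list(true_cps)
    hits = 0
    for d in detected_cps:
        # a detection is a hit if it is near a not-yet-matched true break
        match = next((t for t in unmatched if abs(t - d) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    pod = hits / len(true_cps) if true_cps else 1.0                # probability of detection
    far = 1.0 - hits / len(detected_cps) if detected_cps else 0.0  # false-alarm ratio
    return pod, far

print(changepoint_skill([600, 900], [595, 760, 910]))  # -> (1.0, 0.333...)
```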

The benchmark cycle
This should all take place within a well-laid-out framework to encourage people to take part and to make the results as useful as possible. Timing is important: too long a cycle and the benchmarks become outdated; too short a cycle and fewer groups will be able to participate.

Producing the clean synthetic station data on the global scale is a complicated task that has now taken several years, but we are close to completing version 1. We have collected a list of known region-wide inhomogeneities and a comprehensive understanding of the many different types of inhomogeneity that can affect station data. We have also considered a number of assessment options and decided to focus on Levels 1 and 2 for assessment within the benchmark cycle. Our benchmarking working group is aiming to release the first benchmarks by January 2015.