Sunday 8 May 2016

Grassroots scientific publishing

These were the weeks of peer review. Sophie Lewis wrote her farewell to peer reviewing. Climate Feedback is making it easy for scientists to review journalistic articles with nifty new annotation technology. And Carbon Brief showed that, while there is a grey area, it is pretty easy to distinguish between science and nonsense in the climate "debate", which is one of the functions of peer review. And John Christy and Richard McNider managed to get an article published that I, as a reviewer, would have advised rejecting. A little further back we had the open review of the Hansen sea level rise paper, where the publicity circus resulted in a-scientific elements spraying their graffiti on the journal wall.

Sophie Lewis writes about two recent reviews she was asked to write. In one case the reviewers were negative, but the article was published anyway by the volunteer editor; in the other the reviewers were quite positive, but the manuscript was rejected by a salaried editor.

I have had similar experiences. As a reviewer you invest your time and heart in a manuscript and root for the ones you like to make it into print. The final decision is naturally the editor's task, but it is very annoying as a reviewer to have the feeling that your review was ignored. There are many interesting things you could have done in that time. At least nowadays you more often get to see the other reviews and hear the final decision, which is motivating.

The European Geosciences Union has a range of journals with open review, where you can see the first round of reviews and anyone can contribute a review. This kind of open review could benefit from the annotation system used by Climate Feedback to review journalistic articles; it makes reviewing easier and the reader can immediately see the text a review refers to. The open annotation system allows you to add comments to any webpage or PDF article or manuscript. You can see it as an extra layer on top of the web.

The reviewer can select a part of the text and add comments, including figures and links to references. Here is an annotated article in the New York Times that Climate Feedback found to be scientifically very credible, where you can see the annotation system in action. You can click on the text with a yellow background to see the corresponding comment, or click on the small symbol at the top right to see all comments. (Examples of articles with low scientific credibility are somehow mostly pay-walled; one would think that the dark money behind these articles would want them to be read widely.)

I got to know annotation via Climate Feedback. We use the annotation system of Hypothes.is, which was actually not developed to annotate journalistic articles, but to review scientific articles.
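
For those who want to build on this: Hypothes.is also exposes public annotations through its search API (https://api.hypothes.is/api/search), so a review collection or reader tool could pull them in automatically. Below is a minimal sketch in Python, assuming the requests library; the function and variable names are mine, and the exact response fields may differ between API versions.

    import requests

    def fetch_annotations(page_url):
        """Return the public Hypothes.is annotations attached to page_url."""
        response = requests.get(
            "https://api.hypothes.is/api/search",
            params={"uri": page_url, "limit": 50},
        )
        response.raise_for_status()
        annotations = []
        for row in response.json().get("rows", []):
            # The quoted passage the annotation refers to, if one was stored.
            quote = ""
            for target in row.get("target", []):
                for selector in target.get("selector", []):
                    if selector.get("type") == "TextQuoteSelector":
                        quote = selector.get("exact", "")
            annotations.append({"user": row.get("user"),
                                "quote": quote,
                                "comment": row.get("text")})
        return annotations

    if __name__ == "__main__":
        for note in fetch_annotations("https://example.org/some-reviewed-article"):
            print(note["user"], "annotated:", note["quote"][:60])
            print("   ", note["comment"])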

The annotation system makes writing a review easier for the reviewer and makes reviews easier to read. The difference between writing some notes on an article for yourself and writing a peer review becomes gradual this way. It cannot take away having to read the manuscript and trying to understand it. That takes the most time, but it is also the fun part; reducing the time needed for the tedious part makes reviewing more attractive.

Publishing and peer review

Is there a better way to review and publish? The difficult part is no longer the publishing itself. The central part that remains is the reader's trust in a source.

It is becoming ironic that the owners of the scientific journals are called "scientific publishers", because the main task of a publisher is no longer the publishing. Everyone can do that nowadays with a (free) word processor and a (free) web page. The publishers and their journals are mostly brands: the journal is a trusted name. Trust is slow to build up (and easy to lose), producing huge barriers to entry and leading to the near-monopoly profits of scientific publishing houses of 30 to 40%. That is tax-payer money that is not spent on science, and it props up organizations that prefer to keep science unused behind pay-walls.

Peer review performs various functions. It helps to give a manuscript the initial credibility that makes people trust it and willing to invest time in studying its ideas. If the scientific literature were as abominable as the mitigation-skeptical blog Watts Up With That (WUWT), scientific progress would slow down enormously. At WUWT the unqualified readers are supposed to find out for themselves whether they are being conned or not. Even if they did: having every reader do a thorough review is wasteful; it is much more efficient to ask a few experts to vet manuscripts first.

Without peer review it would be harder for new people to get others to read their work, especially if they make a spectacular claim or use unfamiliar methods. My colleagues will likely be happy to read my homogenization papers without peer review. Gavin Schmidt's colleagues will be happy to read his climate modelling papers, and Michael Mann's colleagues his papers on climate reconstructions. But for new people it would be harder to be heard; for me it would be harder to be heard if I published something on another topic; and for outsiders it would be harder to judge who is credible. The latter becomes increasingly important the more interdisciplinary science becomes.

Improving peer review

When I was dreaming of a future review system where scientific articles were all in one global database, I used to think of a system without journals or editors. The readers would simply judge the articles and comments, like on Ars Technica or Slashdot. The very active open science movement in Spain has implemented such a peer review system for institutional repositories, where the manuscripts and reviews are judged and reputation metrics are estimated. Let me try to explain why I changed my mind and how important editors and journals are for science.

One of my main worries about a flat database would be the many manuscripts that never get any review. In the current system the editor makes sure that every reasonable manuscript is reviewed. Without an editor explicitly asking a scientist to write a review, I would expect that many manuscripts would never be reviewed at all. Personal relations are important.

Science is not a democracy, but a meritocracy. Just voting an article up or down does not do the job; it is important that this assessment is made carefully. You could try to statistically determine which readers are good at predicting the quality of an article, where quality could be determined by later votes or citations. This would be difficult, however, because it is important that the assessment is made by people with the right expertise, often by people from multiple backgrounds; we have seen how much even something as basic as the scientific consensus on climate change depends on expertise. Try determining expertise algorithmically. The editor knows the reviewers.
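
To make the statistical idea concrete, here is a toy sketch of such a reputation weighting in Python. Everything in it is hypothetical, and it also illustrates the limitation discussed above: topic and expertise do not appear anywhere.

    from statistics import correlation, mean  # correlation needs Python 3.10+

    def reviewer_weight(past_votes, later_citations):
        """Weight a reviewer by how well their earlier votes tracked the
        citations the rated articles later received (hypothetical metric)."""
        if len(past_votes) < 3:
            return 0.0  # too little history to judge
        return max(0.0, correlation(past_votes, later_citations))

    def weighted_score(votes, weights):
        """Score a new article from paired votes and reviewer weights."""
        total = sum(weights)
        if total == 0:
            return mean(votes)  # fall back to a plain average
        return sum(v * w for v, w in zip(votes, weights)) / total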

While it is not a democracy, the scientific enterprise should naturally be open. Everyone is welcome to submit manuscripts. But editors and reviewers need to be trusted and level-headed individuals.

More openness in publishing could in the future come from everyone being able to start a "journal" by becoming an editor (or, better, by organizing a group of editors) and trying to convince their colleagues that they do a good job. The fun thing about the annotation system is that you can demonstrate that you do a good job using existing articles and manuscripts.

This could provide real value for the reader. Not only would the reviews be visible, it would also be possible to explain why an article was accepted: was it speculative but really interesting if true (something for experts), or was it simply solid (something for outsiders)? Which parts do the experts still debate? The debate would also continue after acceptance.

The code and the data of every "journal" should be open, so that everyone can start a new "journal" with the reviewed articles. That way, when Heartland offers me a nice amount of dark money to start accepting WUWT-quality articles, a group of colleagues can start a new journal, fix my dark-money "mistakes", and otherwise have a complete portfolio from the beginning. If they had to start from scratch, that would be a large barrier to entry, which, like in the traditional system, encourages sloppy work, corruption and abuse of power.

Peer review is not just for selecting articles, but also helps to make them better. Theoretically the author could also ask colleagues to do so, but in practice reviewers are better at finding errors. Maybe because the colleagues who will put in the most effort are your friends, who have the same blind spots? These improvements of the manuscript would be missing in a pure voting system for "finished" articles. Having a manuscript phase is helpful.

Finally, an editor makes anonymous reviews a lot less problematic, because the editor can delete comments where the anonymity seduced people into inappropriate behavior. Anonymity could otherwise be abused to make false attacks with impunity. On the other hand, anonymity can also provide protection when there are real problems and large power differences.

The advantage of internet publishing is that there is no need for an editor to reject technically correct manuscripts. If the contribution to science is small, or if the result is very speculative and quite likely to be found wrong in the future, the manuscript can still be accepted but simply be given a corresponding grade.

This also points to a main disadvantage of the current dead-tree-inspired system: you get either a yes or a no. There is a bit more information in the journal the author chooses, but that is about it. A digital system can communicate much more subtly with a prospective reader. A speculative article is interesting for experts, but may be best avoided by outsiders until the issues are better understood. Some articles mainly review the state of the art, others provide original research. Some articles have a specific audience: for example, the users of a specific dataset or model. Some articles are expected to be more important for scientific progress than others, or discuss issues that are more urgent. And so on. All of this can be communicated to the reader.
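
To illustrate what communicating more subtly could look like in practice, below is a purely hypothetical sketch of a structured acceptance record a digital journal could attach to an article instead of a bare yes or no. The fields and scales are invented and would have to be standardized by the journals themselves.

    from dataclasses import dataclass, field

    @dataclass
    class AcceptanceRecord:
        doi: str
        article_type: str         # e.g. "original research", "review", "dataset description"
        soundness: int            # 1-5: how solid the methods and evidence are
        speculativeness: int      # 1-5: how likely the conclusions are to be revised
        audience: list[str] = field(default_factory=list)  # e.g. ["experts", "model users"]
        editor_summary: str = ""  # why it was accepted, what the experts still debate

    example = AcceptanceRecord(
        doi="10.xxxx/hypothetical",
        article_type="original research",
        soundness=4,
        speculativeness=5,
        audience=["experts"],
        editor_summary="Interesting if true; the methods in section 3 are debated.",
    )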

The nice thing about the open annotation system is that we can begin reviewing articles before authors start submitting them. We can simply review existing articles as well as manuscripts, such as the ones uploaded to arXiv. The editors could reject articles that should not have been published in the traditional journals and accept manuscripts from archives. I would trust such an assessment by a knowledgeable editor (team) more than acceptance by a traditional journal.

In this way we can produce collections of existing articles. If the new system provides a better reviewing service to science, authors can at some point stop submitting their manuscripts to traditional journals and submit them directly to the editors of a collection. Then we have real grassroots scientific journals that serve science.

For colleagues in the communities it would be clear which of these collections have credibility. For outsiders, however, we would also need some system that communicates this, which would traditionally be the role of publishing houses and the high barriers to entry. This could be assessed where collections overlap, preferably again by humans and not by algorithms. For some articles there may be legitimate reasons for differences (hard to assess, a different topic of the collection); for other articles an editor not having noticed problems may be a sign of bad editorship. This problem is likely not too hard: in a recent analysis of Twitter discussions on climate change there was a very clear distinction between science and nonsense.

There is still a lot to do, but with the ease of modern publishing and the open annotation system a lot of the software is already there. Larger improvements would be tools for editors to moderate review comments (or at least to collapse less valuable comments); Hypothes.is is working on this. A grassroots journal would need a grading system, standardized where possible. More practical tools would include help with tracking the manuscripts under review and with sending reminders, and the editors of one collection should be able to communicate with each other. The grassroots journal should remain visible even if the editor team stops; that will need collaboration with libraries or scientific societies.

If we get this working,
  • we can say goodbye to frustrated reviewers (well, mostly),
  • goodbye to pay-walled journals in which publicly financed research is hidden from the public and many scientists alike, and
  • goodbye to wasting limited research money on the monopolistic profits of publishing houses, while
  • we welcome better review and selection, and
  • we build a system that inherently allows for post-publication peer review.

What do you think?



Related reading

There is now an "arXiv overlay journal", Discrete Analysis. Articles are published and hosted by arXiv; otherwise the journal uses traditional peer review. The announcement mentions three software initiatives that make starting a digital journal easy: Scholastica, Episciences.org and Open Journal Systems.

Annotating the scholarly web

A coalition for Annotating All Knowledge: a new open layer is being created over all knowledge.

Brian A. Nosek and Yoav Bar-Anan describe a scientific utopia: Scientific Utopia: I. Opening scientific communication. I hope the ideas in the post above make this transition possible.

Climate Feedback has started a crowdfunding campaign to be able to review more media articles on climate science.

Farewell peer reviewing

7 Crazy Realities of Scientific Publishing (The Director's Cut!)

Mapped: The climate change conversation on Twitter

I would trust most scientists to use annotation responsibly, but it can also be used to harass vulnerable voices on the web: Genius Web Annotator vs. One Young Woman With a Blog. Hypothes.is is discussing how to handle such situations.

Nature Chemistry blog: Post-publication peer review is a reality, so what should the rules be?

The report from the Knowledge Exchange event Pathways to open scholarship gives an overview of the different initiatives to make science more open.

Magnificent BBC Reith lecture: A question of trust

Sunday 1 May 2016

Christy and McNider: Time Series Construction of Summer Surface Temperatures for Alabama

John Christy and Richard McNider have a new paper in the AMS Journal of Applied Meteorology and Climatology called "Time Series Construction of Summer Surface Temperatures for Alabama, 1883–2014, and Comparisons with Tropospheric Temperature and Climate Model Simulations". Link: Christy and McNider (2016).

This post gives just a few quick notes on the methodological aspects of the paper.
1. They select data with a weak climatic temperature trend.
2. They select data with a large cooling bias due to improvements in radiation protection of thermometers.
3. They developed a new homogenization method using an outdated design and did not test it.

Weak climatic trend

Christy and McNider wrote: "This is important because the tropospheric layer represents a region where responses to forcing (i.e., enhanced greenhouse concentrations) should be most easily detected relative to the natural background."

The trend in the troposphere should be a few percent stronger than at the surface, mainly in the tropics. What is mainly interesting, however, is that they see a stronger trend as a reason to prefer tropospheric temperatures, because when it comes to the surface they select the season and variable with the smallest temperature trend: the daily maximum temperatures in summer.

The trend in winter due to global warming should be 1.5 times the trend in summer, and the trend in the night-time minimum temperatures is stronger than the trend in the daytime maximum temperatures, as discussed here. Thus Christy and McNider select the surface data with the smallest trend. By their own reasoning about the tropospheric temperatures, they should prefer night-time winter temperatures.

(And their claim about the tropospheric temperatures is not right, because whether a trend can be detected depends not only on the signal, but also on the noise. The weather noise due to El Niño is much stronger in the troposphere and the instrumental uncertainties are also much larger. Thus the signal-to-noise ratio is smaller for the tropospheric temperatures, even if that record were as long as the surface observations.

Furthermore, I am somewhat amused that there are still people interested in the question of whether global warming can be detected.)

[UPDATE. Tamino shows that within the USA, Alabama happens to be the region with the least warming. The more so for the maximum temperature. The more so for the summer temperature.]

Cooling bias

Then they used data with a very large cooling bias due to improvements in the protection of thermometers against (solar and infra-red) radiation. Early thermometers were not protected as well against solar radiation and typically recorded too high temperatures. Early thermometers also recorded too cool minimum temperatures; the thermometer should not see the cold sky, otherwise it radiates out to it and cools. The warming bias in the maximum temperature is larger than the cooling bias in the minimum temperature; thus the mean temperature still has some bias, but less than the maximum temperature.

Due to this reduction in the radiation error, summer temperatures have a stronger cooling bias than winter temperatures.

The warming effect of the early measurements on the annual means is probably about 0.2 to 0.3°C. In the maximum temperature it will be a lot higher, and in the summer temperature it will again be a lot higher.

That is why most climatologists use the annual means. Homogenization can improve climate data, but it cannot remove all biases. Thus it is good to start with data that has the least bias; much better than starting with a highly biased dataset, as Christy and McNider did.

Statistical homogenization removes biases by comparing a candidate station to its neighbours. The stations need to be close enough together that the regional climate can be assumed to be similar at both stations. The difference between two stations then consists of weather noise and inhomogeneities (non-climatic changes due to changes in the way temperature was measured).

If you want to be able to see the inhomogeneities, you thus need well correlated neighbours with as little weather noise as possible in the difference. By using only the maximum temperature, rather than the mean temperature, you increase the weather noise. By using the monthly means in summer, rather than the annual means or at the very least the summer means, you increase the weather noise. By going back in time more than a century you increase the noise further, because we had fewer stations to compare with at the time.
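
As a minimal sketch of this relative approach (the variable names are mine; the series are assumed to be aligned NumPy arrays of temperature anomalies for the same months or years):

    import numpy as np

    def difference_series(candidate, neighbour):
        """Candidate minus neighbour: the shared regional climate drops out,
        leaving weather noise plus any non-climatic jumps (inhomogeneities)."""
        return np.asarray(candidate, dtype=float) - np.asarray(neighbour, dtype=float)

    def detectability(break_size, diff):
        """Rough signal-to-noise ratio of a break of a given size (in degrees C)
        against the noise of the difference series: the noisier the
        difference series, the harder that break is to find."""
        return break_size / np.std(diff, ddof=1)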

They keyed part of the data themselves, mainly for the period before 1900, from the paper records. It sounds as if they performed no quality control on these values (to detect measurement errors). This will also increase the noise.

With such a low signal-to-noise ratio (inhomogeneities that are small relative to the weather noise in the difference time series), the estimated dates of the breaks they did find will have a large uncertainty. It is thus a pity that they purposefully did not use information from station histories (metadata) to get the dates of the breaks right.

Homogenization method

They developed their own homogenization method and only tested it on a noise signal with one break in the middle. Real series have multiple breaks, in the USA typically one every 15 years. Furthermore, the reference series also has breaks.

The method uses the detection equation from the Standard Normal Homogeneity Test (SNHT), but then applies different significance levels. Furthermore, for some reason it does not use the hierarchical splitting of SNHT to deal with multiple breaks, but detects on a window in which it is assumed there is only one break. However, if you make the window too long it will contain more than one break, and if you make it too short the method has no detection power. You would thus theoretically expect detection on a window to perform very badly, and this is also what we found in a numerical validation study.
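
For reference, the classical single-break SNHT detection statistic that their method builds on is easy to sketch. This is the textbook test for exactly one break in a standardized difference series, not a reconstruction of their windowed variant; it also shows why a window containing several breaks violates the test's assumption.

    import numpy as np

    def snht_statistic(diff):
        """T(k) = k*mean(z[:k])**2 + (n-k)*mean(z[k:])**2 for every possible
        break position k in a difference series (Alexandersson 1986)."""
        diff = np.asarray(diff, dtype=float)
        z = (diff - diff.mean()) / diff.std(ddof=1)  # standardize
        n = len(z)
        T = np.zeros(n)
        for k in range(1, n):
            T[k] = k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
        return T

    def most_likely_break(diff):
        """Position and value of the maximum of T(k)."""
        T = snht_statistic(diff)
        k = int(np.argmax(T))
        return k, T[k]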

I see no real excuse not to use better homogenization methods (ACMANT, PRODIGE, HOMER, MASH, Craddock). These are built to take into account that the reference station also has breaks and that a series will have multiple breaks; there is no need for ad-hoc windows.

If you design your own homogenization method, it is good scientific practice to test it first, to study whether it does what you hope it does. There is, for example, the validation dataset of the COST Action HOME. Using it immediately allows you to compare the skill of your method with that of the other methods. Given the outdated design principles, I am not hopeful that the Christy and McNider homogenization method would score above average.

Conclusions

These are my first impressions of the homogenization method used. Unfortunately I do not have the time at the moment to comment on the non-methodological parts of the paper.

If there are no knowledgeable reviewers available in the USA, it would be nice if the AMS asked European researchers, rather than some old professor who once removed an inhomogeneity from his dataset in the 1960s. Homogenization is a specialization; it is not trivial to make data better, and it really would not hurt if the AMS asked for expertise from Europe when American experts are busy.

Hitler is gone. The EGU general assembly has a session on homogenization; the AGU does not. The EMS has a session on homogenization; the AMS does not. EUMETNET organizes data management workshops, a large part of which is about homogenization; I do not know of an American equivalent. And we naturally have the Budapest seminars on homogenization and quality control. Not Budapest, Georgia, nor Budapest, Missouri, but Budapest, Hungary, Europe.



Related reading

Tamino: Cooling America. Alabama compared to the rest of the contiguous USA.

HotWhopper discusses further aspects of this paper and some differences between the paper and the press release: Why nights can warm faster than days - Christy & McNider vs Davy 2016

Early global warming

Statistical homogenisation for dummies