
Saturday, 7 October 2017

A short history of homogenisation of climate station data



The WMO Task Team on Homogenisation (TT-HOM) is working on guidance for scientists and weather services who want to homogenise their data. I thought the draft chapter on the history of homogenisation would double as a nice blog post. It is a pretty long history, starting well before people were worrying about climate change. Comments and other important historical references are very much appreciated.

Problems due to inhomogeneities have long been recognised and homogenisation has a long history. In September 1873, at the “International Meteorologen-Congress” in Vienna, Carl Jelinek requested information on national multi-annual data series (k.k. Hof- und Staatsdruckerei, 1873), but decades later, in 1905, G. Hellmann (k.k. Zentralanstalt für Meteorologie und Geodynamik, 1906) still regretted the absence of homogeneous climatological time series due to changes in the surroundings of stations and new instruments, and pleaded for stations with a long record, “Säkularstationen”, to be kept as homogeneous as possible.

Although this “Conference of Directors” of the national weather services recommended maintaining a sufficient number of stations under unchanged conditions, these basic inhomogeneity problems still exist today.

Detection and adjustments

Homogenisation has a long tradition. For example, in early times documented change points were removed with the help of parallel measurements. Differing observing times at the astronomical observatory of the k.k. University in Vienna (Austria) were adjusted by using multi-annual 24-hour measurements of the astronomical observatory of the k.k. University in Prague (today Czech Republic). Measurements of Milano (Italy) between 1763 and 1834 were adjusted to 24-hour means by using measurements of Padova (Kreil, 1854a, 1854b).

However, for the majority of breaks we do not know the break magnitude; furthermore, it is most likely that series contain undocumented inhomogeneities as well. Thus there was a need for statistical break detection methods. In the early 20th century Conrad (1925) applied and evaluated the Heidke criterion (Heidke, 1923) using ratios of two precipitation series. As a consequence he recommended the use of additional criteria to test the homogeneity of series: criteria dealing with the succession and alternation of algebraic signs, the Helmert criterion (Helmert, 1907) and the “painstaking” Abbe criterion (Conrad and Schreier, 1927). The use of Helmert’s criterion for pairs of stations and of Abbe’s criterion was still described as an appropriate tool in the 1940s (Conrad, 1944). Some years later the double-mass principle was popularised for break detection (Kohler, 1949).
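The double-mass principle mentioned above is simple enough to show in a few lines. Below is a minimal sketch, with made-up precipitation numbers, of how a change in the slope of the double-mass curve (cumulative candidate totals plotted against cumulative reference totals) points to a possible break; it illustrates the general idea, not Kohler's original procedure.

```python
# Minimal sketch of the double-mass principle: a homogeneous candidate
# gives roughly one straight line against the reference; a change in
# slope hints at an inhomogeneity. The numbers are made up.
import numpy as np

candidate = np.array([610., 585., 640., 700., 660.,   # consistent with reference
                      540., 520., 555., 500., 530.])  # apparent drop, e.g. a relocation
reference = np.array([600., 590., 630., 690., 665.,
                      640., 615., 655., 600., 625.])

cum_cand = np.cumsum(candidate)
cum_ref = np.cumsum(reference)

# Compare the slope of the double-mass curve in the two halves of the record.
first, second = slice(0, 5), slice(5, 10)
slope_1 = np.polyfit(cum_ref[first], cum_cand[first], 1)[0]
slope_2 = np.polyfit(cum_ref[second], cum_cand[second], 1)[0]
print(f"slope before: {slope_1:.2f}, slope after: {slope_2:.2f}")
# A clear change in slope (here roughly 1.0 versus 0.84) is the classical
# double-mass indication of a break in the candidate series.
```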


The German Climate Reference Station on the Hohenpeißenberg mountain in Bavaria, founded in 1781.

Reference series

Julius Hann (1880, p. 57) studied the variability of absolute precipitation amounts and of the ratios between stations. He used these ratios for quality control. This inspired Brückner (1890) to check precipitation data for inhomogeneities by comparison with neighbouring stations; he did not use any statistics, however.

In their book “Methods in Climatology”, Conrad and Pollak (1950) formalised this relative homogenisation approach, which is now the dominant method to detect and remove the effects of artificial changes. The building of reference series, by averaging the data from many stations in a relatively small geographical area, was recommended by the WMO Working Group on Climatic Fluctuations (WMO, 1966).
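As an illustration of this relative homogenisation approach, here is a minimal sketch with synthetic annual data: a reference series is built as the plain average of a few neighbouring stations and subtracted from the candidate, so that the shared climate signal drops out and an artificial shift becomes visible. Real implementations typically weight neighbours by correlation or distance; that refinement is left out here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 60

# Synthetic annual temperature anomalies: all stations share the same
# regional climate signal, plus station noise; the candidate gets an
# artificial +0.8 degC shift halfway through (e.g. a relocation).
climate = np.cumsum(rng.normal(0.0, 0.1, n_years))          # shared regional signal
neighbours = climate + rng.normal(0.0, 0.3, (5, n_years))   # five nearby stations
candidate = climate + rng.normal(0.0, 0.3, n_years)
candidate[n_years // 2:] += 0.8                             # the inhomogeneity

# Reference series: simple average of the neighbours.
reference = neighbours.mean(axis=0)

# The difference series removes the common climate signal, so the break
# stands out much more clearly than in the raw candidate series.
difference = candidate - reference
print("mean difference, first half :", difference[:n_years // 2].mean().round(2))
print("mean difference, second half:", difference[n_years // 2:].mean().round(2))
```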

The papers by Alexandersson (1986) and Alexandersson and Moberg (1997) made the Standard Normal Homogeneity Test (SNHT) popular. The broad adoption of SNHT was also due to the clear guidance these papers gave on how to use the test together with reference series to homogenise station data.
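For readers who want to see what the SNHT actually computes, below is a bare-bones version of the classic single-break statistic applied to a standardised candidate-minus-reference series. The critical values against which the maximum has to be compared depend on the series length and are not derived here; the synthetic data and the threshold mentioned in the comments are for illustration only.

```python
import numpy as np

def snht_single_break(series):
    """Classic single-break SNHT statistic (Alexandersson, 1986).

    `series` is usually a candidate-minus-reference difference series
    (or a ratio series for precipitation). Returns the maximum test
    statistic and the most likely break position; the result still has
    to be compared with a length-dependent critical value (on the order
    of 8 for series of this length at the 95% level).
    """
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)   # standardise
    n = len(z)
    t_stat = np.array([a * z[:a].mean() ** 2 + (n - a) * z[a:].mean() ** 2
                       for a in range(1, n)])
    best = int(np.argmax(t_stat))
    return t_stat[best], best + 1        # number of values before the break

# Synthetic difference series with a 1 degC shift after year 40.
rng = np.random.default_rng(1)
diff = rng.normal(0.0, 0.4, 80)
diff[40:] += 1.0
t_max, break_pos = snht_single_break(diff)
print(f"T_max = {t_max:.1f}, most likely break after year {break_pos}")
```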

Modern developments

SNHT is a single-breakpoint method, but climate series typically contain more than one break. Thus a major step forward was the development of methods specifically designed to detect and correct multiple change-points and to work with inhomogeneous references (Szentimrey, 1999; Mestre, 1999; Caussinus and Mestre, 2004). These kinds of methods were shown to be more accurate by the benchmarking study of the EU COST Action HOME (Venema et al., 2012).
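The multiple-breakpoint methods cited above rest on penalised model selection over all possible segmentations. The sketch below illustrates that idea with a generic dynamic-programming segmentation of a difference series into constant-mean segments, minimising the residual sum of squares plus a fixed penalty per segment; it is a simplified stand-in, not the actual Caussinus-Mestre or MASH criterion, and the penalty value is chosen ad hoc.

```python
import numpy as np

def segment(series, penalty):
    """Optimal segmentation into constant-mean segments by dynamic
    programming, minimising the residual sum of squares plus a fixed
    penalty per segment. A generic stand-in for the penalised criteria
    used by multiple-breakpoint homogenisation methods."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csum2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def sse(i, j):   # residual sum of squares of x[i:j] around its mean
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    last_break = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], last_break[j] = cost, i

    # Backtrack the break positions.
    breaks, j = [], n
    while j > 0:
        i = last_break[j]
        if i > 0:
            breaks.append(i)
        j = i
    return sorted(breaks)

# Difference series with two artificial shifts (after years 30 and 70).
rng = np.random.default_rng(2)
diff = rng.normal(0.0, 0.4, 100)
diff[30:] += 1.0
diff[70:] -= 1.5
print("detected breaks after years:", segment(diff, penalty=5.0))
```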

The paper by Caussinus and Mestre (2004) also provided the first description of a method that corrects all series of a network jointly. Applying this joint correction method improved the accuracy of all but one of the contributions to the HOME benchmark that were not yet using this approach (Domonkos et al., 2013).

The ongoing work to create appropriate datasets for climate variability and change studies promoted the continual development of better methods for change point detection and correction. To follow this process the Hungarian Meteorological Service started a series of “Seminars for Homogenization” in 1996 (HMS 1996, WMO 1999, OMSZ 2001, WMO 2004, WMO 2006, WMO 2010).

Related reading

Homogenization of monthly and annual data from surface stations
A short description of the causes of inhomogeneities in climate data (non-climatic variability) and how to remove them using the relative homogenisation approach.
Statistical homogenisation for dummies
A primer on statistical homogenisation with many pictures.
Just the facts, homogenization adjustments reduce global warming
Many people only know that climatologists increase the land surface temperature trend, but do not know that they also reduce the ocean surface trend and that the net effect is a reduction of global warming. This does not fit too well with the conspiracy theories of the mitigation sceptics.
Five statistically interesting problems in homogenization
Series written for statisticians and climatologists looking for interesting problems.
Why raw temperatures show too little global warming
The raw land surface temperature probably shows too little warming. This post explains the reasons why: thermometer screen changes, relocations and irrigation.
New article: Benchmarking homogenization algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.

References

Alexandersson, A., 1986: A homogeneity test applied to precipitation data. J. Climatol., 6, pp. 661-675.
Alexandersson, H. and A. Moberg, 1997: Homogenization of Swedish temperature data. Part I: Homogeneity test for linear trends. Int. J. Climatol., 17, pp. 25-34.
Brückner, E., 1890: Klimaschwankungen seit 1700 nebst Bemerkungen über Klimaschwankungen der Diluvialzeit. E.D. Hölzel, Wien and Olmütz.
Caussinus, H. and O. Mestre, 2004: Detection and correction of artificial shifts in climate series. Appl. Statist., 53, Part 3, pp. 405-425.
Conrad, V. and C. Pollak, 1950: Methods in Climatology. Harvard University Press, Cambridge, MA, 459 p.
Conrad V., O. Schreier, 1927: Die Anwendung des Abbe’schen Kriteriums auf physikalische Beobachtungsreihen. Gerland’s Beiträge zur Geophysik, XVII, 372.
Conrad, V., 1925: Homogenitätsbestimmung meteorologischer Beobachtungsreihen. Meteorologische Zeitschrift, 482–485.
Conrad V., 1944: Methods in Climatology. Harvard University Press, 228 p.
Domonkos, P., V. Venema, O. Mestre, 2013: Efficiencies of homogenisation methods: our present knowledge and its limitation. Proceedings of the Seventh seminar for homogenization and quality control in climatological databases, Budapest, Hungary, 24 – 28 October 2011, WMO report, Climate data and monitoring, WCDMP-No. 78, pp. 11-24.
Hann, J., 1880: Untersuchungen über die Regenverhältnisse von Österreich-Ungarn. II. Veränderlichkeit der Monats- und Jahresmengen. S.-B. Akad. Wiss. Wien.
Heidke P., 1923: Quantitative Begriffsbestimmung homogener Temperatur- und Niederschlagsreihen. Meteorologische Zeitschrift, 114-115.
Helmert F.R., 1907: Die Ausgleichrechnung nach der Methode der kleinsten Quadrate. 2. Auflage, Teubner Verlag.
Peterson T.C., D.R. Easterling, T.R. Karl, P. Groisman, N. Nicholls, N. Plummer, S. Torok, I. Auer, R. Boehm, D. Gullett, L. Vincent, R. Heino, H. Tuomenvirta, O. Mestre, T. Szentimrey, J. Salinger, E.J. Forland, I. Hanssen-Bauer, H. Alexandersson, P. Jones, D. Parker, 1998: Homogeneity adjustments of in situ atmospheric climate data: A review. Int. J. Climatol., 18, 1493-1517.
Hungarian Meteorological Service (HMS), 1996: Proceedings of the First Seminar for Homogenization of Surface Climatological Data, Budapest, Hungary, 6-12 October 1996, 44 p.
Kohler M.A., 1949: Double-mass analysis for testing the consistency of records and for making adjustments. Bull. Amer. Meteorol. Soc., 30: 188 – 189.
k.k. Hof- und Staatsdruckerei, 1873: Bericht über die Verhandlungen des internationalen Meteorologen-Congresses zu Wien, 2.-10. September 1873, Protokolle und Beilagen.
k.k. Zentralanstalt für Meteorologie und Geodynamik, 1906: Bericht über die internationale meteorologische Direktorenkonferenz in Innsbruck, September 1905. Anhang zum Jahrbuch 1905. k.k. Hof- und Staatsdruckerei.
Kreil K., 1854a: Mehrjährige Beobachtungen in Wien vom Jahre 1775 bis 1850. Jahrbücher der k.k. Central-Anstalt für Meteorologie und Erdmagnetismus. I. Band – Jg 1848 und 1849, 35-74.
Kreil K., 1854b: Mehrjährige Beobachtungen in Mailand vom Jahre 1763 bis 1850. Jahrbücher der k.k. Central-Anstalt für Meteorologie und Erdmagnetismus. I. Band – Jg 1848 und 1849, 75-114.
Mestre O., 1999: Step-by-step procedures for choosing a model with change-points. In Proceedings of the second seminar for homogenisation of surface climatological data, Budapest, Hungary, WCDMP-No.41, WMO-TD No.962, 15-26.
OMSZ, 2001: Third Seminar for Homogenization and Quality Control in climatological Databases, Budapest.
Szentimrey, T., 1999: Multiple Analysis of Series for Homogenization (MASH). Proceedings of the second seminar for homogenization of surface climatological data, Budapest, Hungary; WMO, WCDMP-No. 41, 27-46.
Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M.J. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma, 2012: Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, pp. 89-115, doi: 10.5194/cp-8-89-2012. See also the introductory blog post and a post on the weaknesses of the study.
WMO, 1966: Climatic Change, Report of a working group of the Commission for Climatology. Technical Note 79, WMO – No. 195. TP.100, 79 p.
WMO 1999: Proceedings of the Second Seminar for Homogenization of Surface Climatological Data, Budapest, Hungary, 9 – 13 November 1998, 214 p.
WMO, 2004: Fourth Seminar for Homogenization and Quality Control in Climatological Databases, Budapest, Hungary, 6-10 October 2003, WCDMP-No 56, WMO-TD No. 1236, 243 p.
WMO, 2006: Proceedings of the Fifth Seminar for Homogenization and Quality Control in Climatological Databases, Budapest, Hungary, 29 May – 2 June 2006. Climate Data and Monitoring WCDMP- No 71, WMO/TD- No. 1493.
WMO, 2010: Proceedings of the Meeting of COST-ES0601 (HOME) Action, Management Committee and Working groups and Sixth Seminar for Homogenization and Quality Control in Climatological Databases, Budapest, Hungary, 26 – 30 May 2008, WMO reports on Climate Data and Monitoring, WCDMP-No. 76.

Sunday, 1 October 2017

The Earth sciences no longer need the publishers for publishing



Manuscript servers are buzzing around our ears, as the Dutch say.

In physics it is common to put manuscripts on the ArXiv server (pronounced: archive server). A large part of these manuscripts are later sent to a scientific journal for peer review, following the traditional scientific quality control system and assessment of the importance of studies.

This speeds up the dissemination of scientific studies and can promote informal peer review before the formal peer review. The copyright of the manuscripts has not yet been transferred to a publisher, so this also makes the research available to all without pay-walls. Because the manuscripts are expected to be published on paper in a journal later, ArXiv is called a pre-print server. In these modern times I prefer the term manuscript server.

The manuscript gets a time stamp, so a pre-print server can also be used to claim precedence, although the date of journal publication is traditionally used for this and there are no rules about which date is most important. Pre-print servers can also give the manuscript a Digital Object Identifier (DOI) that can be used to cite it. A problem could be that some journals see a pre-print as prior publication, but I am not aware of any such journals in the atmospheric sciences; if you know of one, please leave a comment below.

ArXiv has a section for atmospheric physics, where I also uploaded some manuscripts as a young cloud researcher. However, because most meteorologists did not participate, it could not perform the same function as it does in physics; I never got any feedback based on these manuscripts. When ArXiv made uploading manuscripts harder in order to get rid of submissions by retired engineers, I stopped and just put the manuscripts on my homepage.

Three manuscript archives

Maybe the culture will now change and more scientists will participate, with three new initiatives for manuscript servers in the Earth sciences. All three follow a different concept.

This August a digital archive started for paleontology (paleorXiv, twitter). If I see it correctly, they already have 33 manuscripts, only a part of them climate related. This archive builds on the open source preprint server of the Open Science Framework (OSF) of the non-profit Center for Open Science. The OSF is a platform for the entire scientific workflow, from idea, to coding and collaboration, to publishing. Other groups are also welcome to make a pre-print archive using their servers and software.

[UPDATE. It has just been announced that in November a new ArXiv will start: MarXiv, not for Marxists, but for the marine-conservation and marine-climate sciences.]

Two initiatives have just started for all of the Earth sciences: a grassroots initiative (EarthArXiv) and one by AGU/Wiley (ESSOAr).

EarthArXiv will also be based on the open source solution of the Open Science Framework. It is not up yet, but I presume it will look a lot like paleorXiv. It seems to be catching on, with about 600 Twitter followers and about 100 volunteers in just a few days. They are working on a logo (requirements, competition). Most logos show the globe; I would include the study of other planets in the Earth sciences.

The American Geophysical Union (AGU) has announced plans for an Earth and Space Science Open Archive (ESSOAr), which should be up and running early next year. They plan to be able to show a demo at the AGU's fall meeting in December.

The topic would thus be somewhat different due to the inclusion of space science, and they will also permanently archive posters presented at conferences. That sounds really useful; now every conference designs its own solution and the posters and presentations are often lost after some time when the homepage goes down. EarthArXiv unfortunately seems to be against hosting posters. ESSOAr would also make it easy to transfer manuscripts to (AGU?) journals.

A range of other academic societies are on the "advisory board" of ESSOAr, including EGU. ESSOAr will be based on proprietary software of the scientific publisher Wiley. Proprietary software is a problem for something that should function for as close to an eternity as possible. Not only Wiley, but also the AGU itself is a major scientific publisher. They are not Elsevier, but this quickly leads to conflicts of interest. It would be better to have an independent initiative.

There need not be any conflict between the two "duelling" (according to Nature) servers. The manuscripts are open access and I presume the servers will have an API that makes it possible to mirror manuscripts of one server on the other. The editors could then remove the ones they do not see as fitting their standards (or not waste their time on them). Beyond esoteric (WUWT & Co.) nonsense, I would prefer not to have many standards; that is the idea of a manuscript server.



Paul Voosen of Nature magazine wonders whether "researchers working in more sensitive areas of the geosciences, such as climate science, will embrace posting their work prior to peer review." I see no problem there. There is nothing climate scientists can do to pacify the American culture war, so we should do our job as well as possible, and my impression is that climatology is easily in the better half of the Open Science movement.

I love to complain about it, but my impression is that sharing data is more common in the atmospheric sciences than on average. This could well be because it is more important here: data is needed from all over the world. The World Meteorological Organization was one of the first global organisations set up to coordinate this. The European Geosciences Union (EGU) has had open review journals for more than 15 years; the initial publication in a "discussion" journal is similar to putting your manuscript on a pre-print server. Many of the contributions to the upcoming FORCE2017 conference on Research Communication and e-Scholarship that mention a specific topic are about climate science.

The road to Open Access

A manuscript server is one step on the way to an Open Access publishing future. This would make articles more accessible to researchers and to the public who paid for them.

Open Access would break the monopoly given to scientific publishers by copyright laws. An author looking for a journal to publish his work can compare price and service. But a reader typically needs to read one specific article and then has to deal with a publisher with monopoly power. This has led to monopolistic profits and to commercial publishers that have lost touch with their customers, the scientific community. That Elsevier has a profit margin of "only" 36 percent thus seems to be mismanagement; it should be close to 100 percent.



ArXiv shows that publishing a manuscript costs less than a dollar per article. Software to support the peer review can be rented for 10 dollars per article (see also: Episciences.org and Open Journal Systems). Writing the article and reviewing it is done for free by the scientific community. Most editors are also scientists working for free; sometimes the editor-in-chief gets some secretarial support or some money for a student assistant. Typesetting by journals is highly annoying, as they often add errors doing so. Typesetting is easily done by a scientist, especially using LaTeX, but also with a Word template. That scientists pay thousands of dollars per article is not related to the incurred costs, but due to monopoly brand power.

Publishers that serve the community, articles that everyone can read and less funding wasted on publishing are a desirable goal, but it is hard to get there because the barriers to entry are large. Scientists want to publish in journals with a good reputation and, if the journals are not Open Access, with a broad circulation. This makes starting a new journal hard: even if a new journal does a much better job at a much lower price, it will start with no reputation, and without a reputation it will not get the manuscripts to prove its worth.

To make it easier to get from the current situation to an Open Access future, I propose the concept of Grassroots Scientific Publishing. Starting a new journal should be as easy as starting a blog: make an account, give the journal a name and select a lay-out. Finished, start reviewing.

To overcome the problem that initially no one will submit manuscripts, a grassroots journal can start by reviewing already published articles. This is not wasted time, because we can do a much better job communicating the strengths and weaknesses as well as the importance of an article than we do now, where the only information we have on the importance is the journal in which it was published. We can categorise and rank articles. We can have all articles of one field in the same journal, no longer scattered around in many different journals.

Even without replacing traditional journals, such a grassroots journal would provide a valuable service to its scientific community.

To explain the idea and get feedback on how to make it better, I have started a new grassroots publishing blog (see the link under Related reading).
Once this kind of journal is established and has shown that it provides superior quality assurance and information, there is no longer any need for pay-wall journals and we can simply review the articles on manuscript servers.

Related reading

Paul Voosen in Nature: Dueling preprint servers coming for the geosciences

AGU: ESSOAr Frequently Asked Questions

The Guardian, long read: Is the staggeringly profitable business of scientific publishing bad for science?

If you are on twitter, do show support and join EarthArXiv

Three cheers for gatekeeping

Peer review helps fringe ideas gain credibility

Grassroots scientific publishing


* Photo Clare Night 2 by Paolo Antonio Gonella is used under a Creative Commons Attribution 2.0 Generic (CC BY 2.0) license.