Sunday 21 August 2016

Naïve empiricism and what theory suggests about errors in observed global warming

In its time it was huge progress that Francis Bacon stressed the importance of observations. Even if he did not do that much science himself, his advocacy for the Baconian (scientific) method gave him a place as one of the fathers of modern science, together with Nicolaus Copernicus and Isaac Newton.

However, you can also become too fundamentalist about empiricism. Modern science is characterized by an intricate interplay of observations and theory. An observation is never free of theory. You may not be aware of it, but you make theoretical assumptions about what you see in any observation. Theory also guides what to observe, what kind of experiments to make.

[UPDATE. I finally found the Darwin quote I had wanted to use below. It is:
About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours. How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service! ]
Charles Darwin often claimed to adhere to Bacon's ideals, but he had another side. University of California professor of biology and philosophy Francisco Ayala writes in Darwin and the scientific method:
“Let theory guide your observations.” Indeed, Darwin had no use for the empiricist claim that a scientist should not have a preconception or hypothesis that would guide his work. Otherwise, as he wrote, one “might as well go into a gravel pit and count the pebbles and describe the colors. How odd it is that anyone should not see that observation must be for or against some view if it is to be of any service”
But his ambivalence is seen in Darwin's advice to a young scientist:
Let theory guide your observations, but till your reputation is well established be sparing in publishing theory. It makes persons doubt your observations.
The same ambivalence is seen in Einstein. Mitigation skeptics like this quote:
No amount of experimentation can ever prove me right; a single experiment can prove me wrong.
They quote this when the observations show less change than the model. If the observations show more change than the model or theory, they quickly forget Einstein and the observations are suddenly wrong.

In practice Einstein was more realistic. John Rigden, a professor of molecular physics, wrote in his book about Einstein's wonder year 1905: "Einstein saw beyond common sense and, while he respected experimental data, he was not its slave."

That is perfectly reasonable. When theory and observations do not match, the theory can be wrong, the observations can be wrong and the comparison can be wrong. What is called observations is nearly always something that was computed from observations, and that computation can also be imperfect. Only when we understand the reason for the mismatch can we say which of these it was.

The main blog of the mitigation skeptical movement, WUWT, on the other hand, is famous for dismissing attempts to understand the reasons for discrepancies as "excuses".

Global mean temperature

That was a long introduction to get to the graph I wanted to show, where theory suggests the global mean temperature estimates in some periods may have problems.

The graph was computed by Andrew Poppick and colleagues. When I wrote this post the manuscript was not yet published; it has since appeared in Advances in Statistical Climatology, Meteorology and Oceanography. They model the temperature for the instrumental period based on the known human forcings (mainly increases in greenhouse gases and aerosols, the small airborne particles from combustion) and natural forcings (volcanoes and solar variations). The blue line is the model, the grey line the temperature estimate from NASA GISS (GISTEMP).

The fit is astonishing. There are two periods, however, where the fit could be better: World War II and the first 40 to 50 years. So either the theory (this statistical model) is incomplete or the observations have problems.
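To make the idea of "periods where the fit could be better" concrete, here is a minimal sketch in Python. It is not Poppick's model, which is considerably more elaborate. It fits a straight line between a made-up forcing series and a made-up temperature series and flags years with a large residual. All numbers, names and the threshold are assumptions for illustration only.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical decadal data: forcing rises steadily and temperature follows it.
years = list(range(1880, 2020, 10))             # 1880, 1890, ..., 2010
forcing = [0.2 * i for i in range(len(years))]  # made-up forcing values
temperature = [0.5 * f - 0.3 for f in forcing]  # made-up linear response

# Add an artificial warm bump in one decade to mimic a suspect period.
temperature[years.index(1940)] += 0.3

a, b = fit_line(forcing, temperature)
residuals = [t - (a + b * f) for f, t in zip(forcing, temperature)]

# Flag years whose residual exceeds an arbitrary, assumed threshold.
threshold = 0.15
suspect_years = [yr for yr, r in zip(years, residuals) if abs(r) > threshold]
print(suspect_years)
```

A flagged year only says that model and data disagree there. Whether the model or the data is at fault still has to be worked out separately.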

It is expected that the observations during WWII are more uncertain. The sea surface temperature changes are especially hard to estimate because the type of ships, and thus the type of observations, changed radically in this period. The HadSST estimate of the measurement methods is shown below. During WWII American warships dominated and they mainly used Engine Room Intake observations, whereas before and after the war merchant ships would often measure the temperature of a bucket of sea water.

The figure above shows the observational methods as estimated by the UK Hadley Centre for HadSST. Poppick's manuscript uses GISTEMP, whose sea surface temperature comes from ERSST v4. The land data of GISTEMP comes from the stations gathered by NOAA (GHCNv3) and additional Antarctic stations.

ERSST estimates the observational methods of ships by comparing the sea surface temperature to the night marine air temperature (NMAT). This relationship is only stable over larger areas and multiple years. ERSST thus cannot follow the fast changes in the WWII observational methods well.

For HadSST it is also not clear whether these corrections are accurate, and they are large: on the order of 0.3°C. What makes this assessment more difficult is that at the beginning of WWII there was a strong and long El Nino event. Thus a bit of a peak is expected, but it is not clear whether the size is right.

I would not mind if a reviewer requested that Poppick's paper add a statistical model that includes El Nino as a predictor. That would reduce the noise further (part of the remaining noise is likely explained by El Nino) and would make it easier to assess how well the temperature fits during WWII.
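As a rough illustration of why an El Nino predictor would reduce the remaining noise, the sketch below compares two least-squares fits on invented data: one with forcing only and one with forcing plus an ENSO-like index. The series, the coefficients and the helper names are all hypothetical, and this is plain multiple regression, not the method of the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def least_squares(columns, y):
    """Fit y = sum_k coef[k] * columns[k] via the normal equations."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

def rss(columns, coef, y):
    """Residual sum of squares of the fitted model."""
    fitted = [sum(c * col[t] for c, col in zip(coef, columns)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

# Invented series: a steady forcing, an ENSO-like index and a little leftover noise.
n = 12
ones = [1.0] * n
forcing = [0.1 * t for t in range(n)]
enso = [1.0, -1.0, 0.5, 2.0, -0.5, -1.5, 0.0, 1.5, -2.0, 0.5, 1.0, -1.0]
other = [0.02, -0.01, 0.0, 0.01, -0.02, 0.01, 0.0, -0.01, 0.02, 0.0, -0.01, -0.01]
temperature = [0.5 * f + 0.1 * e + o for f, e, o in zip(forcing, enso, other)]

coef_f = least_squares([ones, forcing], temperature)
coef_fe = least_squares([ones, forcing, enso], temperature)
rss_forcing_only = rss([ones, forcing], coef_f, temperature)
rss_with_enso = rss([ones, forcing, enso], coef_fe, temperature)
print(rss_forcing_only, rss_with_enso)
```

With less unexplained variability left over, a genuine misfit in a short period such as WWII would stand out more clearly against the residuals.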

The Southern Oscillation Index (SOI) of the Australian Bureau of Meteorology (BOM), zoomed in to show the period around WWII. Values below -7 indicate El Nino events and values above +7 La Nina events.

It would be an important question to resolve. The peak during WWII is a large part of the hiatus (a real one) we see in the period 1940 to 1980. If you imagine the peak in the 1940s away, this hiatus is a lot smaller. The lack of warming in this period is typically explained by increases in aerosols. It ended when air pollution regulations slowed the growth of aerosols; especially in the industrialised world air quality improved a lot. I guess that if this peak is smaller, that would indicate that the influence of aerosols is smaller than we currently think.

While the observations hardly show any warming in the first 40 to 50 years, the statistical model suggests that there should have been some warming. The global climate models also suggest some warming. Several other climate variables suggest warming as well: the warming in winter, the dates on which lakes and rivers freeze and break up, the retreat of glaciers, temperature reconstructions from proxies, and possibly sea level rise. See for example this graph of the dates rivers and lakes froze up and broke up.

I wrote about these changes in my previous post on "early global warming". Poppick's statistical model adds another piece of evidence and suggests that we should check whether we understand the measurement problems in the early data well enough.

By comparing the observations with the statistical model we can see periods in which the fit is bad. Whether the long-term observed trend is right cannot be seen this way because the statistical model would still fit well, just with a different coefficient for the long-term forcings. This relationship is likely biased in a similar way as the simple statistical models used to estimate the equilibrium climate sensitivity from observations. This model, and thus theory, does provide a beautiful sanity check on the quality of the observations and suggests periods which we may need to study better.
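The point that a biased long-term trend cannot be detected this way can be checked with a toy calculation. In the sketch below, which uses invented numbers and a hypothetical single-forcing regression rather than Poppick's model, a bias proportional to the forcing is subtracted from the "true" temperature. The fitted coefficient changes, but the residuals do not.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (slope b, list of residuals)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, residuals

# Invented forcing and short-term variability.
forcing = [0.1 * t for t in range(15)]
wiggle = [0.03, -0.02, 0.01, 0.0, -0.03, 0.02, 0.01, -0.01,
          0.0, 0.02, -0.02, 0.01, -0.01, 0.03, -0.02]
true_temperature = [0.6 * f + w for f, w in zip(forcing, wiggle)]

# Hypothetical measurement bias that removes part of the long-term trend.
trend_bias = 0.12
biased_temperature = [t - trend_bias * f for t, f in zip(true_temperature, forcing)]

slope_true, res_true = fit_line(forcing, true_temperature)
slope_biased, res_biased = fit_line(forcing, biased_temperature)

# The coefficient absorbs the bias; the residuals are unchanged.
max_residual_change = max(abs(p - q) for p, q in zip(res_true, res_biased))
print(slope_true - slope_biased, max_residual_change)
```

A short-lived bias, such as a bump confined to a few years, cannot be absorbed by the coefficient in this way and does show up in the residuals.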

Related reading

Falsifiable and falsification in science

Early global warming

On the naive empirical view of Australian politician Malcolm Roberts on science: What Climate Change Skeptics Aren’t Getting About Science

Piers Sellers in The New Yorker: Space, Climate Change, and the Real Meaning of Theory

Cowtan, Kevin Douglas, Robert Rohde and Zeke Hausfather, 2017: Evaluating biases in Sea Surface Temperature records using coastal weather stations. Quarterly Journal of the Royal Meteorological Society, doi: 10.1002/qj.3235.

Thompson, David W.J., John J. Kennedy, John M. Wallace and Phil D. Jones, 2008: A large discontinuity in the mid-twentieth century in observed global-mean surface temperature. Nature, 453, pages 646–649, doi: 10.1038/nature06982.


Andrew Poppick, Elisabeth J. Moyer, and Michael L. Stein, 2016: Estimating trends in the global mean temperature record. Unpublished manuscript. Now published in Advances in Statistical Climatology, Meteorology and Oceanography

* Portrait of Francis Bacon at the top is taken from Wikipedia and is in the public domain.


  1. The Einstein quote is, naturally, an inferior paraphrase. The original is better:

    "Eine Theorie kann also wohl als unrichtig erkannt werden, wenn in ihren Deduktionen ein logischer Fehler ist, oder also unzutreffend, wenn eine Tatsache mit einer ihrer Folgerungen nicht im Einklang ist. Niemals aber kann die Wahrheit einer Theorie erwiesen werden. Denn niemals weiss man, dass auch in Zukunft keine Erfahrung bekannt werden wird, die ihren Folgerungen wiederspricht; und stets sind noch andere Gedankensysteme denkbar, welche imstande sind, so gibt es kein anderen Kriterium fur die Bevorzugung der einen oder der anderen als den intuitiven Bild des Forschers."

    or in the standard English translation

    "Thus, a theory can very well be found to be incorrect if there is a logical error in its deduction, or found to be off the mark if a fact is not in consonance with one of its conclusions. But the truth of a theory can never be proven. For one never knows if future experience will contradict its conclusion; and furthermore there are always other conceptual systems imaginable which might coordinate the very same facts. When two theories are available and both are compatible with the given arsenal of facts, then there are no other criteria to prefer one over the other besides the intuitive eye of the researcher. In this manner one can understand why sagacious scientists, cognizant of both--theories and facts--can still be passionate adherents of opposing theories."

  2. :) Sounds good, but sure that is an inaccurate paraphrase, not another quote?

  3. Wiki thinks it's not a quote. Notice that the page you linked to rather coyly calls it a "paraphrase".

  4. Can't PROVE it of course :) but there's no attributed first-hand source anywhere I could find for the quote you gave, and my prior says that most unattributed Einstein quotes are at best paraphrased. It also doesn't sound much like Einstein.

    One other possible source (found by the wikiquotes:talk page, but they had the wrong source):

    'The theoretical scientific researcher is not to be envied, because Nature—or more precisely put: experiment—is a merciless and not very kindly judge of his efforts. She never says “yes” to a theory, in the best case merely “perhaps”; but in most cases simply “no.” If an experiment agrees with the theory, it means “perhaps”; if it does not agree, then it means “no.” Every theory is sure to experience its “no” someday, most theories already do so soon after their formulation.'

  5. Nice to see more emphasis on the quality of early historical data. We almost never get new data from the past so better understanding of their biases is vital.

    In that vein, can I throw an idea at you, to see if it's already incorporated into the bias adjustment process?

    During WW2, data types shifted drastically from buckets on merchant ships to engine intake on warships, as you point out in this post. Might there be an additional effect due to the wartime conditions... the average speed of both merchant ships and warships would be significantly higher in wartime in order to avoid submarines, reach important objectives more quickly, and for many reasons of wartime exigencies.

    Faster average speed might change the water level from which engine intakes draw their water (if shallower, then warmer), might heat up the water as it is drawn in. Warships might have made longer than average patrols, at higher speeds, which consumed more fuel, thus lower than average draft (shallower engine intake depth again) compared to peacetime, or there may be some other effects I'm not thinking of.

    The crew of merchant ships pulling buckets up on deck might be operating with more urgency under wartime stress, and pull the bucket up faster, so less time to cool by evaporation.

    To test this, someone who is well versed in historical temperature data homogenization might be able to examine ships logs to correlate ship speed, patrol duration, etc against the temperature recorded on those warship patrols and merchant ship cruises.

    Do you think this is a reasonable hypothesis? Do you know anyone who might be interested?

  6. William Beeson, unfortunately not that many people work on climate data quality. Especially the SST community is tiny, much too small.

    The ships in WWII were actually going slower. This is mainly due to more ships with a very low speed. Maybe traveling in convoys or trying to save fuel made them slower?

    (The above link goes to Figure 11 of A probabilistic approach to ship voyage reconstruction in ICOADS from researchers from Southampton, UK.)

    I work on land temperatures and do not know the SST literature that well, but as far as I know the engine room intake corrections do not depend on the ship speed.

  7. I suspected someone had already thought about these ideas. It's hard to come up with anything truly novel.

    Good info on the average ship speed. I should have realized that convoys would lower average speed. I already knew that, in fact, but didn't apply that knowledge to this problem.

    I'm still keen on looking at the lowering of average draft for warships, as they take longer patrols. Also, the US Navy producing and deploying huge numbers of ships in the Pacific must be a big part of the sudden jump to a high percentage of engine room intake temperature measurements. Since the Pacific is so large, many of these ships must have been operating at their maximum range more often, and crossing areas with few observations to cross check against. This would introduce a bias that is harder to eliminate.

    The convoy system in the Atlantic would certainly lower average speeds there, but the Japanese spent little effort going after merchant and supply convoys. So in the Pacific, with warships being hunted, they might have a higher average speed in the contested parts of the Pacific (thinking about the tragedy of the USS Indianapolis sunk late in the war).

    If the geographic location of the spike in SST for the 1940-45 period corresponds to the vast stretches of the Pacific where US Navy ships were cruising around at higher than average speeds, with emptier than normal fuel bunkers, then lower average draft might still be a viable hypothesis.

    I'll take a peek around the HADSST pages to see if there are any graphs/maps to support this.

  8. So if Einstein is best read in German, how about a similar sentiment, in English:

    It is also a good rule not to put overmuch confidence in the observational results that are put forward until they are confirmed by theory.

    That's Arthur Eddington.

  9. As a side note, after years of chasing down Einstein quotes I have a simple working rule. If it's popular, simplistic, and no one has yet located a *specific* source, then it's a paraphrase or misattribution. As Einstein himself noted, "I never said half the crap they say I did on the Internet."

  10. Good post,
    SST data before 1950, or so, doesn't seem very reliable, especially the peak of 1944 and trough of 1910. These events cannot be found in met station data, not even when data is masked to ocean areas only, e g here:

    There are other problems when comparing global temperature datasets and models. The former use SAT over land and SST in ocean areas, whereas the latter use SAT everywhere. SST warms slower than SAT over oceans, at least in the models, so this comparison is not apples to apples.

    I have become more and more fond of the Gistemp dTs dataset, the only one that tries to estimate the global SAT based on met stations only. The disadvantage is of course that the extrapolation from coastal and island met stations out over the oceans is stretched very thin, and that the warming trend thus might be exaggerated. However, I believe that the trend of dTs is closer to that of the true global SAT, compared to the standard datasets that use SST to infer ocean SAT.

    Here I have made a comparison between Gistemp dTs and the CCSM4 ensemble (which has a larger than average climate sensitivity). There is a good fit through the whole period, the dTs index behaves just like an ensemble member, with few excursions larger than 0.2 C from the ensemble mean.

    Also, if "preindustrial" is defined as the average for 1880-1899, and the global temperature is defined as the Gistemp dTs index, we have already passed 1.5 C warming, and 2 C warming will be reached somewhere between 2025 and 2038, if dTs stays within the CCSM4 bounds (ensemble mean +/- 0.2 C) in the future as well.

  11. Sorry for the slow reply to your question VV, I have been occupied with other matters recently.

    Einstein when confronted with a possible refutation of his theory claimed he would abandon it. But when the Miller experiment in ~1926 seemed to show some ether drift he said this.

    "If the results of the Miller experiments were to be confirmed, then relativity theory could not be maintained, since the experiments would then prove that, relative to the coordinate systems of the appropriate state of motion (the Earth), the velocity of light in a vacuum would depend upon the direction of motion. With this, the principle of the constancy of the velocity of light, which forms one of the two foundation pillars on which the theory is based, would be refuted. There is, however, in my opinion, practically no likelihood that Mr. Miller is right".

    While the last sentence may sound rather arrogant, as if he had more confidence in his theory than the experimental data, he was aware of problems with the experimental method and expected improvements in the stability of the apparatus to temperature to provide 'better' results.

  12. Izen, thanks for finding the quote. The last sentence of Einstein is a little stronger than I would make it, but the sentiment is right.

    Other important aspects are that the experimental data did not support another theory either, that relativity is a strong theory in that it is either right or wrong but cannot be half right, and that at the time (1921) there was already other observational evidence supporting relativity.

