Showing posts with label GHCN. Show all posts

Friday, May 29, 2020

What does statistical homogenization tell us about the underestimated global warming over land?

Climate station data contains inhomogeneities, which are detected and corrected by comparing a candidate station to its neighbouring reference stations. The most important inhomogeneities are the ones that lead to errors in the station network-wide trends and in global trend estimates. 
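To make the comparison with neighbouring stations concrete, here is a minimal, hypothetical sketch of detecting a single break in a candidate-minus-reference difference series. It only illustrates the principle; operational methods, such as NOAA's pairwise homogenization algorithm, are considerably more sophisticated, and all data below is made up.

```python
def best_break(candidate, reference):
    """Return the index and size of the largest mean shift in the
    candidate-minus-reference difference series."""
    diff = [c - r for c, r in zip(candidate, reference)]
    best_i, best_shift = None, 0.0
    for i in range(1, len(diff)):
        before = sum(diff[:i]) / i            # mean before the candidate break
        after = sum(diff[i:]) / (len(diff) - i)  # mean after it
        if abs(after - before) > abs(best_shift):
            best_i, best_shift = i, after - before
    return best_i, best_shift

# A 0.5 degree jump inserted at index 5 of an otherwise parallel pair:
candidate = [10.0, 10.1, 9.9, 10.0, 10.1, 10.6, 10.5, 10.6, 10.4, 10.5]
reference = [10.0, 10.1, 9.9, 10.0, 10.1, 10.1, 10.0, 10.1, 9.9, 10.0]
i, shift = best_break(candidate, reference)
```

With noisy data, when the shift is small relative to the noise of the difference series, the estimated break position becomes unreliable, which is exactly the signal to noise problem discussed in this series.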

An earlier post in this series argued that statistical homogenization will tend to under-correct errors in the network-wide trends in the raw data. Simply put: some of the trend error will remain. The catalyst for this series is the new finding that when the signal to noise ratio is too low, homogenization methods will make large errors in the positions of the jumps/breaks. For much of the earlier data, and for networks in poorer countries, this probably means that any trend errors will be seriously under-corrected, if they are corrected at all.

The questions for this post are: 1) What do the corrections in global temperature datasets do to the global trend and 2) What can we learn from these adjustments for global warming estimates?

The global warming trend estimate

In the global temperature station datasets statistical homogenization leads to larger warming estimates. So as we tend to underestimate how much correction is needed, this suggests that the Earth warmed up more than current estimates indicate.

Below is the warming estimate in NOAA’s Global Historical Climatology Network (versions 3 and 4) from Menne et al. (2018). You see the warming in the “raw data” (before homogenization; dashed lines) and in the homogenized data (solid lines). The new version 4 is drawn in black, the previous version 3 in red. For both versions homogenization makes the estimated warming larger.

After homogenization the warming estimates of the two versions are quite similar. The difference is in the raw data. Version 4 is based on the raw data of the International Surface Temperature Initiative and has many more stations. Version 3 had many stations that report automatically; these are typically professional stations and a considerable part of them are at airports. One reason the raw data may show less warming in version 3 is that many stations now at airports were in cities before. Taking them out of the urban heat island, and often also improving the local siting of the station, may have produced a systematic artificial cooling in the raw observations.

Version 4 has more stations and thus a higher signal to noise ratio. One may therefore expect it to show more warming. That this is not the case is a first hint that the situation is not that simple, as explained at the end of this post.


The global land warming estimates based on the Global Historical Climatology Network dataset of NOAA. The red lines are for version 3, the black lines for the new version 4. The dashed lines are before homogenization and the solid lines after homogenization. Figure from Menne et al. (2018).

The difference due to homogenization in the global warming estimates is shown in the figure below, also from Menne et al. (2018). The study also added an estimate for the data of the Berkeley Earth initiative.

(Background information. Berkeley Earth started as a US Culture War initiative where non-climatologists computed the observed global warming. Before the results were in, climate “sceptics” claimed their methods were the best and they would accept any outcome. The moment the results turned out to be scientifically correct, but not politically correct, the climate “sceptics” dropped them like a hot potato.)

We can read from the figure that over the full period homogenization increases the warming estimate by about 0.3 °C per century in GHCNv3, by 0.2 °C in GHCNv4, and by 0.1 °C in the Berkeley Earth dataset. GHCNv3 has more than 7000 stations (Lawrimore et al., 2011). GHCNv4 is based on the ISTI dataset (Thorne et al., 2011), which has about 32,000 stations, but GHCN only uses stations with at least 10 years of data and thus contains about 26,000 stations (Menne et al., 2018). Berkeley Earth is based on 35,000 stations (Rohde et al., 2013).


The difference due to homogenization in the global warming estimates (Menne et al., 2018). The red line is for the smaller GHCNv3 dataset, the black line for GHCNv4 and the blue line for Berkeley Earth.

What does this mean for global warming estimates?

So, what can we learn from these adjustments for global warming estimates? At the moment, I am afraid, not yet a whole lot. However, the sign is quite likely right: if we could do a perfect homogenization, I expect this would make the warming estimates larger. But estimating how large the correction should have been, based on the corrections actually made in the above datasets, is difficult.

In the beginning, I was thinking: if the signal to noise ratio in some network is too low, we may be able to estimate that in such a case we under-correct, say, 50% and then make the adjustments unbiased by making them, say, twice as large.

However, especially doing this globally is a huge leap of faith.

The first assumption this would make is that the trend bias in data sparse regions and periods is the same as that of data rich regions and periods. However, the regions with high station density are in the mid-latitudes, where atmospheric measurements are relatively easy. The data sparse periods are also the periods in which large changes in the instrumentation were made, as we were still learning how to make good meteorological observations. So we cannot reliably extrapolate from data rich regions and periods to data sparse regions and periods.

Furthermore, there will not be one correction factor to account for under-correction, because the signal to noise ratio is different everywhere. Maybe America is only under-corrected by 10% and needs just a little nudge to make the trend correction unbiased. However, homogenization adjustments in data sparse regions may only be able to correct such a small part of the trend bias that correcting for the under-correction becomes adventurous, or may even make trend estimates more uncertain. So we would at least need to make such computations for many regions and periods.

Finally, another reason not to take such an estimate too seriously is the spatial and temporal characteristics of the bias. The signal to noise ratio is not the only problem: one would also expect that it matters how the network-wide trend bias is distributed over the network. In the case of relocations of city stations to airports, a small number of stations will have a large jump. Such a large jump is relatively easy to detect, especially as its neighbouring stations will mostly be unaffected.

A harder case is the time of observation bias in America, where a large part of the stations experienced a cooling shift from afternoon to morning measurements, spread over many decades. Here, in most cases the neighbouring stations were not affected around the same time, but the smaller shift makes these breaks harder to detect.

(NOAA has a special correction for this problem, but when it is turned off statistical homogenization still finds the same network-wide trend. So for this kind of bias the network density in America is apparently sufficient.)

Among the hardest cases are changes in the instrumentation, for example the introduction of automatic weather stations in recent decades or the introduction of the Stevenson screen a century ago. These relatively small breaks often happen over a period of only a few decades, if not years, which means that the neighbouring stations are affected as well. That makes them hard to detect in a difference time series.

Studying from the data how the biases are distributed is hard. One could study this by homogenizing the data and studying the breaks, but the ones which are difficult to detect will then be under-represented. This is a tough problem; please leave suggestions in the comments.

Because of how the biases are distributed, it is perfectly possible that the trend biases corrected in GHCN and Berkeley Earth are due to the easy-to-correct problems, such as the relocations to airports, while the hard ones, such as the transition to Stevenson screens, are hardly corrected. In this case, the corrections that could be made do not provide information on the ones that could not be made. They have different causes and different difficulties.

So if we had a network where the signal to noise ratio is around one, we could not say that the under-correction is, say, 50%. One would have to specify for which kind of distribution of the bias this is valid.

GHCNv3, GHCNv4 and Berkeley Earth

Coming back to the trend estimates of GHCN version 3 and version 4. One may have expected that version 4, having more stations, is able to better correct trend biases and should thus show a larger trend than version 3. This would go even more so for Berkeley Earth. But the final trend estimates are quite similar. Similarly, the smallest corrections are made in the most data-rich period, after the Second World War.

The datasets with the largest number of stations showing the strongest trend would have been a reasonable expectation if the trend estimates of the raw data had been similar. But these raw data trends are the reason for the differences in the size of the corrections, while the trend estimates based on the homogenized data are quite similar.

Many additional stations will be in regions and periods where we already had many stations and where the station density was no problem. On the other hand, adding some stations to data sparse regions may not be sufficient to fix the low signal to noise ratio. So the most improvements would be expected for the moderate cases where the signal to noise ratio is around one. Until we have global estimates of the signal to noise ratio for these datasets, we do not know for which percentage of stations this is relevant, but this could be relatively small.

The arguments of the previous section also apply here; the relationship between station density and adjustments may not be that simple. That the corrections in the period after the Second World War are so small is especially suspicious: we know quite a lot happened to the measurement networks then. Maybe these effects all average out, but that would be quite a coincidence. Another possibility is that these changes in observational methods were made over relatively short periods to entire networks, making them hard to correct.

A reason for the similar outcomes for the homogenized data could be that all datasets successfully correct for trend biases due to problems like the transition to airports, while for every dataset the signal to noise ratio is not sufficient to correct problems like the transition to Stevenson screens. GHCNv4 and Berkeley Earth, using as many stations as they could find, could well have more stations that are currently badly sited than GHCNv3, which was more selective. In that case the smaller effective corrections of these two datasets would be due to compensating errors.

Finally, a small disclaimer: the main change from version 3 to 4 was the number of stations, but there were other small changes, so it is not a pure comparison of two datasets where only the signal to noise ratio is different. Such a pure comparison still needs to be made. The homogenization methods of GHCN and Berkeley Earth differ even more.

My apologies for all the maybe's and could be's, but this is more complicated than it may look, and I would not be surprised if it turns out to be impossible to estimate how much correction is needed based on the corrections that are made by homogenization algorithms. The only thing I am confident about is that homogenization improves trend estimates, but I am not confident about how much.

Parallel measurements

Another way to study these biases in the warming estimates is to go into the books and study station histories in 200 plus countries. This is basically how sea surface temperature records are homogenized. To do this for land stations is a much larger project due to the large number of countries and languages.

Still there are such experiments, which give a first estimate for some of the biases when it comes to the global mean temperature (do not expect regional detail). In the next post I will try to estimate the missing warming this way. We do not have much data from such experiments yet, but I expect that this will be the future.

Other posts in this series






References

Chimani, Barbara, Victor Venema, Annermarie Lexer, Konrad Andre, Ingeborg Auer and Johanna Nemec, 2018: Inter-comparison of methods to homogenize daily relative humidity. International Journal of Climatology, 38, pp. 3106–3122. https://doi.org/10.1002/joc.5488

Gubler, Stefanie, Stefan Hunziker, Michael Begert, Mischa Croci-Maspoli, Thomas Konzelmann, Stefan Brönnimann, Cornelia Schwierz, Clara Oria and Gabriela Rosas, 2017: The influence of station density on climate data homogenization. International Journal of Climatology, 37, pp. 4670–4683. https://doi.org/10.1002/joc.5114

Lawrimore, Jay H., Matthew J. Menne, Byron E. Gleason, Claude N. Williams, David B. Wuertz, Russell S. Vose and Jared Rennie, 2011: An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. Journal of Geophysical Research, 116, D19121. https://doi.org/10.1029/2011JD016187

Lindau, Ralf and Victor Venema, 2018: On the reduction of trend errors by the ANOVA joint correction scheme used in homogenization of climate station records. International Journal of Climatology, 38, pp. 5255–5271. Manuscript: https://eartharxiv.org/r57vf/ Article: https://doi.org/10.1002/joc.5728

Menne, Matthew J., Claude N. Williams, Byron E. Gleason, J. Jared Rennie and Jay H. Lawrimore, 2018: The Global Historical Climatology Network monthly temperature dataset, version 4. Journal of Climate, 31, pp. 9835–9854. https://doi.org/10.1175/JCLI-D-18-0094.1

Rohde, Robert, Richard A. Muller, Robert Jacobsen, Elizabeth Muller, Saul Perlmutter, Arthur Rosenfeld, Jonathan Wurtele, Donald Groom and Charlotte Wickham, 2013: A New Estimate of the Average Earth Surface Land Temperature Spanning 1753 to 2011. Geoinformatics & Geostatistics: An Overview, 1, no.1. https://doi.org/10.4172/2327-4581.1000101

Sutton, Rowan, Buwen Dong and Jonathan Gregory, 2007: Land/sea warming ratio in response to climate change: IPCC AR4 model results and comparison with observations. Geophysical Research Letters, 34, L02701. https://doi.org/10.1029/2006GL028164

Thorne, Peter W., Kate M. Willett, Rob J. Allan, Stephan Bojinski, John R. Christy, Nigel Fox, Simon Gilbert, Ian Jolliffe, John J. Kennedy, Elizabeth Kent, Albert Klein Tank, Jay Lawrimore, David E. Parker, Nick Rayner, Adrian Simmons, Lianchun Song, Peter A. Stott and Blair Trewin, 2011: Guiding the creation of a comprehensive surface temperature resource for twenty-first century climate science. Bulletin of the American Meteorological Society, 92, ES40–ES47. https://doi.org/10.1175/2011BAMS3124.1

Wallace, Craig and Manoj Joshi, 2018: Comparison of land–ocean warming ratios in updated observed records and CMIP5 climate models. Environmental Research Letters, 13, no. 114011. https://doi.org/10.1088/1748-9326/aae46f 

Williams, Claude, Matthew Menne and Peter Thorne, 2012: Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. Journal Geophysical Research, 117, D05116. https://doi.org/10.1029/2011JD016761


Monday, January 30, 2017

With some programming skills you can compute global mean temperatures yourself

This is a guest post by citizen scientist Ron Roeland (not his real name, but I like alliteration for some reason). Being an actually sceptical person, he decided to compute the global mean land temperature from station observations himself. He was able to reproduce the results of the main scientific groups that compute this signal and, new to me, noticed while studying the data how important the relocation of temperature stations to airports is for the NOAA GHCNv3 dataset. (The headers in the post are mine.)

This post does not pretend to present a rigorous analysis of the global temperature record; instead, it intends to show how easy it is for someone with basic programming/math skills to debunk claims that NASA and NOAA have manipulated temperature data to produce their global-average temperature results, i.e. claims like these:

From C3 Headlines: By utilizing questionable adjustments based on even more questionable assumptions, NOAA managed to produce an entirely fabricated increase in the global warming trend from 1998 to 2012.

From a blogger on the Hill: There’s going to have to be a massive effort to pick apart failing climate models and questionably-adjusted data.

From Climate Depot: Over the past decade, NASA and NOAA have continuously altered the temperature record to cool the past and warm the present. Their claims are straight out Orwell's 1984, and have nothing to do with science.

The routine

Some time ago, after reading all kinds of claims (like the ones above) about how NASA and NOAA had improperly adjusted temperature data to produce their global-average temperature results, I decided to take a crack at the data myself.

I coded up a straightforward baselining/gridding/averaging routine that is quite simple and “dumbed down” in comparison to the NASA and NOAA algorithms. Below is a complete description of the algorithm I coded up.
  1. Using GHCN v3 monthly-average data, compute 1951-1980 monthly baseline temperatures for all GHCN stations. If a station has 15 or more valid temperatures in any given month for the 1951-1980 baseline period, retain that monthly baseline value; otherwise drop that station/month from the computations. Stations with no valid monthly baseline periods are completely excluded from the computations.
  2. For all stations and months where valid baseline temperature estimates were computed per (1) above, subtract the respective baseline temperatures from all of the station monthly temperatures to produce monthly temperature anomalies for the years 1880-2015.
  3. Set up a global gridding scheme to perform area-weighting. To keep things really simple, and to minimize the number of empty grid-cells, I selected large grid-cell sizes (20 degrees x 20 degrees at the Equator). I also opted to recalculate the grid-cell latitude dimensions as one goes north/south of the equator in order to keep the grid-cell areas as nearly constant as possible. I did this to keep the grid-cell areas from shrinking (per the latitude cosines) in order to minimize the number of empty grid cells.
  4. In each grid-cell, compute the average (over all stations in the grid-cell) of the monthly temperature anomalies to produce a single time-series of average temperature anomalies for each month (years 1880 through 2015).
  5. Compute global average monthly temperature anomalies by averaging together all the grid-cell monthly average anomalies, weighted by the grid-cell areas (again, for years 1880 through 2015).
  6. Compute global-average annual anomalies for years 1880 through 2015 by averaging together the global monthly anomalies for each year.
The algorithm does not involve any station data adjustments (obviously!) or temperature interpolation operations. It’s a pretty basic number-crunching procedure that uses straightforward math plus a wee bit of trigonometry (for computing latitude/longitude grid-cell areas).
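The six numbered steps can be condensed into a short sketch. This is illustrative Python on toy yearly records, not the author's C++ code: real GHCN processing works per month, and this sketch weights grid-cells by the cosine of their central latitude instead of recomputing cell widths as described in step 3.

```python
import math
from collections import defaultdict

BASELINE_YEARS = range(1951, 1981)  # the 1951-1980 baseline period

def anomalies(record, min_years=15):
    """Steps 1-2: subtract the station's baseline mean; drop the station
    if fewer than 15 valid baseline values are available."""
    base = [record[y] for y in BASELINE_YEARS if y in record]
    if len(base) < min_years:
        return None
    mean = sum(base) / len(base)
    return {y: t - mean for y, t in record.items()}

def grid_cell(lat, lon, size=20.0):
    """Step 3 (simplified): fixed 20 x 20 degree cells."""
    return (int((lat + 90) // size), int((lon + 180) // size))

def global_mean(stations, year):
    """Steps 4-5: average the anomalies within each cell, then average the
    cells, weighted here by the cosine of the cell-centre latitude (a common
    substitute for the post's near-equal-area cells)."""
    cells = defaultdict(list)
    for (lat, lon), record in stations:
        anom = anomalies(record)
        if anom is not None and year in anom:
            cells[grid_cell(lat, lon)].append(anom[year])
    num = den = 0.0
    for (row, _col), values in cells.items():
        weight = math.cos(math.radians(row * 20.0 - 90.0 + 10.0))
        num += weight * sum(values) / len(values)
        den += weight
    return num / den
```

Step 6, the annual average of the monthly anomalies, is omitted because the toy records are already yearly.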

For me, the most complicated part of the algorithm implementation was managing the variable data record lengths and data gaps (monthly and annual) in the station data -- basically, the “data housekeeping” stuff. Fortunately, modern development libraries such as the C++ Standard Template Library make this less of a chore than it used to be.

Why this routine?

People unfamiliar with global temperature computational methods sometimes ask: “Why not simply average the temperature station data to compute global-average estimates? Why bother with the baselining and gridding described above?”

We could get away with straight averaging of the temperature data if it were not for the two problems described below.

Problem 1: Temperature stations have varying record lengths. The majority of stations do not have continuous data records that go all the way back to 1880 (the beginning of the NASA/GISS global temperature calculations). Even stations with data going back to 1880 have gaps in their records -- there are missing months or even years.

Problem 2: Temperature stations are not evenly distributed over the Earth’s surface. Some regions, like the continental USA and western Europe, have very dense networks of stations. Other regions, like the African continent, have very sparse station networks.

As a result of problem 1, we have a mix of temperature stations that changes from year to year. If we were simply to average the absolute temperature data from all those stations, the final global-average results would be significantly skewed from year to year due to the changing mix of stations from one year to the next.

Fortunately, the solution for this complication is quite straightforward: the baselining and anomaly-averaging procedure described above. For those who are already familiar with this procedure, please bear with me while I illustrate how it works with a simple scenario constructed from simulated data.

Let’s consider a very simple scenario where the full 1880-2016 temperature history for a particular region is contained in data reported by two temperature stations, one of which is located on a hilltop and the other located on a nearby valley floor. The hilltop and valley floor locations have identical long-term temperature trends, but the hilltop location is consistently about 1 degree C cooler than the valley floor location. The hilltop temperature station has a temperature record starting in 1880 and ending in 1990. The valley floor station has a temperature record beginning in 1930 and ending in 2016.

Figure 1 below shows the simulated temperature time-series for these two hypothetical stations. Both time-series were constructed by superimposing random noise on the same linear trend, with the valley-floor station time-series having a constant offset temperature 1 degree C more than that of the hilltop station time-series. The simulated time-series for the hilltop station (red) begins in 1880 and continues to 1990. The simulated valley floor station temperature (blue) data begins in 1930 and runs to 2016. As can be seen during their period of overlap (1930-1990), the simulated valley-floor temperature data runs about 1 degree warmer than the simulated hilltop temperature data.


Figure 1: Simulated Hilltop Station Data (red) and Valley Floor Station Data (blue)

If we were to attempt to construct a complete 1880-2016 temperature history for this region by computing a straight average of the hilltop and valley floor data, we would obtain the results seen in Figure 2 below.


Figure 2: Straight Average of Valley Floor Station Data and Hilltop Station Data

The effects of the changing mix of stations (hilltop vs. valley floor) on the average temperature results can clearly be seen in Figure 2. A large temperature jump is seen at 1930, where the warmer valley floor data begins, and a second temperature jump is seen at 1990 where the cooler hilltop data ends. These temperature jumps obviously do not represent actual temperature increases for that particular region; instead, they are artifacts introduced by the changes in the mix of stations in 1930 and 1990.

An accurate reconstruction of the regional temperature history computed from these two temperature time-series obviously should show the warming trend seen in the hilltop and valley floor data over the entire 1880-2016 time period. That is clearly not the case here. Much of the apparent warming seen in Figure 2 is a consequence of the changing mix of stations.

Now, let’s modify the processing a bit by subtracting the (standard NASA/GISS) 1951-1980 hilltop baseline average temperature from the hilltop temperature data and the 1951-1980 valley floor baseline average temperature from the valley floor temperature data. This procedure produces the temperature anomalies for the hilltop and valley floor stations. Then for each year, compute the average of the station anomalies for the 1880-2016 time period.

This is the baselining and anomaly-averaging procedure that is used by NASA/GISS, NOAA, and other organizations to produce their global-average temperature results.
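The two-station scenario can be reproduced in a few lines. This sketch uses noise-free simulated data (an assumption I add so the artifact sizes are exact): both stations share the same 1 °C/century trend, with the valley floor 1 °C warmer.

```python
def trend(year):
    return 0.01 * (year - 1880)  # 1 degree C per century

# Hilltop record 1880-1990; valley floor record 1930-2016, 1 degree warmer.
hill = {y: 10.0 + trend(y) for y in range(1880, 1991)}
valley = {y: 11.0 + trend(y) for y in range(1930, 2017)}

def straight_average(year):
    vals = [s[year] for s in (hill, valley) if year in s]
    return sum(vals) / len(vals)

def anomaly_average(year, base=range(1951, 1981)):
    anoms = []
    for s in (hill, valley):
        if year in s:
            mean = sum(s[y] for y in base) / len(base)
            anoms.append(s[year] - mean)
    return sum(anoms) / len(anoms)

# The straight average jumps when the valley station enters in 1930...
jump = straight_average(1930) - straight_average(1929)    # ~0.5 degrees
# ...while the anomaly average only shows the underlying trend.
smooth = anomaly_average(1930) - anomaly_average(1929)    # 0.01 degrees
```

The end of the hilltop record in 1990 produces an analogous artifact in the straight average, and the same baselining removes it.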

When this baselining and anomaly-averaging procedure is applied to the simulated temperature station data, it produces the results that can be viewed in figure 3 below.


Figure 3: Average of Valley Floor Station Anomalies and Hilltop Station Anomalies

In Figure 3, the temperature jumps associated with the beginning of the valley floor data record and the end of the hilltop data record have been removed, clearly revealing the underlying temperature trend shared by the two temperature time-series.

Also note that although neither of my simulated temperature stations has a full 1880-2016 temperature record, we were still able to compute a complete reconstruction for the 1880-2016 time period because there was enough overlap between the station records to allow us to “align” them via baselining.

The second problem, the non-uniform distribution of temperature stations, can clearly be seen in Figure 4 below. That figure shows all GHCNv3 temperature stations that have data records beginning in 1900 or earlier and continuing to the present time.


Figure 4: Long-Record GHCN Station Distribution

As one can see, the stations are highly concentrated in the continental USA and western Europe; Africa and South America, in contrast, have very sparse coverage. A straight unweighted average of the data from all the stations shown in the above image would result in temperature changes in the continental USA and western Europe “swamping out” temperature changes in South America and Africa in the final global average calculations.

That is the problem that gridding solves. The averaging procedure using grid-cells is performed in two steps. First, the temperature time-series for all stations in each grid-cell are averaged together to produce a single time-series per grid-cell. Then all the grid-cell time-series are averaged together to construct the final global-average temperature results (note: in the final average, the grid-cell time-series are weighted according to the size of each grid-cell). This eliminates the problem where areas on the Earth with very dense networks of stations are over-weighted in the global average relative to areas where the station coverage is more sparse.
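A toy calculation (with made-up anomaly values) shows the effect of the two-step average: ten stations in one densely sampled cell versus a single station in an equal-area, sparsely sampled cell.

```python
dense_cell = [0.2] * 10   # e.g. ten western European stations, 0.2 degree anomaly
sparse_cell = [1.0]       # one African station, 1.0 degree anomaly

# Straight averaging lets the dense network swamp the sparse one:
straight = sum(dense_cell + sparse_cell) / 11               # ~0.27

# Gridding: first one value per cell, then average the equal-area cells:
cell_means = [sum(c) / len(c) for c in (dense_cell, sparse_cell)]
gridded = sum(cell_means) / len(cell_means)                 # 0.6
```

The gridded value gives each equal-area region equal weight, which is what a global average should do.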

Now, some have argued that the sparse coverage of certain regions of the Earth invalidate the global-average temperature computations. But it turns out that the NASA/GISS warming trend can be confirmed even with a very sparse sampling of the Earth’s surface temperatures. (In fact, the NASA/GISS warming trend can be replicated very closely with data from as few as 30 temperature stations scattered around the world.)

Real-world results

Now that we are done with the preliminaries, let’s look at some real-world results. Let’s start off by taking a look at how my simple “dumbed-down” gridding/averaging algorithm compares with the NASA/GISS algorithm when it is used to process the same GHCNv3 adjusted data that NASA/GISS uses. To see how my algorithm compares with the NASA/GISS algorithm, take a look at Figure 5 below, where the output of my algorithm is plotted directly against the NASA/GISS “Global Mean Estimates based on Land Data only” results.

(Note: All references to NASA/GISS global temperature results in this post refer specifically to the NASA/GISS “Global Mean Estimates based on Land Data only” results. Those results can be viewed on the NASA/GISS web-site; scroll down to view the “Global Mean Estimates based on Land Data only” graph).


Figure 5: Adjusted Data, All Stations: My Simple Gridding/Averaging (blue) vs. NASA/GISS (red)

In spite of the rudimentary nature of my algorithm, it produces results that match the NASA/GISS results quite closely. According to the R-squared statistic I calculated (seen in the upper-left corner of Figure 5), I got 98% of the NASA/GISS answer with only a tiny fraction of the effort!

But what happens when we use unadjusted GHCNv3 data? Well, let’s go ahead and compare the output of my algorithm with the NASA/GISS algorithm when my algorithm is used to process the unadjusted GHCNv3 data. Figure 6 below shows a plot of my unadjusted global temperature results vs. the NASA/GISS results (remember that NASA/GISS uses adjusted GHCNv3 data).


Figure 6: Unadjusted Data, All Stations: My Simple Gridding /Averaging (green) vs. NASA/GISS (red)

My “all stations” unadjusted data results show a warming trend that lines up very closely with the NASA/GISS warming trend from 1960 to 2016, with my results as well as the NASA/GISS results showing record high temperatures for 2016. However, my results do show a visible warm-bias relative to the NASA/GISS results prior to 1950 or so. This is the basis of the accusations that NOAA and NASA “cooled the past (and warmed the present)” to exaggerate the global warming trend.

Now, why do my unadjusted data results show that pre-1950 “warm bias” relative to the NASA/GISS results? Well, this excerpt from NOAA’s GHCN FAQ provides some clues:
Why are there more cold (negative) step changes than warm (positive) step changes in the historical land surface air temperature records represented in the GHCN v3 dataset?

The reason for the larger number of cold step changes is not completely clear, but they may be due in part to systematic changes in station locations from city centers to cooler airport locations that occurred in many parts of the world from the 1930s through the 1960s.
Because the GHCNv3 metadata contains an airport designator field for every temperature station, it was quite easy for me to modify my program to exclude all the “airport” stations from the computations. So let’s exclude all of the “airport” station data and see what we get. Figure 7 below shows my unadjusted data results vs. the NASA/GISS results when all “airport” stations are excluded from my computations.


Figure 7: Unadjusted Data, Airports Excluded (green) vs. NASA/GISS (red)

There is a very visible reduction in the bias between my unadjusted results and the NASA results (especially prior to 1950 or so) when airport stations are excluded from my unadjusted data processing. This is quite consistent with the notion that many of the stations currently located at airports were moved to their current locations from city centers at some point during their history.
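The airport split itself is a one-line filter once the metadata is parsed. The record layout below is hypothetical (the real GHCNv3 inventory stores the airport designator field mentioned above in its own fixed-width column), and the anomaly values are made up for illustration.

```python
# Hypothetical parsed station metadata; ids and anomaly values are made up.
stations = [
    {"id": "ST001", "airport": True,  "anomaly": 0.3},
    {"id": "ST002", "airport": False, "anomaly": 0.1},
    {"id": "ST003", "airport": False, "anomaly": 0.2},
]

def mean_anomaly(stations, airports=None):
    """Average anomaly over all stations, or only the (non-)airport ones."""
    keep = [s for s in stations if airports is None or s["airport"] == airports]
    return sum(s["anomaly"] for s in keep) / len(keep)

everything = mean_anomaly(stations)                    # ~0.2
non_airport = mean_anomaly(stations, airports=False)   # ~0.15
airport_only = mean_anomaly(stations, airports=True)   # 0.3
```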

Now just for fun, let’s look at what happens when we do the reverse and exclude non-airport stations (i.e. process only the airport stations). Figure 8 shows what we get when we process unadjusted data exclusively from “airport” stations.


Figure 8: Unadjusted Data, Airports Only (green) vs. NASA/GISS (red)

Well, look at that! The pre-1950 bias between my unadjusted data results and the NASA/GISS results really jumps out. And take note of another interesting thing about the plot -- in spite of the fact that I processed only “airport” stations, the green “airports only” temperature curve goes all the way back to 1880, decades prior to the existence of airplanes (or airports)! It is only reasonable to conclude that those “airport” stations must have been moved at some point in their history.

Now, for a bit more fun, let’s drill down a little further into the data and process only airport stations that also have temperature data records going back to 1903 (the year that the Wright Brothers first successfully flew an airplane) or earlier.

When I drilled down into the data, I found over 400 “airport” temperature stations with data going back to 1903 or earlier. And when I computed global-average temperature estimates from just those stations, this is what I got (Figure 9):


Figure 9: Unadjusted Data, Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

OK, that looks pretty much like the previous temperature plot, except that my results are “noisier” due to the fact that I processed data from fewer temperature stations.
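The drill-down itself is a simple first-year filter over the data file. The sketch below assumes GHCNM v3 .dat lines begin with an 11-character station ID followed by a 4-digit year (again, the README has the authoritative layout); combined with the airport split, it yields the "airport stations with pre-1903 data" subset.

```python
# Sketch: selecting stations whose record begins in 1903 or earlier.
# ASSUMPTION: each GHCNM v3 .dat line starts with an 11-character
# station ID followed by a 4-digit year -- verify against the README.

def first_years(dat_lines):
    """Map each station ID to the earliest year present in its record."""
    first = {}
    for line in dat_lines:
        sid, year = line[:11], int(line[11:15])
        if sid not in first or year < first[sid]:
            first[sid] = year
    return first

def stations_from(dat_lines, cutoff=1903):
    """Station IDs whose data record begins at or before `cutoff`."""
    return {sid for sid, yr in first_years(dat_lines).items() if yr <= cutoff}
```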

And for even more fun, let’s look at the results we get when we process data exclusively from non-airport stations with data going back to 1903 or earlier:


Figure 10: Unadjusted Data, Non-Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

When only non-airport stations are processed, the pre-1950 “eyeball estimate” bias between my unadjusted data temperature curve and the NASA/GISS temperature curve is sharply reduced.

The results seen in the above plots are entirely consistent with the notion that the movement of large numbers of temperature stations from city centers to cooler outlying airport locations during the middle of the 20th Century is responsible for much of the bias seen between the unadjusted and adjusted GHCNv3 global-average temperature results.

It is quite reasonable to conclude, based on the results presented here, that one major reason for the bias seen between the GHCNv3 unadjusted and adjusted data results is the presence of corrections for those station moves in the adjusted data (corrections that are obviously absent from the unadjusted data). Those corrections remove the contaminating effects of station moves and permit more accurate estimates of global surface temperature increases over time.

Take-home lessons (in no particular order):

  1. Even a very simple global temperature algorithm can reproduce the NASA/GISS results very closely. This really is a case where you can get 98% of the answer (per my R-squared statistic) with less than 1% of the effort.
  2. NOAA’s GHCNv3 monthly data repository contains everything an independent “citizen scientist” needs (data and documentation) to conduct his/her own investigation of the global land station temperature data.
  3. A direct comparison of unadjusted data results (all GHCN stations) vs. the NASA/GISS adjusted data temperature curves reveals only modest differences between the two temperature curves, especially for the past 6 decades. Furthermore, my unadjusted and the NASA/GISS adjusted results show nearly identical (and record) temperatures for 2016. If NASA and NOAA were adjusting data to exaggerate the amount of planetary warming, they sure went to an awful lot of trouble and effort to produce only a small overall increase in warming in the land station data.
  4. Eliminating all “airport” stations from the processing significantly reduced the bias between my unadjusted data results and the NASA/GISS results. It is therefore reasonable to conclude that a large share of the modest bias between my GHCN v3 unadjusted results and the NASA/GISS adjusted data results is the result of corrections for station moves from urban centers to outlying airports (corrections present in the adjusted data, but not in the unadjusted data).
  5. Simply excluding “airport” stations likely eliminates many stations that were always located at airports (and never moved) and also fails to eliminate stations that were moved out from city centers to non-airport locations. So it is not a comprehensive evaluation of the impacts of station moves. However, it is a very easy “first step” analysis exercise to perform; even this incomplete “first step” analysis produces results that are strongly consistent with the hypothesis that corrections for station moves are likely the dominant reason for the pre-1950 bias seen between the adjusted and unadjusted GHCN global temperature results. Remember that many urban stations were also moved from city centers to non-airport locations during the mid-20th century. Unfortunately, those station moves are not recorded in the simple summary metadata files supplied with the GHCNv3 monthly data. An analysis of NOAA’s more detailed metadata would be required to identify those stations and perform a more complete analysis of the impacts of station moves. However, that is outside of the scope of this simple project.
  6. For someone who has the requisite math and programming skills, confirming the results presented here should not be very hard at all. Skeptics should try it some time. Provided that those skeptics are willing and able to accept results that contradict their original views about temperature data adjustments, they could have a lot of fun taking on a project like this.
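For readers who want to try take-home lesson 1, here is a minimal sketch of the kind of simple algorithm the post describes: convert each station to anomalies relative to its own baseline, average the anomalies within lat/lon grid cells, then average the cells weighted by the cosine of latitude. This is a generic gridding approach under my own assumptions, not the author's exact program.

```python
# Minimal gridded, area-weighted global-average sketch.
# Input: (latitude, longitude, anomaly) triples, one per station.
# This is a generic illustration, not the author's actual code.

import math
from collections import defaultdict

def grid_average(stations, cell_deg=5.0):
    """Area-weighted mean anomaly over cell_deg x cell_deg grid cells."""
    cells = defaultdict(list)
    for lat, lon, anom in stations:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, anom))
    total_w = total = 0.0
    for vals in cells.values():
        # Average all stations falling in the same cell, then weight
        # the cell by cos(latitude) to account for shrinking cell area.
        mean_anom = sum(a for _, a in vals) / len(vals)
        mean_lat = sum(l for l, _ in vals) / len(vals)
        w = math.cos(math.radians(mean_lat))
        total += w * mean_anom
        total_w += w
    return total / total_w if total_w else float("nan")
```

Running this once per month over the station anomalies gives a global land temperature curve; restricting the input to the airport or non-airport subsets reproduces the kind of comparisons shown in the figures above.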

Related reading

Also the Clear Climate Code project was able to reproduce the results of NASA-GISS. Berkeley Earth made a high-level independent analysis and confirmed previous results. Also the (non-climate) scientist Nick Stokes (Moyhu) computed his own temperature signal, TempLS, which also fits well.

In 2010 Zeke Hausfather analyzed the differences in GHCNv2 between airport and other stations and found only minimal differences: Airports and the land temperature record.

At about the same time David Jones at Clear Climate Code also looked at airport stations, just splitting the dataset in two groups, and did find differences: Airport Warming. Thus making sure both groups are regionally comparable is probably important.

The global warming conspiracy would be huge. Not only the 7 global datasets, but also national datasets from many groups show clear warming.

Just the facts, homogenization adjustments reduce global warming.

Why raw temperatures show too little global warming.

Irrigation and paint as reasons for a cooling bias.

Temperature trend biases due to urbanization and siting quality changes.

Temperature bias from the village heat island

Cooling moves of urban stations. From cities to airports or simply to outside a city or village.

The transition to automatic weather stations. We’d better study it now. It may be a cooling bias.

Changes in screen design leading to temperature trend biases.

Early global warming

Cranberry picking short-term temperature trends

How climatology treats sceptics