Variable Variability: With some programming skills you can compute global mean temperatures yourself

This is a guest post by citizen scientist Ron Roeland (not his real name, but I like alliteration for some reason). Being an actually sceptical person, he decided to compute the global mean land temperature from station observations himself. He was able to reproduce the results of the main scientific groups that compute this signal and, while studying the data, noticed something that was new to me: how important the relocation of temperature stations to airports is for the NOAA GHCNv3 dataset. (The headers in the post are mine.)

This post does not pretend to present a rigorous analysis of the global temperature record; instead, it intends to show how easy it is for someone with basic programming/math skills to debunk claims that NASA and NOAA have manipulated temperature data to produce their global-average temperature results, i.e. claims like these:

From C3 Headlines: By utilizing questionable adjustments based on even more questionable assumptions, NOAA managed to produce an entirely fabricated increase in the global warming trend from 1998 to 2012.

From a blogger on the Hill: There’s going to have to be a massive effort to pick apart failing climate models and questionably-adjusted data.

From Climate Depot: Over the past decade, NASA and NOAA have continuously altered the temperature record to cool the past and warm the present. Their claims are straight out Orwell's 1984, and have nothing to do with science.

The routine

Some time ago, after reading all kinds of claims (like the ones above) about how NASA and NOAA had improperly adjusted temperature data to produce their global-average temperature results, I decided to take a crack at the data myself.

I coded up a straightforward baselining/gridding/averaging routine that is quite simple and “dumbed down” in comparison to the NASA and NOAA algorithms. Below is a complete description of the algorithm I coded up.

1. Using GHCN v3 monthly-average data, compute 1951-1980 monthly baseline temperatures for all GHCN stations. If a station has 15 or more valid temperatures in any given month for the 1951-1980 baseline period, retain that monthly baseline value; otherwise drop that station/month from the computations. Stations with no valid monthly baselines are completely excluded from the computations.
2. For all stations and months where valid baseline temperature estimates were computed per (1) above, subtract the respective baseline temperatures from all of the station monthly temperatures to produce monthly temperature anomalies for the years 1880-2015.
3. Set up a global gridding scheme to perform area-weighting. To keep things really simple, and to minimize the number of empty grid-cells, I selected large grid-cell sizes (20 degrees x 20 degrees at the Equator). I also opted to recalculate the grid-cell latitude dimensions as one goes north/south of the equator in order to keep the grid-cell areas as nearly constant as possible. This keeps the grid-cell areas from shrinking (per the latitude cosines) and so further reduces the number of empty grid cells.
4. In each grid-cell, compute the average (over all stations in the grid-cell) of the monthly temperature anomalies to produce a single time-series of average temperature anomalies for each month (years 1880 through 2015).
5. Compute global average monthly temperature anomalies by averaging together all the grid-cell monthly average anomalies, weighted by the grid-cell areas (again, for years 1880 through 2015).
6. Compute global-average annual anomalies for years 1880 through 2015 by averaging together the global monthly anomalies for each year.
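The baselining and anomaly steps at the start of this description can be sketched in a few lines of Python. This is only an illustration, not the author's program (which was written in C++): the function names and the dictionary layout keyed by (year, month) are assumptions made for the example, and a missing month is represented simply by its absence from the dictionary.

```python
from statistics import mean

BASE_START, BASE_END = 1951, 1980   # baseline period named in the post
MIN_BASELINE_YEARS = 15             # minimum valid values per calendar month

def monthly_baselines(records):
    """records maps (year, month) -> temperature in deg C.
    Returns month -> baseline mean, only for months with enough valid years."""
    by_month = {}
    for (year, month), temp in records.items():
        if BASE_START <= year <= BASE_END:
            by_month.setdefault(month, []).append(temp)
    return {m: mean(v) for m, v in by_month.items()
            if len(v) >= MIN_BASELINE_YEARS}

def monthly_anomalies(records):
    """Subtract each month's baseline; months without a baseline are dropped."""
    base = monthly_baselines(records)
    return {(y, m): t - base[m] for (y, m), t in records.items() if m in base}

# Made-up station: complete Januaries, but only five Julys in the baseline period.
station = {(year, 1): 10.0 for year in range(1951, 1981)}
station.update({(year, 7): 20.0 for year in range(1951, 1956)})
station[(2000, 1)] = 11.5
station[(2000, 7)] = 21.0

anoms = monthly_anomalies(station)
print(anoms[(2000, 1)])     # January 2000 relative to the 1951-1980 January mean
print((2000, 7) in anoms)   # July has too few baseline values, so it is excluded
```

A station for which `monthly_baselines` returns an empty dictionary contributes nothing at all, which corresponds to the "completely excluded" case in the description.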

The algorithm does not involve any station data adjustments (obviously!) or temperature interpolation operations. It’s a pretty basic number-crunching procedure that uses straightforward math plus a wee bit of trigonometry (for computing latitude/longitude grid-cell areas).
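The "wee bit of trigonometry" presumably amounts to the standard formula for the area of a latitude/longitude cell on a sphere, which is proportional to the longitude width times the difference of the sines of the bounding latitudes. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name and the fixed 20 x 20 degree cells are illustrative choices, not details taken from the author's code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def cell_area_km2(lat_south, lat_north, lon_width):
    """Area of a latitude/longitude cell on a sphere; all arguments in degrees."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * math.radians(lon_width) * band

equator_cell = cell_area_km2(0, 20, 20)   # 20 x 20 degrees next to the Equator
polar_cell = cell_area_km2(60, 80, 20)    # same angular size at high latitude

print(round(polar_cell / equator_cell, 2))
```

With fixed angular dimensions the high-latitude cell covers only about a third of the area of the equatorial one, which is why the post stretches the cells away from the Equator to keep their areas roughly constant.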

For me, the most complicated part of the algorithm implementation was managing the variable data record lengths and data gaps (monthly and annual) in the station data -- basically, the “data housekeeping” stuff. Fortunately, modern development libraries such as the C++ Standard Template Library make this less of a chore than it used to be.

Why this routine?

People unfamiliar with global temperature computational methods sometimes ask: “Why not simply average the temperature station data to compute global-average estimates? Why bother with the baselining and gridding described above?”

We could get away with straight averaging of the temperature data if it were not for the two problems described below.

Problem 1: Temperature stations have varying record lengths. The majority of stations do not have continuous data records that go all the way back to 1880 (the beginning of the NASA/GISS global temperature calculations). Even stations with data going back to 1880 have gaps in their records -- there are missing months or even years.

Problem 2: Temperature stations are not evenly distributed over the Earth’s surface. Some regions, like the continental USA and western Europe, have very dense networks of stations. Other regions, like the African continent, have very sparse station networks.

As a result of problem 1, we have a mix of temperature stations that changes from year to year. If we were simply to average the absolute temperature data from all those stations, the final global-average results would be significantly skewed from year to year due to the changing mix of stations from one year to the next.

Fortunately, the solution for this complication is quite straightforward: the baselining and anomaly-averaging procedure described above. For those who are already familiar with this procedure, please bear with me while I illustrate how it works with a simple scenario constructed from simulated data.

Let’s consider a very simple scenario where the full 1880-2016 temperature history for a particular region is contained in data reported by two temperature stations, one of which is located on a hilltop and the other located on a nearby valley floor. The hilltop and valley floor locations have identical long-term temperature trends, but the hilltop location is consistently about 1 degree C cooler than the valley floor location. The hilltop temperature station has a temperature record starting in 1880 and ending in 1990. The valley floor station has a temperature record beginning in 1930 and ending in 2016.

Figure 1 below shows the simulated temperature time-series for these two hypothetical stations. Both time-series were constructed by superimposing random noise on the same linear trend, with the valley-floor station time-series having a constant offset temperature 1 degree C more than that of the hilltop station time-series. The simulated time-series for the hilltop station (red) begins in 1880 and continues to 1990. The simulated valley floor station temperature (blue) data begins in 1930 and runs to 2016. As can be seen during their period of overlap (1930-1990), the simulated valley-floor temperature data runs about 1 degree warmer than the simulated hilltop temperature data.

Figure 1: Simulated Hilltop Station Data (red) and Valley Floor Station Data (blue)

If we were to attempt to construct a complete 1880-2016 temperature history for this region by computing a straight average of the hilltop and valley floor data, we would obtain the results seen in Figure 2 below.

Figure 2: Straight Average of Valley Floor Station Data and Hilltop Station Data

The effects of the changing mix of stations (hilltop vs. valley floor) on the average temperature results can clearly be seen in Figure 2. A large temperature jump is seen at 1930, where the warmer valley floor data begins, and a second temperature jump is seen at 1990 where the cooler hilltop data ends. These temperature jumps obviously do not represent actual temperature increases for that particular region; instead, they are artifacts introduced by the changes in the mix of stations in 1930 and 1990.

An accurate reconstruction of the regional temperature history computed from these two temperature time-series obviously should show the warming trend seen in the hilltop and valley floor data over the entire 1880-2016 time period. That is clearly not the case here. Much of the apparent warming seen in Figure 2 is a consequence of the changing mix of stations.

Now, let’s modify the processing a bit by subtracting the (standard NASA/GISS) 1951-1980 hilltop baseline average temperature from the hilltop temperature data and the 1951-1980 valley floor baseline average temperature from the valley floor temperature data. This procedure produces the temperature anomalies for the hilltop and valley floor stations. Then for each year, compute the average of the station anomalies for the 1880-2016 time period.

This is the baselining and anomaly-averaging procedure that is used by NASA/GISS, NOAA, and other organizations to produce their global-average temperature results.

When this baselining and anomaly-averaging procedure is applied to the simulated temperature station data, it produces the results shown in Figure 3 below.

Figure 3: Average of Valley Floor Station Anomalies and Hilltop Station Anomalies

In Figure 3, the temperature jumps associated with the beginning of the valley floor data record and the end of the hilltop data record have been removed, clearly revealing the underlying temperature trend shared by the two temperature time-series.
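The hilltop/valley experiment is easy to repeat. The sketch below is a simplified version of it: the random noise is left out so that the jumps are exact, and the trend of 0.01 degrees C per year, the 10 degree starting temperature, and all function names are assumptions chosen for the example rather than values from the post.

```python
from statistics import mean

TREND = 0.01  # deg C per year; an assumed value for illustration

def simulate(first_year, last_year, offset):
    """Noise-free linear trend; offset separates hilltop from valley floor."""
    return {y: 10.0 + offset + TREND * (y - 1880)
            for y in range(first_year, last_year + 1)}

hilltop = simulate(1880, 1990, 0.0)   # cooler station, record ends in 1990
valley = simulate(1930, 2016, 1.0)    # 1 deg C warmer, record starts in 1930

def yearly_average(series_list):
    """Average whatever stations report in each year."""
    years = sorted(set().union(*series_list))
    return {y: mean(s[y] for s in series_list if y in s) for y in years}

def to_anomalies(series, base=(1951, 1980)):
    """Subtract the station's own baseline-period mean."""
    baseline = mean(t for y, t in series.items() if base[0] <= y <= base[1])
    return {y: t - baseline for y, t in series.items()}

straight = yearly_average([hilltop, valley])
anomaly = yearly_average([to_anomalies(hilltop), to_anomalies(valley)])

# Year-to-year change where the valley record begins and where the hilltop record ends.
print(round(straight[1930] - straight[1929], 2), round(straight[1991] - straight[1990], 2))
print(round(anomaly[1930] - anomaly[1929], 2), round(anomaly[1991] - anomaly[1990], 2))
```

In the straight average, each change in the station mix adds about half a degree on top of the true year-to-year trend; in the anomaly average, only the trend remains.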

Also note that although neither of my simulated temperature stations has a full 1880-2016 temperature record, we were still able to compute a complete reconstruction for the 1880-2016 time period because there was enough overlap between the station records to allow us to “align” them via baselining.

The second problem, the non-uniform distribution of temperature stations, can clearly be seen in Figure 4 below. That figure shows all GHCNv3 temperature stations that have data records beginning in 1900 or earlier and continuing to the present time.

Figure 4: Long-Record GHCN Station Distribution

As one can see, the stations are highly concentrated in the continental USA and western Europe; Africa and South America, in contrast, have very sparse coverage. A straight unweighted average of the data from all the stations shown in the above image would result in temperature changes in the continental USA and western Europe “swamping out” temperature changes in South America and Africa in the final global average calculations.

That is the problem that gridding solves. The averaging procedure using grid-cells is performed in two steps. First, the temperature time-series for all stations in each grid-cell are averaged together to produce a single time-series per grid-cell. Then all the grid-cell time-series are averaged together to construct the final global-average temperature results (note: in the final average, the grid-cell time-series are weighted according to the size of each grid-cell). This eliminates the problem where areas on the Earth with very dense networks of stations are over-weighted in the global average relative to areas where the station coverage is more sparse.
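The two-step average can be sketched as follows for a single month of station anomalies. This is a simplified stand-in for the author's grid: it uses fixed 20 x 20 degree cells everywhere and weights each occupied cell by its area (the difference of the sines of its bounding latitudes, since all cells have the same longitude width). The names `cell_of`, `cell_weight`, and `global_mean` are made up for the example.

```python
import math
from collections import defaultdict
from statistics import mean

CELL_DEG = 20  # assumed fixed cell size in degrees, for both latitude and longitude

def cell_of(lat, lon):
    """Return (row, col) of the grid cell containing a station."""
    row = min(int((lat + 90) // CELL_DEG), 180 // CELL_DEG - 1)
    col = int((lon + 180) // CELL_DEG) % (360 // CELL_DEG)
    return row, col

def cell_weight(row):
    """Relative cell area: proportional to the difference of sines of its edges."""
    south = -90 + row * CELL_DEG
    north = south + CELL_DEG
    return math.sin(math.radians(north)) - math.sin(math.radians(south))

def global_mean(stations):
    """stations: list of (lat, lon, anomaly) for one month."""
    cells = defaultdict(list)
    for lat, lon, anom in stations:
        cells[cell_of(lat, lon)].append(anom)
    # Step 1: one average per occupied cell.
    cell_means = {cell: mean(values) for cell, values in cells.items()}
    # Step 2: area-weighted average over the occupied cells.
    total_weight = sum(cell_weight(row) for row, _ in cell_means)
    return sum(cell_weight(row) * m for (row, _), m in cell_means.items()) / total_weight

# Ten stations crowded into one cell (anomaly 1.0), one lonely station elsewhere (0.0).
dense = [(45.0, 5.0, 1.0)] * 10
sparse = [(45.0, 100.0, 0.0)]
print(round(mean(a for _, _, a in dense + sparse), 2))  # unweighted station average
print(round(global_mean(dense + sparse), 2))            # gridded average
```

The unweighted average is dominated by the ten crowded stations, while the gridded average treats the two equally sized cells equally. Empty cells are simply skipped here, in keeping with the post's statement that no interpolation is performed.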

Now, some have argued that the sparse coverage of certain regions of the Earth invalidates the global-average temperature computations. But it turns out that the NASA/GISS warming trend can be confirmed even with a very sparse sampling of the Earth’s surface temperatures. (In fact, the NASA/GISS warming trend can be replicated very closely with data from as few as 30 temperature stations scattered around the world.)

Real-world results

Now that we are done with the preliminaries, let’s look at some real-world results. Let’s start off by taking a look at how my simple “dumbed-down” gridding/averaging algorithm compares with the NASA/GISS algorithm when it is used to process the same GHCNv3 adjusted data that NASA/GISS uses. To see how my algorithm compares with the NASA/GISS algorithm, take a look at Figure 5 below, where the output of my algorithm is plotted directly against the NASA/GISS “Global Mean Estimates based on Land Data only” results.

(Note: All references to NASA/GISS global temperature results in this post refer specifically to the NASA/GISS “Global Mean Estimates based on Land Data only” results. Those results can be viewed on the NASA/GISS web-site; scroll down to view the “Global Mean Estimates based on Land Data only” graph).

Figure 5: Adjusted Data, All Stations: My Simple Gridding/Averaging (blue) vs. NASA/GISS (red)

In spite of its rudimentary nature, my algorithm produces results that match the NASA/GISS results quite closely. According to the R-squared statistic I calculated (seen in the upper-left corner of Figure 5), I got 98% of the NASA/GISS answer with only a tiny fraction of the effort!
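The post does not say exactly how the R-squared statistic was computed. One common choice, assumed here, is the squared Pearson correlation between the two annual anomaly series over their common years; the function name and the sample numbers below are made up for illustration.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Made-up annual anomalies standing in for the two curves being compared.
mine = [1.0, 2.0, 3.0, 4.0]
reference = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(mine, reference), 2))
```

Note that a statistic of this kind measures how well the two curves track each other, not whether they share the same offset, so it is best read alongside the plot itself.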

But what happens when we use unadjusted GHCNv3 data? Well, let’s go ahead and compare the output of my algorithm with the NASA/GISS algorithm when my algorithm is used to process the unadjusted GHCNv3 data. Figure 6 below shows a plot of my unadjusted global temperature results vs. the NASA/GISS results (remember that NASA/GISS uses adjusted GHCNv3 data).

Figure 6: Unadjusted Data, All Stations: My Simple Gridding/Averaging (green) vs. NASA/GISS (red)

My “all stations” unadjusted data results show a warming trend that lines up very closely with the NASA/GISS warming trend from 1960 to 2016, with my results as well as the NASA/GISS results showing record high temperatures for 2016. However, my results do show a visible warm-bias relative to the NASA/GISS results prior to 1950 or so. This is the basis of the accusations that NOAA and NASA “cooled the past (and warmed the present)” to exaggerate the global warming trend.

Now, why do my unadjusted data results show that pre-1950 “warm bias” relative to the NASA/GISS results? Well, this excerpt from NOAA’s GHCN FAQ provides some clues:

Why are there more cold (negative) step changes than warm (positive) step changes in the historical land surface air temperature records represented in the GHCN v3 dataset?

The reason for the larger number of cold step changes is not completely clear, but they may be due in part to systematic changes in station locations from city centers to cooler airport locations that occurred in many parts of the world from the 1930s through the 1960s.

Because the GHCNv3 metadata contains an airport designator field for every temperature station, it was quite easy for me to modify my program to exclude all the “airport” stations from the computations. So let’s exclude all of the “airport” station data and see what we get. Figure 7 below shows my unadjusted data results vs. the NASA/GISS results when all “airport” stations are excluded from my computations.

Figure 7: Unadjusted Data, Airports Excluded (green) vs. NASA/GISS (red)

There is a very visible reduction in the bias between my unadjusted results and the NASA results (especially prior to 1950 or so) when airport stations are excluded from my unadjusted data processing. This is quite consistent with the notion that many of the stations currently located at airports were moved to their current locations from city centers at some point during their history.

Now just for fun, let’s look at what happens when we do the reverse and exclude non-airport stations (i.e. process only the airport stations). Figure 8 shows what we get when we process unadjusted data exclusively from “airport” stations.

Figure 8: Unadjusted Data, Airports Only (green) vs. NASA/GISS (red)

Well, look at that! The pre-1950 bias between my unadjusted data results and the NASA/GISS results really jumps out. And take note of another interesting thing about the plot -- in spite of the fact that I processed only “airport” stations, the green “airports only” temperature curve goes all the way back to 1880, decades prior to the existence of airplanes (or airports)! It is only reasonable to conclude that those “airport” stations must have been moved at some point in their history.

Now, for a bit more fun, let’s drill down a little further into the data and process only airport stations that also have temperature data records going back to 1903 (the year that the Wright Brothers first successfully flew an airplane) or earlier.

When I drilled down into the data, I found over 400 “airport” temperature stations with data going back to 1903 or earlier. And when I computed global-average temperature estimates from just those stations, this is what I got (Figure 9):

Figure 9: Unadjusted Data, Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

OK, that looks pretty much like the previous temperature plot, except that my results are “noisier” due to the fact that I processed data from fewer temperature stations.

And for even more fun, let’s look at the results we get when we process data exclusively from non-airport stations with data going back to 1903 or earlier:

Figure 10: Unadjusted Data, Non-Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

When only non-airport stations are processed, the pre-1950 “eyeball estimate” bias between my unadjusted data temperature curve and the NASA/GISS temperature curve is sharply reduced.
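Selecting the station subsets used in Figures 7 through 10 is a simple filtering step once the metadata have been read. The sketch below assumes the metadata have already been parsed into records; the `Station` class, its `is_airport` and `first_year` fields, and the `select` helper are hypothetical names for this example, and the three stations are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Station:
    station_id: str
    is_airport: bool   # hypothetical field, assumed to come from the airport designator
    first_year: int    # first year with temperature data

def select(stations, airport=None, latest_start=None):
    """Keep stations matching the airport flag and starting no later than latest_start."""
    chosen = []
    for s in stations:
        if airport is not None and s.is_airport != airport:
            continue
        if latest_start is not None and s.first_year > latest_start:
            continue
        chosen.append(s)
    return chosen

# Invented stations for illustration only.
stations = [
    Station("A", is_airport=True, first_year=1885),
    Station("B", is_airport=True, first_year=1948),
    Station("C", is_airport=False, first_year=1890),
]

no_airports = select(stations, airport=False)                        # as in Figure 7
airports_only = select(stations, airport=True)                       # as in Figure 8
old_airports = select(stations, airport=True, latest_start=1903)     # as in Figure 9
old_non_airports = select(stations, airport=False, latest_start=1903)  # as in Figure 10
print([s.station_id for s in old_airports])
```

Each subset would then be fed through the same baselining, gridding, and averaging steps as the full station list.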

The results seen in the above plots are entirely consistent with the notion that the movement of large numbers of temperature stations from city centers to cooler outlying airport locations during the middle of the 20th Century is responsible for much of the bias seen between the unadjusted and adjusted GHCNv3 global-average temperature results.

It is quite reasonable to conclude, based on the results presented here, that one major reason for the bias seen between the GHCNv3 unadjusted and adjusted data results is the presence of corrections for those station moves in the adjusted data (corrections that are obviously absent from the unadjusted data). Those corrections remove the contaminating effects of station moves and permit more accurate estimates of global surface temperature increases over time.

Take-home lessons (in no particular order):

Even a very simple global temperature algorithm can reproduce the NASA/GISS results very closely. This really is a case where you can get 98% of the answer (per my R-squared statistic) with less than 1% of the effort.
NOAA’s GHCNv3 monthly data repository contains everything an independent “citizen scientist” needs (data and documentation) to conduct his/her own investigation of the global land station temperature data.
A direct comparison of unadjusted data results (all GHCN stations) vs. the NASA/GISS adjusted data temperature curves reveals only modest differences between the two temperature curves, especially for the past 6 decades. Furthermore, my unadjusted and the NASA/GISS adjusted results show nearly identical (and record) temperatures for 2016. If NASA and NOAA were adjusting data to exaggerate the amount of planetary warming, they sure went to an awful lot of trouble and effort to produce only a small overall increase in warming in the land station data.
Eliminating all “airport” stations from the processing significantly reduced the bias between my unadjusted data results and the NASA/GISS results. It is therefore reasonable to conclude that a large share of the modest bias between my GHCN v3 unadjusted results and the NASA/GISS adjusted data results is the result of corrections for station moves from urban centers to outlying airports (corrections present in the adjusted data, but not in the unadjusted data).
Simply excluding “airport” stations likely eliminates many stations that were always located at airports (and never moved) and also fails to eliminate stations that were moved out from city centers to non-airport locations. So it is not a comprehensive evaluation of the impacts of station moves. However, it is a very easy “first step” analysis exercise to perform; even this incomplete “first step” analysis produces results that are strongly consistent with the hypothesis that corrections for station moves are likely the dominant reason for the pre-1950 bias seen between the adjusted and unadjusted GHCN global temperature results. Remember that many urban stations were also moved from city centers to non-airport locations during the mid-20th century. Unfortunately, those station moves are not recorded in the simple summary metadata files supplied with the GHCNv3 monthly data. An analysis of NOAA’s more detailed metadata would be required to identify those stations and perform a more complete analysis of the impacts of station moves. However, that is outside of the scope of this simple project.
For someone who has the requisite math and programming skills, confirming the results presented here should not be very hard at all. Skeptics should try it some time. Provided that those skeptics are willing and able to accept results that contradict their original views about temperature data adjustments, they could have a lot of fun taking on a project like this.

Related reading

The Clear Climate Code project was also able to reproduce the results of NASA-GISS. Berkeley Earth made a high-level independent analysis and confirmed previous results. (Non-climate) scientist Nick Stokes (Moyhu) also computed his own temperature signal, TempLS, which fits well too.

In 2010 Zeke Hausfather analyzed the differences in GHCNv2 between airport and other stations and found only minimal differences: Airports and the land temperature record.

At about the same time David Jones at Clear Climate Code also looked at airport stations, just splitting the dataset in two groups, and did find differences: Airport Warming. Thus making sure both groups are regionally comparable is probably important.

The global warming conspiracy would be huge. Not only the 7 global datasets but also national datasets from so many groups show clear warming.

Just the facts, homogenization adjustments reduce global warming.

Why raw temperatures show too little global warming.

Irrigation and paint as reasons for a cooling bias.

Temperature trend biases due to urbanization and siting quality changes.

Temperature bias from the village heat island

Cooling moves of urban stations. From cities to airports or simply to outside a city or village.

The transition to automatic weather stations. We’d better study it now. It may be a cooling bias.

Changes in screen design leading to temperature trend biases.

Early global warming

Cranberry picking short-term temperature trends

How climatology treats sceptics

10 comments:

Magma, Monday, 30 January 2017 at 18:34:00 GMT
I find temperature homogenization papers and blog posts very interesting. The methods used are conceptually similar but different in execution to those used to level geophysical data such as ground and aeromagnetic surveys, gravity measurements, and (I believe) some high-accuracy GPS time series.

Of course, in these fields we rarely have a group of highly motivated nitwits shouting at us that we're "doing it wrong" or questioning our motives.

Victor Venema, Monday, 30 January 2017 at 18:47:00 GMT
Homogenization is normal, innocent data processing to remove errors and see what you are interested in with more accuracy. Scientists will always strive to improve the accuracy of their data; that helps them see more subtle phenomena and understand the object/climate better.

Homogenization in other fields of study is something on my long list of stuff I would love to blog about. But because it is not my field, such posts take more time to write. Feel free to write a guest post on homogenization in geophysical data.

Steve, Tuesday, 31 January 2017 at 05:14:00 GMT
This is very interesting - to this non expert reader - particularly the section on the effect of moving weather stations from the centre of cities to airports on the edges of cities.
Is there any peer reviewed research on this topic?
I have come across people who claim that urbanization produces a warming bias in the temperature record.
If the growth of towns has produced a warming bias, some or all of this could have been offset by the movement of weather stations.
As well the description of the value of anomalies is one of the clearest I have read.
Thanks for the post.

Nick Stokes, Tuesday, 31 January 2017 at 09:44:00 GMT
There was a flurry of activity in early 2010 where people calculated and compared land, and sometimes land/ocean averages by methods similar to these. At least one group were sceptics. They got similar results to everyone else.

Mine developed into TempLS, which I calculate and post early each month. It does use unadjusted GHCN and ERSST, but of course can also use adjusted, and I have used that to show the effect of adjustment (not much, especially in whole globe land/ocean).

Victor Venema, Tuesday, 31 January 2017 at 11:57:00 GMT
@Steve. There are many studies that show that urban areas are warmer (not all, sometimes they are cooler) and that urbanization leads to warming. On the other hand, there are also many studies showing that in the global temperature record urban stations do not warm more than nearby more rural stations. The difference is likely because urban stations are often relocated to better places. When I asked my Chinese colleagues how many of their stations were affected by urbanization, they gave me the number for how many stations were relocated. In the end what matters is how urban the original location was and how urban the current one is. It would even be possible that urban stations have a cooling bias. Unfortunately, I do not know of articles studying both urbanization and relocations; I wrote this post explaining the problem hoping to stimulate such research.

Victor Venema, Tuesday, 31 January 2017 at 12:20:00 GMT
@Nick, one would wish that that "flurry of activity in early 2010" had finished that theme. But some mitigation skeptics do not care about reality in any way and keep on cherry picking small regions (USA, 1.5% of the world, or smaller), cherry picking stations within those regions, abusing the processing. Rightfully called "zombie memes". We now see in the downfall of the USA where that leads to. It is sick.

tonyb, Friday, 3 February 2017 at 21:45:00 GMT
Whilst you will be aware of it, your readers may be unfamiliar with the epic work compiled by Camuffo and Jones, which looked at seven early temperature records in Europe and traced what happened to them over the years

http://www.isac.cnr.it/~microcl/climatologia/improve.php

It's a very big book which looks at station changes, instrumentation, the effects of urbanisation and a variety of other factors

Tonyb

JamesS, Tuesday, 11 July 2017 at 01:52:00 BST
Victor, when calculating your baseline, you said that if a month had fewer than 15 valid records, you did not calculate a baseline for that month. Carrying this forward, how many valid months in a calendar year did you require to calculate a baseline for that year, and how many valid years did you require to use that station's baseline at all?

Victor Venema, Tuesday, 11 July 2017 at 11:18:00 BST
James, this is a guest post, so I did not compute anything.

Using such baselines, the anomalies no longer have a seasonal cycle. Thus having fewer months per year only produces a larger uncertainty in the annual average.

Up to now I was fortunate to work on projects where I had enough data and could compute annual values only when there were no missing months. But if you have less data or data with more missing values, you can relax this with no problem.

If your minimum number of months is really low, you may have to take differences in uncertainty into account, for example by using weighted regression to compute trends.

JamesS, Thursday, 13 July 2017 at 07:08:00 BST
Victor,

Thanks for answering. I did miss that "Ron" was the actual author (or forgot by the time I got around to posting). I've been doing some data analysis of the GSN stations' *.dly files, and I was somewhat surprised by the sparseness at some of the stations.

I've just started, so I don't have any results to report yet, but I will post when I have some results.


Monday, 30 January 2017