Friday, 18 December 2015

Anthony Watts at AGU2015: Comparison of Temperature Trends Using an Unperturbed Subset of The U.S. Historical Climatology Network

[UPDATE. I will never understand how HotWhopper writes such understandable articles so fast, but it might be best to read the HotWhopper introduction first.]

Remember the Watts et al. manuscript from 2012? Anthony Watts putting his blog on hold to urgently finish his draft? This study is now a poster at the AGU conference, and Watts promises to submit it soon to an undisclosed journal.

At first sight, the study now has a higher technical quality and some problems have been solved. The two key weaknesses are, however, not discussed in the press release for the poster. This is strange. I have had long discussions with second author Evan Jones about them. Scientists (real sceptics) have to be critical of their own work. You would expect a scientist to devote a large part of a study to its weaknesses: if possible showing that they probably do not matter, or else at least confronting them honestly, rather than simply ignoring them.

Watts et al. is about the immediate surroundings, also called the micro-siting, of weather stations that measure the surface air temperature. The quality of the American weather stations has been assessed in five categories by volunteers of the blog WUWT. Watts and colleagues call the two best categories "compliant" and the three worst ones "non-compliant". For these two groups they then compare the average temperature signal over the 30-year period 1979 – 2008.

An important problem of the 2012 version of this study was that historical records typically also contain temperature changes caused by changes in the method of observation. An important change in the USA is the time of observation bias. In the past, observations were more often made in the afternoon than in the morning. Morning measurements result in somewhat lower temperatures. This change in the time of observation creates a cooling bias of about 0.2°C per century and was ignored in the 2012 study. Even the auditor, Steve McIntyre, who was then a co-author, admitted this was an error. This problem is now fixed: stations with a change in the time of observation have been removed from the study.


A Stevenson screen.
Another important observational change in the USA is the change of the screen used to protect the thermometer from the sun. In the past, so-called Cotton Region Shelters (CRS) or Stevenson screens were used, nowadays more and more automatic weather stations (AWS) are used.

A widely used type of AWS in the USA is the MMTS. America was one of the first countries to automate its network, with what was then analogue equipment that did not allow for long cables between the sensor and the display, which is installed inside a building. Furthermore, the technicians only had one day per station, and as a consequence many of the MMTS systems were badly sited. Although they are badly sited, these MMTS systems typically measure temperatures that are 0.15°C cooler. The size of this cooling has been estimated by comparing a station with such a change to a neighbouring station where nothing changed. Because both stations experience about the same weather, the difference signal shows the jump in the mean temperature much more clearly.
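The difference-series idea can be sketched in a few lines of Python. This is a hypothetical illustration, not NOAA's actual procedure: the function names, the 60-year synthetic series, the noise levels and the -0.15°C jump are assumptions, chosen so that the shared weather is much larger than the local noise, as it is for nearby stations.

```python
import random
import statistics


def estimate_jump(candidate, neighbour, change_index):
    """Size of a jump at a known position (e.g. taken from the station
    history), estimated from the candidate-minus-neighbour difference series:
    mean of the differences after the change minus the mean before it."""
    diff = [c - n for c, n in zip(candidate, neighbour)]
    return statistics.mean(diff[change_index:]) - statistics.mean(diff[:change_index])


def simulate(n_years=60, change_index=30, jump=-0.15, seed=1):
    """Hypothetical annual means: both stations share the same 'weather'
    (sd 1 degC) plus small independent local noise (sd 0.05 degC); only the
    candidate gets a jump (e.g. a screen change) at change_index."""
    rng = random.Random(seed)
    weather = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    neighbour = [w + rng.gauss(0.0, 0.05) for w in weather]
    candidate = [
        w + rng.gauss(0.0, 0.05) + (jump if i >= change_index else 0.0)
        for i, w in enumerate(weather)
    ]
    return candidate, neighbour


candidate, neighbour = simulate()
relative = estimate_jump(candidate, neighbour, 30)
# For comparison: the same before/after estimate from the candidate alone,
# which is dominated by the shared weather noise.
naive = statistics.mean(candidate[30:]) - statistics.mean(candidate[:30])
print(f"difference-series estimate: {relative:+.3f} degC")
print(f"candidate-only estimate:    {naive:+.3f} degC")
```

Because the weather cancels in the difference, the first estimate should land close to the assumed -0.15°C, while the candidate-only estimate can be off by a few tenths of a degree.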

Two weaknesses

Weakness 1 is that the authors only know the siting quality at the end of the period. Stations in the compliant categories may have been less well sited earlier on, while stations in the non-compliant categories may have been better sited before.

pete:
Someone has a weather station in a parking lot. Noticing their error, they move the station to a field, creating a great big cooling-bias inhomogeneity. Watts comes along, and seeing the station correctly set up says: this station is sited correctly, and therefore the raw data will provide a reliable trend estimate.
The study tries to reduce this problem by creating a subset of stations that is unperturbed by Time of Observation changes, station moves, or rating changes. At least according to the station history (metadata). The problem is that metadata is never perfect.

Scientists working on homogenization therefore advise always also detecting changes in the observational methods (inhomogeneities) by comparing a station to its neighbours. I have told Evan Jones how important this is, but they refuse to use homogenization methods because they feel homogenization does not work. In a scientific paper, they will have to provide evidence for why they reject an established method that could ameliorate a serious problem with their study. The irony is that the MMTS adjustments, which Watts et al. do use, depend on the same principle.
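Detecting a break whose date is not in the metadata can be sketched as a search over the candidate-minus-neighbour series. The sketch below uses the statistic of the standard normal homogeneity test (SNHT) for a single break; it is a simple stand-in chosen for illustration, not the pairwise homogenization algorithm NOAA uses, and the synthetic series and the -0.5°C jump are hypothetical.

```python
import random
import statistics


def snht_break(series):
    """Most likely position of a single break in a (difference) series.

    Uses the SNHT statistic T(k) = k * z1**2 + (n - k) * z2**2, where z1 and
    z2 are the means of the standardized series before and after position k.
    Returns (k, T(k)) for the k with the largest T; k is the index of the
    first value after the break.
    """
    n = len(series)
    mu = statistics.mean(series)
    sd = statistics.stdev(series)
    z = [(x - mu) / sd for x in series]
    best_k, best_t = 1, float("-inf")
    for k in range(1, n):
        z1 = sum(z[:k]) / k
        z2 = sum(z[k:]) / (n - k)
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t


# Hypothetical candidate-minus-neighbour series: small noise and an
# undocumented jump of -0.5 degC starting at index 20.
rng = random.Random(7)
diff = [rng.gauss(0.0, 0.1) + (-0.5 if i >= 20 else 0.0) for i in range(40)]
k, t = snht_break(diff)
print(f"most likely break before index {k} (T = {t:.1f})")
```

A real test would also compare T with critical values that depend on the series length, and would have to deal with breaks in the neighbours; both are left out here.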

Weakness 2 is that the result is purely statistical and no physical explanation is provided for it. It is clear that bad micro-siting will lead to a temperature bias, but a constant bias does not affect the trend, while the study shows a difference in trend. I do not see how constant siting quality, whether bad or good, would change a trend. The press release does not offer an explanation either.
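That a constant bias leaves the trend unchanged can be checked with an ordinary least-squares trend. A minimal sketch with made-up numbers: a signal warming 0.2°C per decade and a hypothetical constant 1°C warm bias for the poorly sited station.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


years = list(range(1979, 2009))  # the 30-year period 1979-2008
# Hypothetical station signal warming 0.2 degC per decade.
well_sited = [0.02 * (y - 1979) for y in years]
# The same signal with a constant 1 degC warm bias from poor micro-siting.
poorly_sited = [t + 1.0 for t in well_sited]

# Both trends, in degC per decade
print(ols_slope(years, well_sited) * 10, ols_slope(years, poorly_sited) * 10)
```

The offset shifts the mean of the series but cancels in the deviations from the mean, so both slopes are the same up to rounding.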

What makes this trend difference even more mysterious, if it were real, is that it mainly happens in the 1980s and 1990s, but has stopped in the last decade. See the graph below showing the trend for compliant (blue) and non-compliant stations (orange).



[UPDATE. That the difference builds up in the beginning period, and that since 1996 the trends for "compliant" and "non-compliant" stations have been the same, is better seen in the graph below, computed from the data in the above figure digitized by George Bailley. (I have no idea what the unit of the y-axis is on either of these graphs. Maybe 0.001°C.)


]

That the Watts phenomenon has stopped is also suggested by a comparison of the standard USA climate network (USHCN) and a new climate-quality network with perfect siting (USCRN) shown below. The pristine network even warms a little more. (Too little to be interpreted.)



While I am unable to see a natural explanation for the trend difference, the fact that the difference is mainly seen in the first two decades fits the hypothesis that the siting quality of the compliant stations was worse in the past: that in the past these stations were less compliant and a little too warm. The further you go back in time, the more likely it becomes that some change has happened. And the further you go back in time, the more likely it is that this change is no longer known.
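The hypothesis can be illustrated with a hypothetical "compliant" station that was 0.2°C too warm before an undocumented improvement of its siting in 1990. All numbers are invented for illustration; the point is only the pattern: a reduced full-period trend, but no effect in the last decade.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


years = list(range(1979, 2009))
true_signal = [0.02 * (y - 1979) for y in years]  # hypothetical 0.2 degC/decade
# Hypothetical station that was 0.2 degC too warm before 1990.
observed = [t + (0.2 if y < 1990 else 0.0) for y, t in zip(years, true_signal)]

full_true = ols_slope(years, true_signal) * 10   # degC per decade
full_obs = ols_slope(years, observed) * 10
last = slice(20, 30)                             # 1999-2008
recent_true = ols_slope(years[last], true_signal[last]) * 10
recent_obs = ols_slope(years[last], observed[last]) * 10
print(f"1979-2008: true {full_true:.3f}, observed {full_obs:.3f}")
print(f"1999-2008: true {recent_true:.3f}, observed {recent_obs:.3f}")
```

The full-period trend drops from 0.2 to about 0.11°C per decade, while the 1999-2008 trends of both series are identical, because the bias is gone by then.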

Six key findings

Below I have quoted the six key findings of Watts et al. (2015) according to the press release.

1. Comprehensive and detailed evaluation of station metadata, on-site station photography, satellite and aerial imaging, street level Google Earth imagery, and curator interviews have yielded a well-distributed 410 station subset of the 1218 station USHCN network that is unperturbed by Time of Observation changes, station moves, or rating changes, and a complete or mostly complete 30-year dataset. It must be emphasized that the perturbed stations dropped from the USHCN set show significantly lower trends than those retained in the sample, both for well and poorly sited station sets.

The temperature network in the USA has on average one detectable break every 15 years (and a few more breaks that are too small to be detected but can still influence the result). The 30-year period studied should thus contain on average 2 breaks, and likely only 12.6% of the stations (154 stations) do not have a break. According to Watts et al., 410 of the 1218 stations have no break. Thus 256 stations (more than half of their "unperturbed" dataset) likely have a break that Watts et al. did not find.
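The numbers above follow if each year independently has a 1-in-15 chance of a detectable break (a binomial model; that this is how the 12.6% was obtained is my reading). A sketch of the arithmetic, with a Poisson model for comparison:

```python
import math

N_STATIONS = 1218      # USHCN stations
N_UNPERTURBED = 410    # "unperturbed" stations according to Watts et al.
YEARS = 30             # 1979-2008
MEAN_INTERVAL = 15     # on average one detectable break every 15 years

expected_breaks = YEARS / MEAN_INTERVAL             # 2 breaks per station
# Assumed model: each year independently has a 1/15 chance of a break.
p_no_break = (1.0 - 1.0 / MEAN_INTERVAL) ** YEARS   # about 0.126
expected_clean = round(p_no_break * N_STATIONS)     # about 154 stations
likely_missed = N_UNPERTURBED - expected_clean      # about 256 stations
# A Poisson model with the same mean gives a similar answer (about 0.135).
p_no_break_poisson = math.exp(-expected_breaks)

print(f"P(no break) = {p_no_break:.3f} (Poisson: {p_no_break_poisson:.3f})")
print(f"expected break-free stations: {expected_clean}")
print(f"'unperturbed' stations that likely have an undetected break: {likely_missed}")
```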

That the "perturbed" stations have a smaller trend than the "unperturbed" stations confirms what we know: in the USA the inhomogeneities have a cooling bias. In the "raw" data, the "unperturbed" subset has a trend in the mean temperature of 0.204°C per decade; see the table below. In the "perturbed" subset the trend is only 0.126°C per decade. That is a whopping cooling difference of about 0.2°C over this period.
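The size of that difference follows from the two raw trends, taking the 30-year period as roughly three decades:

```python
# Raw-data trends from Table 1 of Watts et al. (2015), in degC per decade.
trend_unperturbed = 0.204
trend_perturbed = 0.126
decades = 3.0  # roughly the 30-year period 1979-2008

trend_difference = trend_unperturbed - trend_perturbed   # 0.078 degC per decade
accumulated = trend_difference * decades                 # about 0.23 degC
print(f"{trend_difference:.3f} degC/decade -> {accumulated:.2f} degC over the period")
```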


Table 1 of Watts et al. (2015)

2. Bias at the microsite level (the immediate environment of the sensor) in the unperturbed subset of USHCN stations has a significant effect on the mean temperature (Tmean) trend. Well sited stations show significantly less warming from 1979 – 2008. These differences are significant in Tmean, and most pronounced in the minimum temperature data (Tmin). (Figure 3 and Table 1 [shown above])

The stronger trend difference for the minimum temperature would also need an explanation.

3. Equipment bias (CRS [Cotton Region Shelter] v. MMTS [Automatic Weather station] stations) in the unperturbed subset of USHCN stations has a significant effect on the mean temperature (Tmean) trend when CRS stations are compared with MMTS stations. MMTS stations show significantly less warming than CRS stations from 1979 – 2008. (Table 1 [shown above]) These differences are significant in Tmean (even after upward adjustment for MMTS conversion) and most pronounced in the maximum temperature data (Tmax).

The trend for the stations that use a Cotton Region Shelter is 0.3°C per decade. That is large and should be studied. This was the typical shelter in the past. Thus we can be quite sure that in these cases the shelter did not change, but there could naturally have been other changes.

4. The 30-year Tmean temperature trend of unperturbed, well sited stations is significantly lower than the Tmean temperature trend of NOAA/NCDC official adjusted homogenized surface temperature record for all 1218 USHCN stations.

It is natural that the trend in the raw data is smaller than the trend in the adjusted data. Mainly for the above-mentioned reasons (TOBS and MMTS), the biases in the USA are large compared to the rest of the world, and the trend in the USA is adjusted upwards by 0.4°C per century.

5. We believe the NOAA/NCDC homogenization adjustment causes well sited stations to be adjusted upwards to match the trends of poorly sited stations.

Well, they did write "we believe". There is no evidence for this claim.

6. The data suggests that the divergence between well and poorly sited stations is gradual, not a result of spurious step change due to poor metadata.

The year-to-year variations in a single station series are about 1°C. I am not sure one would be able to see whether the inhomogeneity is one or more step changes or a gradual change.
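A rough, noise-free check of this doubt, under assumptions of my own: a gradual change of 0.3°C spread evenly over 30 years and white year-to-year noise of 1°C in a single station series. The sketch fits the best single step to the gradual change and measures what the step model leaves unexplained. Roughly speaking, the two models only become distinguishable when this misfit is several times the noise variance.

```python
def best_step_rss(series):
    """Smallest residual sum of squares when a single step (two constant
    segments) is fitted to the series, searching over all split points."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((x - m) ** 2 for x in seg)

    return min(sse(series[:k]) + sse(series[k:]) for k in range(1, len(series)))


n_years = 30
change = 0.3  # hypothetical total inhomogeneity in degC
# Gradual (linear) change, without noise
ramp = [change * t / (n_years - 1) for t in range(n_years)]
misfit = best_step_rss(ramp)  # what a step model cannot capture, in degC^2
noise_var = 1.0 ** 2          # year-to-year variance of a single station, (1 degC)^2
print(f"step-model misfit: {misfit:.3f} degC^2, noise variance: {noise_var} degC^2")
```

The misfit comes out at about 0.06°C², far below the 1°C² year-to-year variance, so in a single station series the noise would swamp the difference between a step and a gradual change.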

Review

If I were a reviewer of this manuscript, I would ask about some choices that seem arbitrary, and I would like to know whether they matter. For example, using the period 1979 – 2008 rather than continuing the data to 2015. It is fine to also show data until 2008 for better comparison with earlier papers, but stopping 7 years early is suspicious. The choice to drop stations with TOBS changes but to correct stations with MMTS changes also sounds strange. It would be interesting to see whether any of the other three options (dropping or correcting for each of the two changes) show different results. Anomalies should be computed over a period, not relative to the starting year.
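Why the baseline matters can be shown with two hypothetical stations that have the same climate signal, except that station B happened to have a 0.8°C warmer first year (numbers invented for illustration):

```python
import statistics


def anomalies(values, base):
    """Anomalies relative to the mean over the reference slice `base`."""
    ref = statistics.mean(values[base])
    return [v - ref for v in values]


signal = [0.02 * t for t in range(30)]  # hypothetical 0.02 degC per year
station_a = list(signal)
station_b = [s + (0.8 if t == 0 else 0.0) for t, s in enumerate(signal)]

start_year = slice(0, 1)  # anomalies relative to the first year only
period = slice(0, 30)     # anomalies relative to the full 30-year mean

# Difference between the two stations' anomalies in the final year
offset_start = anomalies(station_b, start_year)[-1] - anomalies(station_a, start_year)[-1]
offset_period = anomalies(station_b, period)[-1] - anomalies(station_a, period)[-1]
print(f"start-year baseline: {offset_start:+.3f}  period baseline: {offset_period:+.3f}")
```

With the start-year baseline, B's one warm year shifts its whole anomaly curve down by 0.8°C; with the 30-year baseline the shift is only 0.8/30, about 0.03°C.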

I hope that Anthony Watts and Evan M. Jones find the above comments useful. Jones wrote earlier this year:
Oh, a shout-out to Dr. Venema, one of the earlier critics of Watts et al. (2012) who pointed out to us things that needed to be accounted for, such as TOBS, a stricter hand on station moves, and MMTS equipment conversion.

Note to Anthony: In terms of reasonable discussion, VV is way up there. He actually has helped to point us in a better direction. I think both Victor Venema and William Connolley should get a hat-tip in the paper (if they would accept it!) because their well considered criticism was of such great help to us over the months since the 2012 release. It was just the way science is supposed to be, like you read about in books.
Watts wrote in the side notes to his press release:
Even input from openly hostile professional people, such as Victor Venema, have been highly useful, and I thank him for it.
Glad to have been of help. I do not recall having been "openly hostile" to this study. It would be hard to come to a positive judgement of the quality of the blog posts at WUWT, whether they are from the pathological misquoter Monckton or greenhouse effect denier Tim Ball.

However, it is always great when people contribute to the scientific literature. When the quality of their work meets the scientific standard, their motivation does not matter; science can learn something. The surface stations project is useful for learning more about the quality of the measurements, and also for trend studies if it is continued over the coming decades.

Comparison of Temperature Trends Using an Unperturbed Subset of The U.S. Historical Climatology Network

Anthony Watts, Evan Jones, John Nielsen-Gammon and John Christy
Abstract. Climate observations are affected by variations in land use and land cover at all scales, including the microscale. A 410-station subset of U.S. Historical Climatology Network (version 2.5) stations is identified that experienced no changes in time of observation or station moves during the 1979-2008 period. These stations are classified based on proximity to artificial surfaces, buildings, and other such objects with unnatural thermal mass using guidelines established by Leroy (2010). The relatively few stations in the classes with minimal artificial impact are found to have raw temperature trends that are collectively about 2/3 as large as stations in the classes with greater expected artificial impact. The trend differences are largest for minimum temperatures and are statistically significant even at the regional scale and across different types of instrumentation and degrees of urbanization. The homogeneity adjustments applied by the National Centers for Environmental Information (formerly the National Climatic Data Center) greatly reduce those differences but produce trends that are more consistent with the stations with greater expected artificial impact. Trend differences between the Cooperative Observer Network and the Climate Reference Network are not found during the 2005-2014 sub-period of relatively stable temperatures, suggesting that the observed differences are caused by a physical mechanism that is directly or indirectly caused by changing temperatures.

[UPDATE. I forgot to mention the obvious: after homogenization, the trends Watts et al. (2015) computed are nearly the same for all five siting categories, just as they were for Watts et al. (2012) and the published study Fall et al. Thus for the data used by climatologists, the homogenized data, the siting quality does not matter. Just like before, they did not study homogenization algorithms and thus cannot draw any conclusions about them, but unfortunately they do.]



Related reading

Anthony Watts' #AGU15 poster on US temperature trends

Blog review of the Watts et al. (2012) manuscript on surface temperature trends

A short introduction to the time of observation bias and its correction

Comparing the United States COOP stations with the US Climate Reference Network

WUWT not interested in my slanted opinion

Some history from 2010

On Weather Stations and Climate Trends

The conservative family values of Christian man Anthony Watts

Watts not to love: New study finds the poor weather stations tend to have a slight COOL bias, not a warm one

Poorly sited U.S. temperature instruments not responsible for artificial warming

254 comments:

Peter Thorne said...

Victor, just to note that the MMTS is not a truly automated instrument. It actually needs to be read and reset daily by the operator AIUI.

Victor Venema said...

That is also how I understand it. An American colleague is welcome to prove us wrong.

That makes the MMTS very different from modern AWS, which have a memory (for many years of data), can be powered by batteries, solar or wind, and transmit their readings digitally over long distances, by radio if necessary. These early US AWS made the siting worse, but modern AWS are so much more flexible and independent of the observer that they may well have made the siting better and introduced a cooling bias due to this relocation. Modern AWS technology has made some of the remote stations of the USCRN possible.

Evan Jones said...

The principal objections to the 2012 release were TOBS bias, moves, and MMTS adjustment. All of these issues have been addressed.

William Connolley said...

I notice that WUWT says "We are submitting this to publication in a well respected journal". Presumably they won't make the same mistake they made last time, of actually publishing a draft, so that people can read it and point out the errors.

Victor Venema said...

We also discussed the points I raised above at HotWhopper and Stoat.

HotWhopper:
HotWhopper Competition: Best Name for a Denier Lobby Group (in 25 words or less) - the discussion began in the comments

Heat sinking, temperatures rising in the US of A - this article was specifically to allow continued discussion by Evan Jones of the US surface station analysis.

Stoat

ehak said...

Evan Jones writes this on WUWT:

"This is important, because it supports our hypothesis: Poor microsite exaggerates trend. And it doesn’t even matter if that trend is up or down.

Poor microsite exaggerates a warming trend, causing a divergence with well sited stations. Poor microsite also exaggerates a cooling trend, causing an equal and opposite divergence."

Makes sense. Heat sinks.

But there is a big problem with this: they find the opposite. After the 1980s the temperature dropped. That would give a bigger cooling at the worst-sited stations than at the best-sited ones. Instead, the temperature at the best-sited stations dropped more than at the worst-sited ones, and stayed lower thereafter.

Too bad.

Victor Venema said...

ehak, exactly. An object with a large heat capacity would make the temperature differences smaller, and the trend in the 80s and 90s would have been smaller if heat capacity were responsible.

A "heat sink" that is large enough to change the temperature on decadal time scales would have a much stronger effect on daily time scales. You would get nearly no daily or seasonal cycle. However, there is a daily cycle, and the seasonal cycles of the USHCN and the USCRN are similar.
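The heat-sink argument can be made quantitative under a simple assumption of my own: treat the heat sink as a first-order relaxation (a low-pass filter) with time constant tau, which damps a sinusoidal signal of period P by a factor 1/sqrt(1 + (2*pi*tau/P)^2). A sketch, with a hypothetical heat sink just strong enough to damp decadal variations by 10%:

```python
import math


def amplitude_ratio(period, tau):
    """Damping of a sinusoid of the given period by a first-order relaxation
    with time constant tau (same units): 1 / sqrt(1 + (2*pi*tau/period)**2)."""
    omega = 2.0 * math.pi / period
    return 1.0 / math.sqrt(1.0 + (omega * tau) ** 2)


def tau_for_ratio(period, ratio):
    """Time constant that damps a signal of the given period to `ratio`."""
    omega = 2.0 * math.pi / period
    return math.sqrt(1.0 / ratio ** 2 - 1.0) / omega


# Hypothetical heat sink that damps decadal variations by 10 %.
tau_years = tau_for_ratio(period=10.0, ratio=0.9)
for name, period_years in [("decadal", 10.0), ("seasonal", 1.0), ("daily", 1.0 / 365.25)]:
    print(f"{name:>8}: {amplitude_ratio(period_years, tau_years):.4f}")
```

With such a time constant (about 0.77 years), the seasonal cycle would be cut to roughly a fifth and the daily cycle would practically vanish.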

The only object with a sufficiently large heat capacity would be the soil, which, however, is present in all categories. Over the oceans the temperature trend is smaller than over land. Half of this is due to the large heat capacity of the ocean (and its higher thermal conductance). (The other half is because more of the additional heat available due to an enhanced greenhouse effect goes into evaporation over the ocean, rather than into heating the air.) The example of the ocean also clearly shows that the thinking of Watts et al. goes in the wrong direction.

Victor Venema said...

Evan, being a war gamer, you may enjoy science, but you should up your game. A war gamer should anticipate the moves of the opponent. In the same way, a good scientist should anticipate the critical questions of colleagues and peer reviewers.

Even if I had not pointed out the above weaknesses before, they would be weaknesses you should have anticipated. Any reviewers of your manuscript will have a different expertise from me and will likely also come up with different questions. You are supposed to be the expert on the topic of your study and anticipate these objections.

The difference is that science is not a game against scientists, but that we all would like to understand nature better. Nature does not enjoy winning a game, nature just is.

Steve Bloom said...

Victor, should Leroy (2010) be taken at face value for assigning specific biases to existing sensors rather than for siting purposes?

Let's say you've got an air conditioner a couple meters distant from a sensor. When does it operate? The exhaust would tend to rise, so without knowing the exhaust characteristics how would one know it even affects the sensor?

I recall that at one point there was a big kerfuffle at WUWT over a sensor located at the edge of a parking lot at UA Tucson. That would definitely not do well on the Leroy scale. The trick was that the typical terrain outside Tucson is bare sand and rock with thermal characteristics quite similar to... a parking lot. I expect the people who did the siting knew that.

Kevin O'Neill said...

As I wrote back in September of last year at Stoat's referencing Hubbard & Lin, Air Temperature Comparison between the MMTS and the USCRN Temperature Systems, AMS 2004:

“In general, our study infers that the MMTS dataset has warmer maxima and cooler minima compared to the current USCRN air temperature system.” Hubbard & Lin, Air Temperature Comparison between the MMTS and the USCRN Temperature Systems, AMS 2004

From this I told Evan that his results sounded suspiciously similar to the results of Hubbard & Lin. At a first glance, it looks like Watts et al have now shown the MMTS Bias I adjustment is too small.

Mal Adapted said...

VV: "The difference is that science is not a game against scientists, but that we all would like to understand nature better. Nature does not enjoy winning a game, nature just is."

This. Science is a way of not fooling yourself. Watts and his cronies, being science-deniers, are only looking for clever ways to fool themselves. E pur, si riscalda (and yet, it warms).

Paul S said...

If I understand the table correctly it indicates that the NOAA-adjusted average of the Class 1/2 Stations designated "Unperturbed" produces a trend higher by more than 50% than for those same stations with RAW+MMTS.

The obvious conclusion is therefore that the NOAA algorithm believes these are not in fact "Unperturbed" at all (i.e. the metadata is incomplete) and stepwise adjustments have been made accordingly to some/most/all those records. Having looked through the Berkeley Earth site at individual stations it seems unlikely too many would be genuinely unperturbed beyond what the metadata says.

Watts et al. argue that the NOAA algorithm is adjusting the good station data to match the spurious warming in the bad stations. However, if it is a gradual change as they contend this argument doesn't seem plausible given that AIUI the NOAA algorithm is based on breakpoints, not trends. I would hope if they're going to make this claim that there is some demonstration of the NOAA algorithm exhibiting such behaviour.

Anonymous said...

"The study tries to reduce this problem by creating a subset of stations that is unperturbed by Time of Observation changes, station moves, or rating changes. At least according to the station history (metadata). The problem is that metadata is never perfect."

They didn't use only meta-data:

1. Comprehensive and detailed evaluation of station metadata, on-site station photography, satellite and aerial imaging, street level Google Earth imagery, and curator interviews have yielded a well-distributed 410 station subset of the 1218 station USHCN network that is unperturbed by Time of Observation changes, station moves, or rating changes, and a complete or mostly complete 30-year dataset

Seems you ought to have pointed out the interview part of it, which could reduce, possibly eliminate, your concern about station moves.

Victor Venema said...

I would see the interviews as producing metadata. That they did this is good; good metadata is important for any study. But it does not change my current assessment. In fact, it fits well with there being nearly no difference in the last decade, for which the memory of the observers is still good, and a difference in the first two decades, for which their memory is less reliable.

willard said...

Seems that Willard Tony's press release does not meet his own "science by press release" standards:

http://judithcurry.com/2015/12/17/watts-et-al-temperature-station-siting-matters/#comment-752379

Pierre-Normand said...

Paul wrote: "Watts et al. argue that the NOAA algorithm is adjusting the good station data to match the spurious warming in the bad stations. However, if it is a gradual change as they contend this argument doesn't seem plausible given that AIUI the NOAA algorithm is based on breakpoints, not trends. I would hope if they're going to make this claim that there is some demonstration of the NOAA algorithm exhibiting such behaviour."

That's a good point. I had suggested on Curry's blog that one could attempt to homogenize the "well-sited" stations among themselves in order to detect the break-points and assuage any fear of trend contamination (from badly-sited stations), even though I thought this fear might be unjustified. Victor gave reasons why it might not be practicable to do so with just this small subset of stations and still properly detect the break-points (if I understood him). But it ought to be sufficient to lay on the skeptics the burden of proving that "trend contamination", rather than (or in addition to) break-point detection, causes adjusted data from well-sited stations to acquire a higher warming trend.

peter azlac said...

Victor Venema said...
“ehak, exactly an object with a large heat capacity would make the temperature differences smaller and the trend in the 80s and 90s would have been smaller if the heat capacity would have been responsible.

A "heat sink" that is large enough to change the temperature on decadal scales would be much stronger on daily time scales. You would get nearly no daily and seasonal cycle. However, the there is a daily cycle and the seasonal cycle of the USHCN and the USCRN are similar.

The only object with a sufficiently large heat capacity would be the soil, which,however, is present in all categories “

Victor, you claim that only the soil has the heat capacity to make a temperature difference and that this can be eliminated because it is present in all categories of the sites. However, this overlooks the fact that soils differ in their water holding and release capacities and hence heat capacity and evaporative cooling capacity – sandy soils heat up more from direct solar radiation and cool faster (e.g. deserts) whereas loam soils with a high water capacity linked to pore structure and carbon content retain more water and so retain more heat that affects maximum nighttime temperatures and release water more readily for evaporative cooling – depending on clay content. The level of cooling depends on wind speed and the relative humidity of the atmosphere. It is well established that the thermometer temperature values can be linked to these effects, with higher minimum and lower maximum values. But it does not end there, as the water capacity depends on precipitation, and as we know that is never constant over time at any site or region. The end result is that we can have two or more stations within kilometers of each other on similar or dissimilar soil types that display different temperature trends, which BEST would determine are discontinuities due to some site move or instrument effect but in reality reflect the above facts. That is why, in my book, homogenization and kriging may be considered by some to be valid measures using statistics, but they do not accord with reality in many if not most circumstances, as one can only compare site records if one has all these essential facts.
That the above is true can be demonstrated from the Class A Pan Evaporation data that meets the IPCC hypothesis that an increase in surface temperature due to increased atmospheric CO2 should cause a greater rise in temperature due to evaporation of surface water. Yet globally no such effect is found with the only difference being between areas of high and low wind speed. Clive Best has shown a significant effect of differing soil moisture levels on MAST values and diurnal temperature range. In addition there is a significant effect of elevation such that homogenization between stations at differing elevations will suffer from both of these factors. This can be seen in the data from the Reynolds Range:
http://hydrology.usu.edu/Reynolds/documents/climate.pdf
This study is interesting because, unlike most Met sites, these have Class A Pan Evaporation units and measured wind speed, thus meeting the need for a measurement area with constant heat capacity and giving a direct measure of solar input and DLR from evaporative losses.

Brandon R. Gates said...

Victor,

Over at Judith Curry's you wrote:

http://judithcurry.com/2015/12/17/watts-et-al-temperature-station-siting-matters/#comment-752281

"Watts et al. (2015) does apply MMTS corrections. They were computed by comparing the stations with these transitions to their neighbors. Thus it seems as if Watts et al. (2015) accepts the homogenization principle sometimes."

I'm not sure, for over at WUWT, Evan Jones responds in comments:

http://wattsupwiththat.com/2015/12/17/press-release-agu15-the-quality-of-temperature-station-siting-matters-for-temperature-trends/comment-page-1/#comment-2101976

"We do this by applying the Menne (2009) offset jump to MMTS stations at the point of conversion (0.10c to Tmax, -0.025 to Tmin, and the average of the two to Tmean). We do not use pairwise thereafter: We like to let thermometers do their own thing, inasmuch as is consistent with accuracy."

If Menne (2009) describes their method as an across-the-board adjustment at the point of equipment change, I cannot find it. Then I go to Hubbard and Lin (2004) and find that MMTS I bias is apparently a function of a number of site-specific parameters which they attempt to model by using pairwise analysis.

Something doesn't add up here.
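As I read Evan Jones's description quoted above (my interpretation, not confirmed by the paper), the Watts et al. adjustment is one fixed offset per element, the same for every station, added at the conversion date. A hypothetical sketch of that reading; the function name, the example data and the choice to add the offset to the post-conversion values are assumptions:

```python
# Offsets as quoted above for Menne (2009), in degC; the Tmean offset is the
# average of the Tmax and Tmin offsets.
OFFSETS = {"tmax": 0.10, "tmin": -0.025}
OFFSETS["tmean"] = (OFFSETS["tmax"] + OFFSETS["tmin"]) / 2.0  # 0.0375


def apply_conversion_offset(years, values, conversion_year, element):
    """Add one fixed, network-wide offset to every value from the MMTS
    conversion year onwards; earlier values are left unchanged.
    (Adding it to the post-conversion values is an assumption.)"""
    offset = OFFSETS[element]
    return [v + offset if y >= conversion_year else v for y, v in zip(years, values)]


years = [1983, 1984, 1985, 1986]
tmax = [20.0, 20.0, 19.9, 19.9]  # hypothetical annual Tmax, MMTS from 1985
adjusted = apply_conversion_offset(years, tmax, 1985, "tmax")
print(adjusted)
```

A site-dependent bias, as in Hubbard and Lin, would instead need a different offset for each station, which a single constant like this cannot represent.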

Kevin O'Neill said...

Brandon, I think you've misread Hubbard & Lin (2004) or have it confused with another paper. In Air Temperature Comparison between the MMTS and the USCRN Temperature Systems, Hubbard & Lin describe two recommended adjustments: MMTS Bias I and MMTS Bias II.

MMTS Bias I is a temperature-dependent adjustment to allow for the physical response curve of the Dale Vishay 1140 thermistor. The 5-coefficient polynomial function they describe should be applied to the individual MMTS raw temperatures.

MMTS Bias II adjustments are from solar radiation and windspeed effects.

Neither of these adjustments is based on pairwise homogenization. Hubbard & Lin ran a 1 year, side-by-side field study. So their results are based on data collected from MMTS units and USCRN PRTs and USCRN thermistors co-located at the Univ. of Nebraska Horticulture Experimental site.

Rattus Norvegicus said...

Kevin,

I think it is actually Watts et al. who got it wrong, as the critical part of Brandon's post seems to be a post from Evan Jones. Big ooops to Watts. Figures.

Brandon R. Gates said...

Rattus, It's possible that both Watts and I are wrong about Hubbard and Lin ... which is fine by me, I like being corrected. :)

Kevin, thanks for the clarification; I will review the paper again with your words in mind and try to figure out where I went off track.

Victor Venema said...

Brandon, as far as I understand it, the MMTS corrections of NOAA were computed using the relative homogenization principle, by looking at the difference between a pair of stations before and after the change to MMTS. They then averaged all these values to get a correction they apply to all stations where the MMTS has been introduced.

This correction would thus include both the change in instrument and the typical influence of the change in siting. The articles of Hubbard and Lin study this problem by making observations with multiple measurement set-ups at one location. As well as possible, this thus represents the change in instruments, but not any systematic changes in the siting, as Kevin rightly states.

There was a study where NOAA did not apply the time of observation (TOBS) corrections and then showed that the pairwise homogenization algorithm took care of the TOBS corrections for the average over the contiguous US. They still apply the TOBS corrections because they are expected to be more accurate at the local scale.

It would be interesting to do a similar study for the MMTS corrections and see if the pairwise homogenization algorithm would be able to correct them just as well. Because the MMTS corrections are fixed for all of the US, I could imagine that the pairwise method could get locally better estimates, because you are right that they depend on the local climate, especially insolation and wind. However, the MMTS correction is rather small, so I am not sure that it could always be reliably detected with the pairwise homogenization method. In that case the explicit MMTS corrections may give better (national) results.

Victor Venema said...

Kevin, such parallel measurements with instruments measuring side by side are great for understanding the physical reasons for the differences. Because of the influence of the local climate, you need a large number of such parallel measurements to get a reliable estimate of the influence of such a change in the observation methods on the climate record for a country, continent or the globe.

Within the ISTI we are trying to build such a large dataset of parallel measurements, to be able to estimate such biases on a global scale. We already have datasets from 10 countries, mainly in Europe and South America, where the lead author, Enric Aguilar, has good contacts. Data from the USA is hard to get: there are a number of papers on this topic coming up and we can unfortunately only get the data afterwards.

More data and local expertise is welcome. People who contribute data are invited to contribute to the articles and co-author them.

Victor Venema said...

peter azlac said: Victor, you claim that only the soil has the heat capacity to make a temperature difference and that this can be eliminated because it is present in all categories of the sites. However, this overlooks the fact that soils differ in their water holding and release capacities and hence heat capacity and evaporative cooling capacity...

The main reason to "eliminate" it would be that it goes in the wrong direction.

The more generous response would have been that Watts et al. (2015) are indeed wrong to blame "heat sinks", but that surface fluxes can also be important. Could be, but that would be a very different situation. Such differences in surface fluxes are responsible for a large part of the Urban Heat Island (UHI). This is normally mostly a regional effect, but Watts et al. (2015) explicitly claim that micro-siting, not urbanization, is the problem. They could naturally be wrong. I have not seen the evidence; the manuscript the press release is based on is secret.

Surface fluxes could play a role. The trend over land is twice as large as over the ocean. The ocean has a huge heat capacity, however, and due to turbulent mixing a much higher heat conductance. The ocean has taken up 90% of all warming, while the soils have taken up less than 1%. One would thus expect that the influence of differences in soil properties has an interesting, but not too large effect.

Soil properties, land use and other regional climate influences complicate statistical homogenization. The basic assumption of statistical homogenization is that the reference stations have the same regional climate signal as the candidate station to be homogenized. At a local scale the climatologist knows the local climate and can assess this reasonably well, but all homogenization methods for huge global datasets only use the statistical properties of the series (correlation, covariance). This is one reason why national datasets are to be preferred over global ones, next to the larger number of stations normally available at a national scale, which improves the signal-to-noise ratio and by itself makes the stations climatologically more similar.

Brandon R. Gates said...

Victor,

"It would be interesting to do a similar study for the MMTS corrections and see if the pairwise homogenization algorithm would be able to correct them just as well."

I had assumed that they were on the basis that the bias is dependent on local conditions. The balance of your post answers my questions, thanks.

Kevin O'Neill said...

The other factor that Watts et al don't address is a comparison of their results to the USCRN data. The USHCN seems to correlate very well with the USCRN. If (as I suspect is the case) their subset also shows a disparity with USCRN, then what are they left with? For starters, the whole 'heat sink' hypothesis gets thrown out the window - since it wouldn't explain any difference with USCRN.

Paul S said...

On MMTS, wouldn't this technology change often occur along with other siting changes? Is the adjustment only meant to relate to the sensor bias?

Victor Venema said...

The NOAA MMTS adjustment corrects both for the average effect of the instrument change and the average temperature change due to changes in siting.

Such a correction is good enough for the national temperature average, but once you start splitting the data into different categories for siting, this average correction may no longer be sufficient. That would be something to study.

A purely statistical result is not worth much. You need to understand the reason(s) for the differences or at least rule out the main uninteresting reasons to leave an interesting puzzle for scientists.

Peter Thorne said...

Victor,

only the TOBS is done outside of (prior to) PHA in the NCEI algorithm. The MMTS transition effects are calculated uniquely for each station based upon a median selection of a range of pairwise apparently homogeneous station comparisons. A certain number of such segments are required. A number of the parameters changed are discussed in Williams et al., 2012 and include this adjustment effect. PHA can also do TOBS but as it's a square-wave filter it doesn't catch the seasonal effects of TOBS (or by extension MMTS or any other effect). The issue of seasonal adjustments is that they become incredibly noisy.
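The station-specific estimate described here, a median over several pairwise comparisons with a minimum number of usable segments, can be illustrated as follows. This is a minimal sketch, not the NCEI implementation. The threshold `MIN_PAIRS = 3` and the example estimates are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence

MIN_PAIRS = 3  # hypothetical minimum number of usable pairwise comparisons


def station_adjustment(pair_steps: Sequence[float], min_pairs: int = MIN_PAIRS) -> Optional[float]:
    """Median of the pairwise step estimates for one station, or None if too few pairs."""
    if len(pair_steps) < min_pairs:
        return None
    return median(pair_steps)


# One neighbour with its own undetected break (3.0) barely moves the median.
estimates = [-0.5, -0.45, -0.4, -0.3, 3.0]
adj = station_adjustment(estimates)
```

The median is used here because it is robust: a single neighbour with its own inhomogeneity would drag a plain mean far more than the median.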

Victor Venema said...

Oops, a stupid mistake on my part. You are right. Please ignore my comments on the MMTS adjustments above.

There was a fixed correction in USHCNv1, but since USHCNv2 the pairwise method is used. Menne et al. (2010; On the reliability of the U.S. surface temperature record): "In short, the “underadjustment” in maximum temperatures is a consequence of using site‐specific adjustments for the MMTS in the version 2 release as opposed to a network‐wide, fixed adjustment as in version 1 [Quayle et al., 1991]"

That then leads to the question: how did Watts et al. (2015) make the MMTS adjustments? They write in their press release: "We do allow for one and only one adjustment in the data, and this is only because it is based on physical observations and it is a truly needed adjustment. We use the MMTS adjustment noted in Menne et al. 2009 and 2010 for the MMTS exposure housing versus the old wooden box Cotton Region Shelter (CRS) which has a warm bias mainly due to paint and maintenance issues."

The size of every adjustment is listed in the output of the pairwise homogenization method, and Watts et al. (2015) could have selected the one that is nearest the date of the transition to MMTS. It would be ironic if they used the pairwise homogenization method that they disapprove of. Alternatively, they may have used the averages reported in Menne et al. (2009, 2010). Then my last comment would still be valid for this study.

chrisd said...

In point 1 of the "six key findings":

That the perturbed stations have a smaller trend that the perturbed stations...

Think that second one is supposed to be "unperturbed" (and "that" for "than").

Victor Venema said...

Chris, thanks. Corrected.

Should probably proofread the piece in full again. A midnight action. Normally I let a post rest a few days before publishing.

Kevin O'Neill said...

Victor - Regarding the MMTS adjustment in Watts et al: Evan has said in the comments over at WUWT, "We do this by applying the Menne (2009) offset jump to MMTS stations at the point of conversion (0.10c to Tmax, -0.025 to Tmin, and the average of the two to Tmean).”
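The procedure Evan describes, a constant offset applied at the conversion date with the Tmean offset taken as the average of the Tmax and Tmin offsets, can be sketched as below. The offsets are the values he quotes; applying them to the post-conversion segment is an assumption of this sketch, and the Tmax series is hypothetical.

```python
from typing import List, Sequence

# Offsets as quoted from Evan Jones (°C). Adding them to the post-conversion
# segment is an ASSUMPTION made for this sketch.
TMAX_OFFSET = 0.10
TMIN_OFFSET = -0.025
TMEAN_OFFSET = (TMAX_OFFSET + TMIN_OFFSET) / 2  # average of the two offsets


def apply_step(series: Sequence[float], conversion_index: int, offset: float) -> List[float]:
    """Add one constant offset to every value from the conversion index onward."""
    return [v + offset if i >= conversion_index else v for i, v in enumerate(series)]


tmax = [30.0, 30.0, 29.9, 29.9]  # hypothetical annual Tmax, MMTS from index 2
adjusted = apply_step(tmax, 2, TMAX_OFFSET)
```

The same constant is added to every value regardless of season or temperature, which is exactly the limitation Kevin raises: a temperature-dependent bias cannot be represented by a single step.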

To get this correct - even if you know the date of conversion from CRS to MMTS - I believe you either have to homogenize or use Hubbard & Lin, 2004. And then, once you get the step-change correct, you pretty much have to continue forward using one method or the other. Otherwise the temperature dependent systematic bias in the MMTS is going to create different errors for different locations.

I have also pointed out to Evan that using an offset based on the entire population and assuming it is applicable to a special subset of that population is a leap of faith.

Victor Venema said...

If they refer to this Menne et al. (2009) (Open Access; there was no reference list on the press release/blog post), then the numbers do not add up and the adjustments for the MMTS transition should be much larger.

Most of this asymmetry appears to be associated with documented changes in the network (Fig. 6e) and, in particular, with shifts caused by the transition from liquid-in-glass (LiG) thermometers to the maximum–minimum temperature system (MMTS; Fig. 6g). Quayle et al. (1991) concluded that this transition led to an average drop in maximum temperatures of about 0.4°C and to an average rise in minimum temperatures of 0.3°C for sites with no coincident station relocation. [These averages were subsequently used in version 1 to adjust the records from HCN stations that converted to the MMTS, primarily during the mid- and late 1980s (Easterling et al. 1996).] More recently, Hubbard and Lin (2006) estimated a somewhat larger MMTS effect on HCN temperatures and advocated for site specific adjustments in general, including those sites with no documented equipment move.

Notably, the pairwise algorithm in HCN version 2 allows for such site-specific adjustments to be calculated for all types of station changes. The subsets of changes associated with the conversion to the MMTS are shown in Figs. 6g and 6h [VV: -0.52 for Tmax and +0.37 for Tmin, which gives -0.075 for Tmean]. The pairwise results indicate that only about 40% of the maximum and minimum temperature series experienced a statistically significant shift (out of ~850 total conversions to MMTS). As a result, the overall effect of the MMTS instrument change at all affected sites is substantially less than both the Quayle et al. (1991) and Hubbard and Lin (2006) estimates. However, the average effect of the statistically significant changes (−0.52°C for maximum temperatures and +0.37°C for minimum temperatures) is close to Hubbard and Lin’s (2006) results for sites with no coincident station move.

Kevin O'Neill said...

Victor - Yes, I've read (and quoted) that same section from Menne to Evan. I really don't know where his numbers are coming from. I believe the last sentence is particularly applicable since it is specifically for those MMTS with no station moves - which should be nearly identical to the Watts et al subset.

I almost think Evan's talking about trend again and citing degrees C per decade. Remember his weird Y-axis.

Brandon R. Gates said...

Victor,

"There was a fixed correction in USHCNv1, but since USHCNv2 the pairwise method is used."

Aha, now we are getting somewhere. Thank you for the correction and update.

Evan Jones said...

I notice that WUWT says "We are submitting this to publication in a well respected journal". Presumably they won't make the same mistake they made last time, of actually publishing a draft, so that people can read it and point out the errors.

That wasn't a mistake. It was the objective. Youse guys "pointing out the errors" was invaluable to this study. I said that quite plainly over on Stoat, and I really, truly meant it. And thanks for all your help and reasonable discussion.

The current paper is in its final tweaking phase. Then the real fun begins. So we are releasing our results and discussing our methods. Any chipping will necessarily be around the edges (unless you-all were holding out on us, criticism-wise, which I doubt), and that will occur when we publish and archive our data.

Evan Jones said...

Hullo, VeeV, a pleasure, as always.

Evan, you may enjoy science being a war gamer, but you should up your game. A war gamer should anticipate the moves of the opponent. In the same way a good scientist should anticipate the critical questions of the colleagues and peer reviewers.

And what do you think I have been up to all this time? I said it right out, any number of times. Of course it may be in a manner of which you do not approve.

And the fact (which we heartily endorse) that the adjusted Class 1\2s match those of the 3\4\5s is the point, isn't it?

Which set was adjusted and in which direction? The trend of the well-sited minority of stations was adjusted to match the trend of the poorly sited.

That is a clear fingerprint of homog erroneously adjusting good to bad because there is a systematic flaw (microsite bias) in the poorly sited stations and there are so many more of them.

I continue to encourage you to do whatever you do to the black box and get this going in the right direction rather than the wrong one. Or else apply a siting adjustment to the poorly sited stations before you homog.

Even if I had not pointed out the above weaknesses before, they would be weaknesses you should have anticipated.

Naturally. Although those "weaknesses" were most definitely not seriously addressed in peer review for Fall et al., 2011, which, after all, had a far more prosaic conclusion. It's even cited a few dozen times.

At any rate, we have done our best to make sure that the stations we retained did not move and the microsite rating did not change midway through the study. We carefully checked the excellent, greatly improved USHCN metadata for moves (some of which were given in increments less than 10 feet), and I looked at the GE wayback machine, to the extent possible, to see if the site conditions had changed. There were a couple of cases, and we treated them as a move and dropped them.

All in all, it is remarkable how generally consistent the microsite has been over time for an unmoved station. Encroachment was less than we expected, especially for the Class 1\2s.

At any rate, we dropped two thirds of the initial sample, mostly because of TOBS and moves from unknown locations to current locations.

I am sure it's not perfect, but you are nibbling around the edges here, not striking at the heart of the issue.

Any reviewers of your manuscript will have a different expertise from me and will likely also come up with different questions. You are supposed to be the expert on the topic of your study and anticipate these objections.

Well, in the case when you see a dead body on the floor, you usually don't need a degree in medicine to determine that.

We failed to land our physicist. So the formulas will have to wait for a follow-on. But we do describe in words what we think is going on, and made a formulaic version of our main hypothesis. This is a work in progress, much yet to be done. This is the initial observation phase. And, oh, what observations! We'll address more detail down the road, when we can.

The difference is that science is not a game against scientists, but that we all would like to understand nature better. Nature does not enjoy winning a game, nature just is.

You got that right. It is a game between scientists. And the ultimate result, win or lose, is a better understanding of nature by all players. But why am I telling you? You know this better than I do.

Evan Jones said...

MMTS conversion is a relatively minor factor for us, as Class 3\4\5s are affected by it more than the Class 1\2s.

And, anyway, when it comes to MMTS conversion, I think you-all are shooting the wrong horse, and I have my reasons.

Kevin O'Neill said...

Evan - "That wasn't a mistake. It was the objective. "

Oh c'mon. We've been through this before. You don't issue a Breaking News post, alert the media days in advance, cancel vacation, etc to put out the draft of a pre-print to solicit feedback.

That excuse is nothing but rationalization for the fact the original paper was not ready for prime time. It wasn't even ready for the timeslot filled by infomercials on obscure cable channels.

If you wanted real feedback *now* you'd release at least some of the data - even 20 or 30 stations showing the raw data, your adjustments and the NOAA adjustments. Then people could actually give you legitimate feedback.

Even if I assume that everything you've done is correct, I'd still say this is the worst handling and presentation of a scientific result I've ever witnessed.

William Connolley said...

> The current paper is in its final tweaking phase.

Mmmm, but what you've skated over - but not subtly; it's obvious - is that this time you're not publishing the draft paper. AW does his paranoia schtick about journals, but that doesn't justify or explain hiding the paper.

PaulS said...

Evan,

Which set was adjusted and in which direction? The trend of the well-sited minority of stations was adjusted to match the trend of the poorly sited.

Homogenisation is performed by instantaneous adjustment at detected breakpoints, not by adjusting trends. Your Point 6 states that the divergence is gradual, not by step change. Therefore, by your own argument, the increased trend cannot be related to homogenisation.

Evan Jones said...

Oh c'mon. We've been through this before. You don't issue a Breaking News post, alert the media days in advance, cancel vacation, etc to put out the draft of a pre-print to solicit feedback.

The decision to pre-release had already been made some time earlier. The timing had everything to do with BEST, but BEST was not the reason for the pre-release, only for its timing.

That excuse is nothing but rationalization for the fact the original paper was not ready for prime time. It wasn't even ready for the timeslot filled by infomercials on obscure cable channels.

You're still not getting it. We wanted to make sure it was ready for prime time. And we listened, and fixed the errors. Bottom line. And that should be held as evidence of our intent.

If you wanted real feedback *now* you'd release at least some of the data - even 20 or 30 stations showing the raw data, your adjustments and the NOAA adjustments. Then people could actually give you legitimate feedback.

We don't need data feedback now (or did then). What we needed was a robust discussion of method. That has been and is being discussed. Here, for example.

Even if I assume that everything you've done is correct, I'd still say this is the worst handling and presentation of a scientific result I've ever witnessed.

Let me put it this way. The climate scientific community is like an army mule: It is an amazingly powerful tool. But you have to get its attention, first. And that requires a judiciously applied two-by-four. I got no apologies and no regrets. You don't like it. I get that. But as for all these veiled requests to slowly fade away? Unable to comply. Sorry.

When we release, we all get to find out how close to accurate it is (it can't possibly be 100% correct). Mileage may vary. I will be around to discuss it in any case.

willard said...

> It is a game between scientists.

It can also be seen as a game between scientists and nature, where nature holds all the facts and scientists all the logic. This goes back to Hintikka:

http://plato.stanford.edu/entries/logic-games/#SemGam

One advantage I see of doing this the scientific way is that we can dispense with hexagons. Are wargames still played on cheap cardboard hexes, Evan?

Kevin O'Neill said...

Evan,

1) Your study period is 1979 - 2008.
2) MMTS field installation began in 1983.
3) Field installation was complete by the mid-90s.
4) Every station in the network was converted during your study period.
5) MMTS conversion introduced a systematic error.
Therefore, every station record in your study contains a systematic error.

Let me repeat: *EVERY* station record in the USHCN dataset that was converted during the study period has a systematic bias due to the conversion.

The question has always been: How do we determine which data are significantly affected by this bias and what adjustment do we apply to the data? The stations you've segregated and concentrate on are the stations that (logically) require the *full* MMTS adjustment. That full adjustment is best addressed by Hubbard and Lin - not Menne.

By applying your interpretation of Menne, a simple offset at time of conversion, you understate the effect and lose all daily/monthly/seasonal accuracy. Because the bias is temperature dependent its effect on Tmean can be 0.3 degrees in some months and indistinguishable from zero in others - with an average of more than 0.16 degrees over the full year. And this is just for the location they studied in Nebraska. Not all stations have climate normals identical to the experimental site - their variations will depend on their temperatures. So we can add spatial distortion to the temporal distortion you've introduced.
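Kevin's point, that a single offset cannot represent a temperature-dependent bias either across the seasons or across sites with different climates, can be illustrated with made-up numbers. In this minimal sketch both the monthly climatology and the linear bias function are hypothetical and are not Hubbard & Lin's fit; only the qualitative behaviour matters.

```python
from statistics import mean
from typing import Callable, List, Sequence

# HYPOTHETICAL monthly mean temperatures (°C) for an illustrative site.
MONTHLY_T = [-5.0, -2.0, 4.0, 11.0, 17.0, 22.0, 25.0, 24.0, 19.0, 12.0, 4.0, -2.0]


def hypothetical_bias(t: float) -> float:
    """Illustrative temperature-dependent bias (°C); NOT a published fit."""
    return 0.01 * t


def residuals_after_fixed_offset(temps: Sequence[float], bias_fn: Callable[[float], float]) -> List[float]:
    """Monthly bias left over after removing one annual-mean constant offset."""
    biases = [bias_fn(t) for t in temps]
    fixed = mean(biases)
    return [b - fixed for b in biases]


res = residuals_after_fixed_offset(MONTHLY_T, hypothetical_bias)

# A colder site has a different annual-mean bias, so one network-wide constant
# cannot fit both sites (the spatial distortion Kevin mentions).
colder = [t - 5.0 for t in MONTHLY_T]
offset_warm = mean(hypothetical_bias(t) for t in MONTHLY_T)
offset_cold = mean(hypothetical_bias(t) for t in colder)
```

The residuals average to zero over the year but are positive in summer and negative in winter, and the two hypothetical sites need different constants.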

And of course there's the elephant in the room you've never addressed: USCRN.

Victor Venema said...

Every station that is MMTS now has transitioned in the study period. That is also why Watts et al. (2015) make the MMTS correction to their "raw" data. Otherwise they would only have the Cotton Region Shelter data, which inconveniently has a very strong trend of 0.3°C per decade. That is about the same trend as the adjusted data, which is inconvenient for the trash talk about homogenization at WUWT.

Kevin, are you sure that the transition to MMTS is finalised? Aren't there still some stations with, e.g., Cotton Region Shelters? Or that use other Automatic Weather Stations?

Kevin O'Neill said...

Victor - I thought the transition to MMTS was completed in the mid 90s from one of the COOP newsletters. But thinking back on Menne's data he still showed approximately 218 CRS in use.

I believe all the others though would be MMTS.

willard said...

Seems that Evan's "seriously arguing that you showed homogenization to be wrong without studying how homogenization methods work, but only on the basis of two numbers looking similar," VeeV:

> The similarity of the two numbers is a scathing challenge to homogenization as currently practiced. It is a stereotypical fingerprint of systematic error. VeeV is chief boffin of homog. Therefore and explanation is called for. (Yelled for?)

http://judithcurry.com/2015/12/17/watts-et-al-temperature-station-siting-matters/#comment-753698

I hope you don't mind being yelled at, VeeV, and I suppose it's alright with you to be called "VeeV."

Evan Jones said...

Greetings, VeeV. (Hullo, willard the Peacemaker.)

I will answer a few questions as well as I can.

Victor - I thought the transition to MMTS was completed in the mid 90s from one of the COOP newsletters. But thinking back on Menne's data he still showed approximately 218 CRS in use.

Sounds about right. But it also sounds like that number includes the stations that are long-closed but whose data is still used (e.g., Electra PH, CA, which closed in 1994 but is still part of USHCN). I would guess a bunch have gone MMTS since 2008.

I believe all the others though would be MMTS.

There are ~60 ASOS/AWOS setups (almost) exclusively confined to airports. TOBS isn't a problem; they observe at 24:00, almost without exception. They actually record each hour, and NOAA has that data; I've seen some of their "blue-seal" reports. But they do the max-min thing for USHCN purposes.

Of these about two thirds are perturbed, owing to moves.

There is a handful of "other" equipment, often Davis Pros (e.g. Atlantic City, NJ). Maybe a dozen.

Victor Venema said...

Willard, a strange idea that I should solve their problem while not getting access to the manuscript and the data. Plus, I work on homogenization methods; that does not make me an expert on the changes in observational methods in that country on the other side of the Atlantic that has all those problems with mitigation sceptics.

Victor Venema said...

Evan Jones said: "That wasn't a mistake. It was the objective. You guys "pointing out the errors" was invaluable to this study. I said that quite plainly over on Stoat, and I really, truly meant it. And thanks for all your help and reasonable discussion."

Maybe you mean it, but I am less convinced your boss means it. If it was helpful in 2012, it would make sense to do the same again now and post the manuscript.

Both are fine by me. Sometimes I put a manuscript on my homepage when it is my core expertise and I am confident it is right, and sometimes I do not, when I am less sure and hope that the reviewers may protect me against making stupid errors. I would say that it is not elegant to publish a press release when there is not even a manuscript available; better would be a peer-reviewed article.

Steve Bloom said...

EJ: "We don't need data feedback now (or did then)."

Except your data involves a facially erroneous application of Leroy (2010), which you were told about at the outset (referring then to the earlier version of Leroy) and chose to ignore since without it your whole project evaporates in a puff of phlogiston, so it's actually a question of wanting rather than needing.

Victor Venema said...

Evan Jones said: "And the fact (which we heartily endorse) that the adjusted Class 1\2s match those of the 3\4\5s is the point, isn't it?
Which set was adjusted and in which direction? The trend of the well-sited minority of stations was adjusted to match the trend of the poorly sited."

Statistical homogenization using neighbouring stations does not work the way you think it does. Homogenization in the USA increases the trend by 0.4°C per century. If homogenization worked the way you think it does, you should be able to find more than 50% of stations where the trend is the same before and after homogenization, and a large minority where the trend even shows cooling.

It is perfectly possible for all stations to have a trend bias in the same direction and still improve the trend estimate.

The basic idea of statistical homogenization is explained here.
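A minimal sketch of that basic idea, under simplifying assumptions: build the candidate-minus-reference difference series and search for the single split that best fits a two-mean (step) model by least squares. This is a simplified single-breakpoint search, not NOAA's pairwise homogenization algorithm, and all data values are hypothetical. It shows why a signal shared by candidate and reference (including a common trend) does not bias the detected step.

```python
from statistics import mean
from typing import Sequence, Tuple


def best_single_break(diff: Sequence[float], min_seg: int = 2) -> Tuple[int, float]:
    """Find the split of a difference series that best fits a two-mean (step) model.

    Returns (index of the first value after the break, step size).
    """
    if len(diff) < 2 * min_seg:
        raise ValueError("series too short for the requested minimum segment length")
    best_k, best_sse = min_seg, float("inf")
    for k in range(min_seg, len(diff) - min_seg + 1):
        left, right = diff[:k], diff[k:]
        ml, mr = mean(left), mean(right)
        sse = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, mean(diff[best_k:]) - mean(diff[:best_k])


# Hypothetical data: both stations share the same regional warming, so it
# cancels in the difference; only the candidate's step and some noise remain.
regional = [0.03 * year for year in range(8)]
noise_and_step = [0.1, -0.1, 0.0, 0.1, -0.5, -0.6, -0.4, -0.5]
reference = regional
candidate = [r + d for r, d in zip(regional, noise_and_step)]
diff = [c - r for c, r in zip(candidate, reference)]
k, step = best_single_break(diff)
```

The search finds the break at index 4 with a step of about -0.525 °C, independent of the shared 0.03 °C per year warming.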

Evan Jones said: "Although those "weaknesses" were most definitely not seriously addressed in peer review for Fall et al., 2011, which, after all, had a far more prosaic conclusion."

The reason these weaknesses were not relevant for Fall et al. was that, as you mention, the conclusions were not based on the weakest part of the study. You could naturally draw similar conclusions and add that the new Leroy classification seems to work better than the old one.

Victor Venema said...

Evan Jones: "At any rate, we have done our best to make sure that the stations we retained did not move and the microsite rating did not change midway through the study. We carefully checked the excellent, greatly improved USHCN metadata for moves (some of which were given in increments less than 10 feet), and I looked at the GE wayback machine, to the extent possible, to see if the site conditions had changed. There were a couple of cases, and we treated them as a move and dropped them."

Did I get it right that when there is a relocation, but the category does not change, you treat this as "unperturbed"? If there is a change in the surroundings that does not change the category, do you also treat this as "unperturbed"?

Did I get it right that if the time of observation changes by only a few hours, but not from AM to PM or vice versa, you treat this as "unperturbed"? As well as the cases where the time of observation is the same at the beginning and end, but may be different in the middle?

What is the starting date for the Google Earth Wayback machine?

Victor Venema said...

Steve, how was the siting classification of Leroy applied in the wrong way?

Evan Jones said...

Did I get it right that when there is a relocation, but the category does not change, you treat this as "unperturbed"? If there is a change in the surroundings that does not change the category, do you also treat this as "unperturbed"?

Yes. But only localized. And in the overwhelming majority of cases, we do not know the previous location, so we dropped it. (That was the reason most stations were dropped.)

Did I get it right that if the time of observation changes by only a few hours, but not from AM to PM or vice versa, you treat this as "unperturbed"? As well as the cases where the time of observation is the same at the beginning and end, but may be different in the middle?

Yes, but Dr. N-G is looking for stations with larger discrepancies between raw and TOBS data, and we may drop a few more. Also, he ran a TOBS-adjusted version a while back, and the results were much the same. You could drop that data in if you like it better. It's all very flexible.

Our Primary information sheet (to be archived) has a column with all TOBS changes, and if you think a station should be dropped, just drop it. I will provide Excel sheets of the data and methods, so a person can simply snip out or add in as desired, for purposes of review.

What is the starting date for the Google Earth Wayback machine?

Not all the way back to 1979, but they keep adding new stuff. You can usually take it back to the original black-and-whites of the mid-90s. It's amazing how little change there is. We lost a couple of stations on account of that, but far fewer than I had expected.

Evan Jones said...

a strange idea that I should solve their problem while not getting access to the manuscript and the data.

The help was with method and approach. That's what we were after. The rest is edge-work.

Except your data involves a facially erroneous application of Leroy (2010), which you were told about at the outset (referring then to the earlier version of Leroy) and chose to ignore since without it your whole project evaporates in a puff of phlogiston, so it's actually a question of wanting rather than needing.

---------------------------------------------------------------------

Steve, how was the siting classification of Leroy applied in the wrong way?

Oh, he means that we only use the heat sink part of the rating. Not the shade or grass length or slope. Stuff we've already discussed.

We may look at those down the road, but they won't affect much. Shade might have a minor effect on Tmax, but in most cases, the shade is cast by the obstruction itself, so it is not an independent variable, anyway.

So how would that cause the whole project to evaporate in a puff of phlogiston? In any case, if you want to look at those aspects when we publish and drop more stations, well, go right ahead.

But it's just more nibbling around the edges. Won't change the results. You would have to find a factor in there that will produce a systematic data error. Meanwhile, you are just trying to paper over the elephant in the room with post-its.

BTW, we did it the same way for Fall (2011), but we heard nary a peep from either the review panel or from any of the critics. But then y'all liked those results. Now that we are a near trend-match with both satellite trends, you-all don't like the results, and I am hearing a different tune.

Evan Jones said...

Maybe you mean it, but I am less convinced your boss means it. If it was helpful in 2012, it would make sense to do the same again now and post the manuscript.

Well, Anthony does what he does and I do what I do. In 2012, we were after larger-scale problems. Now we are just finalizing it for submission. And we released far more actual results this time than were included in the 2012 paper.

Evan Jones said...

A "heat sink" that is large enough to change the temperature on decadal scales would be much stronger on daily time scales.

According to Leroy, the direct offset effect is large. Arguably very large. The Yilmaz paper concentrates on the direct offset effect (but, like Leroy, not that of trend).

But as for the rest, I think you have it backwards: The delta creating the trend is small and gradual, so it will only show up on a decadal scale. Preferably a multidecadal one. The reason for this is the same reason why 30-year trends are a general rule of thumb for climate time series.

In a fully controlled closed environment, timescale would not matter so much. But in the field, all the smaller, non-systematic effects come into play, increasing variability the shorter the timescale is.

Put on that wargame design hat that you have in the back of your closet (and tease me a bit for wearing it in public). Lots of good system design stuff there in that hat.

On a strategic level, the heat sink effect is nothing "new". Additional heat sink near a sensor merely amplifies what the ground is already doing. You have already noted that ground type affects trend. Well, you are right; it does. And when you alter the character of the ground by adding additional heat sink, that merely amplifies an effect that is already occurring.

If you were right about the triviality of the heat sink effect, then Tmax would occur right around noon and Tmin would occur right around midnight. It is the amount and delta of that lag over time that is producing the divergence.

Our observations do not add a Byzantine mechanism that clashes with the others. All it does is show an amplification of a well known mechanism that has been acknowledged all along. It's all very generic.

Evan Jones said...

Mmmm, but what you've skated over - but not subtly; it's obvious - is that this time you're not publishing the draft paper.

This was only a presentation. But it did include our major findings and our results (until J N-G kills a few more stations in this final round of checking. #B^)

AW does his paranoia schtick about journals, but that doesn't justify or explain hiding the paper.

Well, you would be right if we were intending to keep it hidden. But we are not going to keep anything hidden when we publish.

While not ascribing any blame, we had reason to regret releasing our data not once, but twice. That was data I had a personal hand in compiling. So we will wait until publication. But you will get it. All of it. In a form you can alter in order to test.

Steve Bloom said...

"the heat sink part of the rating"

Not all heat sinks are created equal. You needed measurements of sink efficacy. You don't have them.

Steve Bloom said...

Victor, my general point about the Leroy classifications is that they're not fit for this purpose. Pointing to a potential bias isn't proof that one exists.

Evan Jones said...

From this I told Evan that his results sounded suspiciously similar to the results of Hubbard & Lin. At a first glance, it looks like Watts et al have now shown the MMTS Bias I adjustment is too small.

Or that MMTS is (basically) right and the problem resides not with the MMTS but with the CRS, which carries its own personal Tmax-enhancing heat sink around with it wherever it goes.

Even if you doubled our MMTS correction, there would be no material difference in our results. It would affect the Class 3/4/5s more than it would affect the Class 1/2s, anyway.


MMTS-adjustment is chipping around the edges. If you don't like ours, just drop in one that suits you. We'll be providing the spreadsheets for that.

Evan Jones said...

The obvious conclusion is therefore that the NOAA algorithm believes these are not in fact "Unperturbed" at all (i.e. the metadata is incomplete) and stepwise adjustments have been made accordingly to some/most/all those records.

Yes, I quite agree.


Having looked through the Berkeley Earth site at individual stations it seems unlikely too many would be genuinely unperturbed beyond what the metadata says.

If the well sited stations were in the majority, you would be correct. Both BEST and GHCN-style homog would work as intended. The problem is that the large majority are poorly sited, so the pairwise comparisons are done mostly with poorly sited stations. And homog operates in a manner that does not take an overall average, but adjusts small outlying clumps to conform with majority clumps.

P.S., an unperturbed station can have bad siting. Almost 4 in 5 do. Unperturbed only means the metadata is clean enough (so far as we can tell) to include in the comparison. It is, of course, quite possible that both a BEST method and a homog mode could be made that would account for microsite quality. But at present, neither do so.

Ironically (owing to lag), a straight-up reading at noon and midnight would likely produce more accurate trend results than 24-hour Min-Max, because it would not ride the lags and would produce results less affected by the heat sink delta.

A good confirmation of our hypothesis is that Tmin and Tmax both lag 12:00 AM and PM. Adding additional heat sink to the ground effects simply amplifies what already is.

Watts et al. argue that the NOAA algorithm is adjusting the good station data to match the spurious warming in the bad stations. However, if it is a gradual change, as they contend, this argument doesn't seem plausible, given that AIUI the NOAA algorithm is based on breakpoints, not trends. If they're going to make this claim, I would hope there is some demonstration of the NOAA algorithm exhibiting such behaviour.

You can theorize what the effect would be all you like. But the results are there and they are quite stark. Bottom line, the well sited station trends are being adjusted upward to match those of the poorly sited.

This paper is not a be-all and end-all. It is part of a continuing process. Now that the discrepancies have been shown, the details will be looked into further in a follow-up.

Evan Jones said...

Seems you ought to have pointed out the interview part of it, which could reduce, possibly eliminate, your concern about station moves.

They would reduce, but not eliminate the uncertainties.

If they largely contradicted the metadata, that would be of note. But they largely confirm it. So I think the gold speck here is that recent USHCN metadata is pretty darn good for our purposes. (There is no "perfect" here, in any of what any of us on either side of the discussion are doing.)

But what I am seeing is a concentration on net-neutral edge-nipping which, while a real issue, sidesteps the question of the radical systematic error in the dataset.

No one is immune to error. No one. We made a systematic data error in 2012 by failing to address TOBS. That reduced our effect by over a third. Moves, however, were a net-neutral bit of hedge clipping and had little effect on our results.

William Connolley said...

> you're not publishing the draft paper.

Thanks for replying, but those aren't actually genuine replies, just evasions. I don't think you're being open or honest with us.

Evan Jones said...

Victor, you claim that only the soil has the heat capacity to make a temperature difference and that this can be eliminated because it is present in all categories of the sites. However, this overlooks the fact that soils differ in their water holding and release capacities and hence heat capacity and evaporative cooling capacity ...

And color, density, etc., etc. I agree, but would add that ground effect is a relevant component of regional temperature, so leaving it in is not so bad. It affects much larger areas in general. (Though for larger-scale pairwise and kriging, it is a big issue, possibly unavoidable.)

But failure to account for the increased effect of ground level in the additional presence of poor microsite is not so good. That affects only the immediate area of the sensor, specifically.

Tom Dayton said...

Evan, maybe I've merely gotten lost in the tangle of the posts, comments, and responses, but I believe you have switched back and forth repeatedly from saying:

1) You have no hypothesis of a physical mechanism that would cause both an increase in warming trend and an increase in cooling trend.

2) Large heat sinks are the physical mechanism.

What is your explanation for (2)? I and other people think:

1) A heat sink would in the very very short term decrease a warming trend by sucking the heat out of the air surrounding the station, and decrease a cooling trend by replacing the heat into the air. In other words, it would decrease the variability of the temperature changes.

2) Even that stabilizing effect would operate over a time scale much shorter than multiple years, because the heat sink would become "saturated" so to speak to the new multi-year temperature mean.

willard said...

> [A] strange idea that I should solve their problem while not getting access to the manuscript and the data.

I find it less strange considering that Evan hesitated before stating a claim regarding what you "both know."

Even less strange after seeing how Evan rationalized Willard Tony's "hostile" epithet.

Not that strange if you see how Evan rehearses "the fox and the crow" by coaxing you into doing his homework by appealing to your reputation.

Not strange at all considering the number of dirty tricks he used in so many of his responses to me.

Actually quite normal for a "science by press release" episode.

Evan's experience in wargames seems conducive to some ClimateBall gamesmanship.

***

Auditors sure would like to know if Evan authored the document entitled "press release":

https://fallmeeting.agu.org/2015/files/2015/12/Press-Release-NEW-STUDY-OF-NOAA-USHCN.pdf

If Evan could compare this press release with Willard Tony's own standard for press release, that would be nice too.

I hope

Evan Jones said...

If Menne (2009) describes their method as an across-the-board adjustment at the point of equipment change, I cannot find it. Then I go to Hubbard and Lin (2004) and find that MMTS I bias is apparently a function of a number of site-specific parameters which they attempt to model by using pairwise analysis.

Something doesn't add up here.

Yes. What does not add up here is that Menne appears to have noticed that the main issue of CRS-MMTS is not an offset issue, it is a trend divergence over time. So he uses a pairwise over 15 years to bring the MMTS in line with the CRS when he should be doing the exact opposite. Even his offset is in question, assuming the calibrators did their job. I think he is using both to horsewhip the MMTS trends in line with those of the CRS.

He should be doing the opposite.

We should not be adjusting the MMTS trends up. We should be adjusting the CRS trends down. So bite the bullet: warm the past back up a little, then; y'all have cooled it down plenty, after all.

Evan Jones said...

Are wargames still played on cheap cardboard hexes, Evan?

No, most of them are computerized now, to their great advantage. Fog of war being one example of a "systematic" problem now largely if not entirely solved. Another being that you usually can't view the CRT in computer games, so you must just align your play to their past pattern of results.


But I started out in 1970, so I will always have cardboard on the brain. And I still play a lot of the oldies.

Evan Jones said...

If they refer to this Menne et al. (2009) (Open Access; there was no reference list on the press release/blog post), then the numbers do not add up and the adjustments for the MMTS transition should be much larger.

That is because Menne is not only applying an offset, he is also doing 15-year pairwise with the comparisons -- so that affects the data progression long after the point of conversion, and therefore the trend.

To be frank, I think even the adjustment we apply is in the wrong direction (and, note well, against our hypothesis). We should not be adjusting the MMTS Tmax trend (or offset) up; we should be adjusting CRS Tmax down. All the way back to the onset of the instrumental record.

PaulS said...

Evan,

Bottom line, the well sited station trends are being adjusted upward to match those of the poorly sited.

Trends are not adjusted at all. A breakpoint is detected and all datapoints after the breakpoint are offset by the same amount. The trend before and after the breakpoint is unchanged.

You've repeated a belief that the divergence between unperturbed well-sited and poorly-sited stations is gradual. The divergence between the final NOAA adjusted average and your RAW+MMTS average follows basically the same path. Since homogenisation is not a gradual thing then, by your own argument, it cannot be causing trend increases in the well-sited stations. If you believe homogenisation is causing the trend increase relative to raw data in designated well-sited stations (and of course it is) the main thing you need to do is re-examine your claim that divergences in stations are gradual. Currently your arguments are contradictory.
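To make the distinction in this exchange concrete, here is a minimal Python sketch using a hypothetical, noise-free 30-year series with an assumed trend of 0.02 °C/yr and an assumed -0.3 °C step at year 15. It applies one constant offset to every point after the break, as described above, and compares ordinary least-squares slopes. The slope within each segment is identical before and after the adjustment; only the slope over the full period changes. The series, step size and break year are illustrative assumptions, not USHCN data and not NOAA's actual procedure.

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den

# Hypothetical noise-free 30-year annual series: 0.02 °C/yr trend with a
# -0.3 °C step at year 15 (e.g. an instrument change).
BREAK, STEP = 15, -0.3
years = list(range(30))
raw = [0.02 * yr + (STEP if yr >= BREAK else 0.0) for yr in years]

# Breakpoint adjustment: the same constant offset for every point after the break.
adjusted = [v - STEP if yr >= BREAK else v for yr, v in zip(years, raw)]

def segment_slopes(y):
    """Slopes before the break, after the break, and over the full period."""
    return (ols_slope(years[:BREAK], y[:BREAK]),
            ols_slope(years[BREAK:], y[BREAK:]),
            ols_slope(years, y))

for name, y in (("raw", raw), ("adjusted", adjusted)):
    before, after, full = segment_slopes(y)
    print(f"{name:9s} before={before:.3f} after={after:.3f} full={full:.3f}")
```

Run as-is, both segments keep a slope of 0.020 °C/yr in both versions, while the full-period slope goes from about 0.005 to 0.020 °C/yr: removing a step changes the long-term trend without altering the trend inside either homogeneous segment.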

Evan Jones said...

Aha, now we are getting somewhere. Thank you for the correction and update.

Yes. And a big aha it is. It illustrates exactly what I have been going on about.

Evan Jones said...

That excuse is nothing but rationalization for the fact the original paper was not ready for prime time. It wasn't even ready for the timeslot filled by infomercials on obscure cable channels.

If so, then all the more reason to make a big bang and get feedback in order to correct.

Evan Jones said...

1) Your study period is 1979 - 2008.

Yes.

2) MMTS field installation began in 1983.

Late 1982.

3) Field installation was complete by the mid-90s.

Mostly, but by no means all.

4) Every station in the network was converted during your study period.

Yes, except CRS-only stations and ASOS/AWOS.

5) MMTS conversion introduced a systematic error.
Therefore, every station record in your study contains a systematic error.

Why yes. A small one. But not the way you think.


We do apply an offset adjustment to account for this. If you think it is too small, just increase it. Even if it is double what we say, we are talking maybe a 0.01C/decade difference. That makes all this MMTS stuff edge-chipping.


But I am saying there is a different sort of systematic error: The wrong set is being adjusted. And I am not talking offset here, I am talking trend. Menne talks offset but does trend. To the wrong set of stations. I'll be talking and looking at trend, come followup time. And applying the results to the right set of stations.

Evan Jones said...

Homogenisation is performed by instantaneous adjustment at detected breakpoints, not by adjusting trends.

Yes.

Your Point 6 states that the divergence is gradual, not by step change.

Yes.

Therefore, by your own argument, the increased trend cannot be related to homogenisation.

Not so much.

It corrects one datapoint at a time, yes. And the effects accumulate over time. Therefore the increased trend must be due to homogenization. This problem has pairwise written all over it.

But if it isn't what we say, then it's something else. Something else that needs a look, preferably by better minds and better trained eyes than mine. Because the result is. And if all this is not a result of what we think it is, then surely it is the result of something else -- something that it would be important to identify.

Evan Jones said...

Let me repeat: *EVERY* station record in the USHCN dataset that was converted during the study period has a systematic bias due to the conversion.

Why yes. But Size Matters. (The first Law of AGW.) You are talking at most ~0.02C/decade, some of which we already do. Our microsite disparity is 0.11. And, as the effect hits the poorly sited stations more than the well sited jobs (heavier in CRS and ASOS/AWOS), go wild. But I repeat myself.

You will also realize, of course, that I am contending not that the bias occurs at the point of conversion, but that the bias occurs prior to the point of conversion. Plus whatever offsets the calibration guys missed (and going by the way Menne does his stuff, they would have had to be doing a pretty lousy job).

P.S. VeeV, I just love this blog's feature that does not permit publication if an ital. is not properly closed. Now, that's my idea of a good adjustment.

Evan Jones said...

Not all heat sinks are created equal. You needed measurements of sink efficacy. You don't have them.

Yes. And that is a clear and fair criticism of Leroy. A heat sink is or isn't. That's part of what I mean when I say Leroy (2010) is a bit of a meataxe.

I'd like to quantify it and weight it. Assign it a value, diminishing on distance. Do it over, but with numbers.

Like homogenization, BEST, or any other approach, including ours, there will be additional problems, but probably not greater than those removed.

Evan Jones said...

However, the average effect of the statistically significant changes (−0.52°C for maximum temperatures and +0.37°C for minimum temperatures) is close to Hubbard and Lin’s (2006) results for sites with no coincident station move.

We do Menne's. It's not completely unalike, except that he gives it at -0.10 at Tmax and +0.025 at Tmin, which brings it in line with CRS (which also raises the question of how the initial calibration is so much adjusted in the first place).

Heck, H&L offsets Tmean less than Menne (or we) do.

The devil is in what Menne is doing with his multi-year pairwise to drag the MMTS kicking and screaming in line with the CRS. There are two dogs here: MMTS and CRS. One of them is mad. And Dr. Menne has gone and shot the wrong one.

Evan Jones said...

This. Science is a way of not fooling yourself. Watts and his cronies, being science-deniers, are only looking for clever ways to fool themselves. E pur, si riscalda (and yet, it warms).

Aye, Cap'n. So it be.

And that is a sword which cuts all ways and always.

Evan Jones said...

Further note:

Except your data involves a facially erroneous application of Leroy (2010), which you were told about at the outset (referring then to the earlier version of Leroy) and chose to ignore since without it your whole project evaporates in a puff of phlogiston, so it's actually a question of wanting rather than needing.

That's not a data question. That is a methods question. Just the sort I was after.

My cut-rate answer is: why would you use a minor vegetation metric when what you are measuring is not vegetation? And shade (the largest "other factor" involved) is mostly, and inextricably, bound to (and caused by) the heat sink itself, and has no effect at Tmin anyway, which is where most of our divergence is showing up (even in Fall et al.). Veggies et al. are for followup.

Evan Jones said...

Willard, a strange idea that I should solve their problem while not getting access to the manuscript and the data.

You have solved a lot of them, already. When we publish, and if things stand up even half as much after review, it is not my problem, it is our problem. A soluble one I think, using the tools you already have.

Evan Jones said...

It can also be seen as a game between scientists and nature, where nature holds all the facts and scientists all the logic. This goes back to Hintikka.

And that is what I am doing.

What our team has done is uncover a fact. We are now discussing if it is or is not a fact, and, if it is, how logically to account for it and apply it to existing tools.

Evan Jones said...

Thanks for replying, but those aren't actually genuine replies, just evasions. I don't think you're being open or honest with us.

If being open and honest is defined as agreeing with you, then one would agree. If not, then not. YMMV.

PaulS said...

Evan,

This problem has pairwise written all over it.

That it is a problem is not at all clear.

What I think you're suggesting is that a sequence of small breakpoint adjustments is being applied to these stations, resulting in a gradual change. Firstly, this should be easily demonstrable just by plotting the difference between final and raw time series. Secondly, this seems unlikely given that Figure 6 in Menne et al. 2009 indicates very few detected inhomogeneities smaller than 0.3°C. Actually, there seem to be just about none from undocumented detections.
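PaulS's first check can be sketched as follows. This minimal Python example, using made-up series, takes the difference between a final (adjusted) series and a raw series and lists every point where that adjustment changes by more than a small tolerance. A handful of discrete jumps points to breakpoint-style adjustments; a long run of tiny changes would point to something more gradual. The function name, the tolerance and the toy data are hypothetical, and real station data would also need missing values handled.

```python
def adjustment_steps(raw, final, tol=1e-6):
    """List (index, size) wherever the final-minus-raw adjustment changes by more than tol."""
    adj = [f - r for r, f in zip(raw, final)]
    return [(i, adj[i] - adj[i - 1])
            for i in range(1, len(adj))
            if abs(adj[i] - adj[i - 1]) > tol]

# Made-up annual means: a raw series, and a "final" series in which the
# earliest ten years were raised 0.3 °C and the next ten raised 0.1 °C.
raw = [10.0 + 0.01 * i for i in range(30)]
final = [v + (0.3 if i < 10 else 0.1 if i < 20 else 0.0) for i, v in enumerate(raw)]

for index, size in adjustment_steps(raw, final):
    print(f"adjustment changes by {size:+.2f} °C at index {index}")
```

Here the adjustment series contains exactly two jumps, at indices 10 and 20, which is what a step-wise homogenization would leave behind; the same function applied to real raw and final series would show how many such jumps there are and how large they are.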

Evan Jones said...

Statistical homogenization using neighbouring stations does not work the way you think it does. Homogenization in the USA increases the trend by 0.4°C per century.

Sounds about right. That's what adjusting the well sited stations up to the level of the poorly sited stations does.

If homogenization worked the way you think it does, you should be able to find more than 50% of stations where the trend is the same before and after homogenization, and a large minority where the trend even shows cooling.

That's what I would expect and that's what we see:
-- High end tapped down. (That "large minority" you mention.)
-- Upper-tier majority adjusted only a little some up, some down.
-- Low end hammered up. Way up.

And all that sure as heck reduces them there external error bars, don't it? But don't let that reassure you as to method.

Thing is that there is an even larger minority of stations on the low end. And those are the best stations.

In homogland, every datapoint is an outlier, it's just that some datapoints are more equal than others.

And if your upper-tier majority comprises the Good Citizens, then all is well and good in homogland. Kindly Uncle H has extended his hand in favor of the commoners.

But if what you thought were the Good Citizens turn out to be the Bad Citizens, what then? Well, not to put too fine a point on it, that is when Kindly Uncle H becomes the H-Bomb. You are not only applying your adjustment to the wrong set of stations, you are applying it in the wrong direction as well.

You already know about this problem in general terms, of course. You even warn us about it. Says so right on the label in fine print, between the disclaimer and the skull-and-crossbones.

All I want is for things to be well and good in homogland. As well as they can be, anyway. I do this by identifying who is a Good Citizen and who is a Bad Citizen. I am but your humble (unsolicited) kommissar. I have made my report. You are the scientist who can fix all this.

Evan Jones said...

Evan, maybe I've merely gotten lost in the tangle of the posts, comments, and responses, but I believe you have switched back and forth repeatedly from saying:

1) You have no hypothesis of a physical mechanism that would cause both an increase in warming trend and an increase in cooling trend.

2) Large heat sinks are the physical mechanism.

What is your explanation for (2)?


We do have a hypothesis. It is the same reason that Tmax comes hours after noon and Tmin comes hours after midnight. It's the ground that does that. Identified heat sinks don't do anything different. They just do it more and better. They add to an effect which is already there.

I and other people think:

1) A heat sink would in the very very short term decrease a warming trend by sucking the heat out of the air surrounding the station, and decrease a cooling trend by replacing the heat into the air. In other words, it would decrease the variability of the temperature changes.

Overall? Possibly. At 10AM, the heat sink temp (including the ground) will still be lagging the warming. But with min-max, that effect is never recorded. Measurements are recorded only for Tmax and Tmin. And that is when the effect manifests itself.


2) Even that stabilizing effect would operate over a time scale much shorter than multiple years, because the heat sink would become "saturated" so to speak to the new multi-year temperature mean.

Eventually, certainly. But if it did so within the timescale we are measuring, we would not see the results we are seeing: either the lag of Tmax/Tmin behind 12PM/AM caused by the surrounding ground, or the amplifying effect of adding a nice covering of concrete to part of that surrounding ground.

Victor Venema said...

Evan Jones, feel free to see my questions as nibbling around the edges, but that mainly shows that you have not yet internalised the scientific value of accurate reporting on your work.

It makes a difference whether there were no time of observation changes at all, or only none between the beginning and the end of the period (the middle is also important), or only none between am and pm (changes in time within the morning or within the afternoon also cause temperature jumps). It makes a difference whether no perturbation actually means no change in category (because within the categories there will also be huge differences in the temperature bias, especially given that you only implemented part of Leroy's classification, and these changes could have changed the class had you used the full classification). It is nice to hear you are confident this does not matter; it would make your work stronger if you actually demonstrated this with a sensitivity analysis showing that the effect is small, which may well be the case.

Maybe it is not fully fair, but accurate reporting builds trust that the work itself was executed accurately. Just like accurate use of italics in comments to indicate quotations. Writing is a service to the reader.

Especially when you write a press release on purely statistical results and others have to guess about the physics, they need to know the details, and the words need to mean what the words normally mean.

Victor Venema said...

Evan Jones: "BTW, we did it the same way for Fall (2011), but we heard nary a peep from either the review panel or from any of the critics. But then y'all liked those results. Now that we are a near trend-match with both satellite trends, you-all don't like the results, and I am hearing a different tune."

Could you explain which of the conclusions of Fall et al. need to be updated? The main conclusion seems to stay the same: after homogenization the differences between the siting classes are very small.

That there are other requirements for fully new conclusions seems normal to me. Wouldn't you agree?

Victor Venema said...

Evan Jones: "Well, Anthony does what he does and I do what I do. In 2012, we were after larger-scale problems. Now we are just finalizing it for submission. And we released far more actual results this time than were included in the 2012 paper."

The technical quality of the study may have improved, but I am not sure whether it is really near submission.

Evan Jones: "But as for the rest, I think you have it backwards: The delta creating the trend is small and gradual, so it will only show up on a decadal scale. Preferably a multidecadal one. The reason for this is the same reason why 30-year trends are a general rule of thumb for climate time series."

However, your "trend exaggeration" only shows up on your two selected time periods. You do not see it on shorter time periods (days, seasons, trends over several years). You do not see it on the 30-year time scale; there is a trend on the 30-year time scale, but the difference is no longer growing since 1996.

Evan Jones said...

What I think you're suggesting is that a sequence of small breakpoint adjustments is being applied to these stations, resulting in a gradual change.

Hmm. Well put. Challenging.

Answer: Sort of, but in an almost unconscious, ever-flowing sense. The attention (like mine) is less on the mechanism and more on the result (which is a good method if the input is reasonably complete). The data departs from the previous result delta a little more, it gets adjusted back a little more. If the jump is small, the adjustment is small. If not, then not. There is no fixed size. Just happens. Very generic. How do I put it? Math like Tahiti lives, not like Germany lives?

And I agree it should be done that way. Microsite could be accounted for in a similar way. I doubt GHCN metadata is up to the task, though. Microsite history would have to be an inference, possibly not even station-by-station, but on a regional or even universal inference. With USHCN, we might do it. GHCN, not so much.


Firstly, this should be easily demonstrable just by plotting the difference between final and raw time series.

Yes. Pretty much. From what I can tell, the divergence becomes wider with the introduction of MMTS (even with upward adjustment for conversion applied). Then you see a departure. The first part is the CRS issue we adduce. The second part is what microsite adduces.

Note also, there is a larger warming departure from 1979-1998 (as might be expected), and we see poorly sited stations cool faster from 1999-2008, an unfortunately short interval, but the only one around to play with on this side of 1950 (for which we have no ratings, no metadata), thanks to AGW, which has made all the good cooling periods pretty flat.

Secondly, this seems unlikely given that Figure 6 in Menne et al. 2009 indicates very few detected inhomogeneities smaller than 0.3°C. Actually, there seem to be just about none from undocumented detections.

If I understand you correctly, that is because neither we nor Menne (2009) had the benefit of Leroy (2010). Also, we weren't accounting for moves (past 7 years) or TOBS at that time. They used an incomplete, unchecked dataset. Well, we will have a better one for them. Probable end result? The mention of my name will be dropped from the NOAA adjustment page as Fall et al. takes a flit, and it will somehow not be replaced by this paper. But such is life.

Victor Venema said...

Evan Jones: "If you were right about the triviality of the heat sink effect, then Tmax would occur right around noon and Tmin would occur right around midnight. It is the amount and delta of that lag over time that is producing the divergence."

What do you mean by "triviality"? Naturally there is a thermal inertia. This, however, dampens temperature changes (it does not amplify them, contrary to your claims), and it causes a time lag of the temperature maximum (as you rightly claim, but I fail to see its relevance for your argument).
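The thermal-inertia point can be illustrated with the simplest possible model: a single body relaxing toward a sinusoidal daily air-temperature cycle with one time constant τ (Newton's law of cooling). The Python sketch below integrates that model numerically and compares it with the textbook steady-state result, in which the body's swing is smaller than the forcing by a factor 1/sqrt(1 + (ωτ)²) and its maximum arrives later by arctan(ωτ)/ω. The 5 °C amplitude and 3-hour time constant are arbitrary assumptions; this is not a model of any real sensor, shelter or paved surface.

```python
import math

PERIOD_H = 24.0   # diurnal cycle length, hours
AMPLITUDE = 5.0   # hypothetical amplitude of the driving air-temperature cycle, °C
TAU_H = 3.0       # hypothetical thermal time constant of the body, hours
DT_H = 0.01       # forward-Euler time step, hours
DAYS = 10         # long enough for the start-up transient to die out

def simulate():
    """Integrate dT/dt = (forcing - T) / tau; return (time, T) samples for the last day."""
    omega = 2 * math.pi / PERIOD_H
    temp, last_day = 0.0, []
    for i in range(int(round(DAYS * PERIOD_H / DT_H))):
        forcing = AMPLITUDE * math.sin(omega * i * DT_H)
        temp += DT_H * (forcing - temp) / TAU_H
        t = (i + 1) * DT_H
        if t > (DAYS - 1) * PERIOD_H:
            last_day.append((t, temp))
    return last_day

samples = simulate()
temps = [T for _, T in samples]
amplitude = (max(temps) - min(temps)) / 2
t_peak = max(samples, key=lambda s: s[1])[0]
# The forcing peaks a quarter period (6 h) into each day.
lag_h = t_peak - (DAYS - 1) * PERIOD_H - PERIOD_H / 4

# Analytical steady state of the same first-order model, for comparison.
wt = 2 * math.pi / PERIOD_H * TAU_H
print(f"simulated amplitude {amplitude:.2f} °C vs analytic {AMPLITUDE / math.sqrt(1 + wt**2):.2f} °C")
print(f"simulated lag {lag_h:.2f} h vs analytic {math.atan(wt) / (2 * math.pi / PERIOD_H):.2f} h")
```

Lengthening TAU_H makes the swing smaller and the lag longer (the lag tends toward a quarter cycle, 6 h), which is the damping-plus-delay behaviour described above; in this model the amplitude ratio is always below one.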

Victor Venema said...

Evan Jones: "Even if you doubled our MMTS correction, there would be no material difference in our results. It would affect the Class 3/4/5s more than it would affect the Class 1/2s, anyway."

It would make the trend stronger. That is important.

A possible reason for the difference in trend between Class 3/4/5s and Class 1/2s is explained above with the example:

Someone has a weather station in a parking lot. Noticing their error, they move the station to a field, creating a great big cooling-bias inhomogeneity. Watts comes along, and seeing the station correctly set up says: this station is sited correctly, and therefore the raw data will provide a reliable trend estimate.

In all the comments I have seen from you, you do not seem to have replied to this key argument.
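The arithmetic behind the parking-lot example can be checked with a short Python sketch. It assumes a hypothetical station with a true trend of 0.02 °C/yr over 30 years whose first 20 years carry an assumed +0.3 °C warm bias from the parking lot; the move to the field at year 20 removes that bias, which shows up in the raw record as a cooling step. All numbers are illustrative, not taken from any real station.

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    return num / sum((a - mt) ** 2 for a in t)

TRUE_TREND = 0.02   # °C per year, hypothetical
WARM_BIAS = 0.3     # °C, hypothetical parking-lot bias before the move
MOVE_YEAR = 20      # hypothetical year the station moves to the field

years = list(range(30))
true_series = [TRUE_TREND * y for y in years]
# Raw record: biased warm until the move, then correctly sited.
raw_series = [v + (WARM_BIAS if y < MOVE_YEAR else 0.0) for y, v in zip(years, true_series)]

print(f"true trend: {ols_slope(years, true_series):.4f} °C/yr")
print(f"raw trend:  {ols_slope(years, raw_series):.4f} °C/yr")
```

The raw record of the now well-sited station shows about a third of the true trend (roughly 0.0067 vs 0.0200 °C/yr). Its current siting says nothing about the step hidden in its history, which is why a correctly sited station's raw data are not automatically a reliable trend estimate.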

Victor Venema said...

Evan Jones: "homog operates in a manner that does not take an overall average, but adjusts small outlying clumps to conform with majority clumps."

Evan Jones: "You can theorize what the effect would be all you like. But the results are there and they are quite stark. Bottom line, the well sited station trends are being adjusted upward to match those of the poorly sited."

No, it does not. After all these years I am sure you know better than that.

The basic idea of statistical homogenization is explained here. Again.

Evan Jones: "If they largely contradicted the metadata, that would be of note. But they largely confirm it. So I think the gold speck here is that recent USHCN metadata is pretty darn good for our purposes. (There is no "perfect" here, in any of what any of us on either side of the discussion are doing.)"

There is a reason you only studied the last 30 years: the quality of the metadata decreases going back in time.

If the interviews provided a sufficient percentage of metadata, you could use this information by making two sets of metadata: with and without the interviews. That allows you to study the sensitivity of your results to assumptions about the quality of the metadata, under the assumption that the interviews make the metadata more accurate.
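The suggested sensitivity test amounts to running the same comparison twice with two versions of the unperturbed flag. A minimal Python sketch, with entirely made-up stations and trends and hypothetical field names, might look like this; the quantity of interest is how much the gap between poorly and well sited classes moves when the interview information is added.

```python
from dataclasses import dataclass

@dataclass
class Station:
    siting_class: int              # hypothetical rating, 1 (best) to 5 (worst)
    trend: float                   # hypothetical trend, °C/decade
    flag_metadata_only: bool       # judged perturbed from station metadata alone
    flag_with_interviews: bool     # judged perturbed once interview notes are added

def class_gap(stations, use_interviews):
    """Mean trend of classes 3-5 minus mean trend of classes 1-2,
    using only stations judged unperturbed under the chosen metadata."""
    kept = [s for s in stations
            if not (s.flag_with_interviews if use_interviews else s.flag_metadata_only)]
    good = [s.trend for s in kept if s.siting_class <= 2]
    poor = [s.trend for s in kept if s.siting_class >= 3]
    return sum(poor) / len(poor) - sum(good) / len(good)

# Entirely made-up stations, for illustration only.
stations = [
    Station(1, 0.20, False, False),
    Station(2, 0.24, False, True),   # hypothetical: interview reveals an undocumented change
    Station(1, 0.18, False, False),
    Station(3, 0.30, False, False),
    Station(4, 0.32, True, True),
    Station(5, 0.28, False, False),
]

for use_interviews in (False, True):
    label = "with interviews" if use_interviews else "metadata only"
    print(f"{label:16s} gap = {class_gap(stations, use_interviews):.3f} °C/decade")
```

With these toy numbers the gap is about 0.083 °C/decade from the metadata alone and 0.100 °C/decade once the interview flags are used. Reporting both numbers shows the reader directly how sensitive the class comparison is to the quality of the metadata.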

Victor Venema said...

Evan Jones: "What does not add up here is that Menne appears to have noticed that the main issue of CRS-MMTS is not an offset issue, it is a trend divergence over time."

As we have seen in the parallel measurements of a CRS and an MMTS, this transition IS an offset issue. Also the change in quality of siting (or any relocation actually) is an offset issue. It changes the temperature bias.

That there is a divergence is your personal statement, repeated often, but not backed by evidence.

Evan Jones: "So he uses a pairwise over 15 years to bring the MMTS in line with the MMTS when he should be doing the exact opposite. Even his offset is in question, assuming the calibrators did their job."

If I understand Menne et al. correctly, they use the pairwise homogenization method. This means that the correction is computed over the homogeneous subperiods before and after the break. Not for a fixed 15-year period.
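A minimal sketch of that idea, assuming a single known break and one well-correlated neighbour: the break size is the change in the mean of the candidate-minus-neighbour difference series, taken over the whole homogeneous subperiods on either side of the break rather than over a fixed window. The function names and numbers are hypothetical, and the real pairwise algorithm (many neighbours, break detection, significance testing) is far more involved.

```python
from statistics import mean


def step_size(candidate, neighbour, break_index, start=0, end=None):
    """Break size estimated from the candidate-minus-neighbour difference series.

    Hypothetical helper for illustration only, not NOAA's pairwise algorithm.
    start and end mark the neighbouring breaks (or the ends of the series), so the
    whole homogeneous subperiods on either side of the break are used.
    """
    if end is None:
        end = len(candidate)
    diff = [c - n for c, n in zip(candidate, neighbour)]
    before = diff[start:break_index]
    after = diff[break_index:end]
    return mean(after) - mean(before)


def homogenize_earlier(candidate, break_index, size):
    """Shift the values before the break so they match the most recent segment."""
    return [v + size if i < break_index else v for i, v in enumerate(candidate)]


candidate = [10.0, 11.0, 10.5, 9.6, 10.6, 10.1]   # synthetic, 0.4 cooler after index 3
neighbour = [10.0, 11.0, 10.5, 10.0, 11.0, 10.5]
size = step_size(candidate, neighbour, 3)
print(round(size, 3), homogenize_earlier(candidate, 3, size))
```

Because the weather shared with the neighbour cancels in the difference series, even a short synthetic record gives the step exactly here; with real data the estimate improves the longer the homogeneous subperiods are.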

The offset is not just calibration; it is also due to overheating of the Cotton Region Shelter in the sun when there is little wind, and to radiation getting onto the sensor.

Evan Jones: "I think he is using both to horsewhip the MMTS trends in line with the those of the CRS. He should be doing the opposite. We should not be adjusting the MMTS trends up. We should be adjusting the CRS trends down."

You forgot to mention a reason why. If the screen does not change from a CRS to an MMTS, but stays a CRS over the entire period, that removes one known cooling bias in the US temperature record. That makes the trends of stations that are still a CRS more reliable.

Victor Venema said...

Evan Jones: "We do Menne's. It's not completely unalike, except that he gives it at -0.10 at Tmax and +0.025 at Tmin., which brings it in line with CRS (which also raises the question of how the initial calibration is so much adjusted in the first place)."

Those numbers do not fit to the numbers mentioned in your reference.

Evan Jones: "There are two dogs here: MMTS and CRS. One of them is mad. And Dr. Menne has gone and shot the wrong one."

Pretty amazing, given that the homogenization algorithm does not know which screen is used at the end of the series and thus has no preference. The algorithm just looks at how one station behaves relative to its neighbours.

Tom Dayton said...

Evan, maybe the confusion is because your use of the term "heat sink" is inappropriate for the physical mechanism you are proposing. Here is what I think a slab of concrete would do if it got its energy solely from the air:

1. Half hour after dawn. Concrete and air are in equilibrium with each other.

2. Air starts to warm. Concrete also starts to warm from the air, but lags the air's warming both because air warms first and because concrete has more thermal mass. Air's temperature rise is slowed, because some of its energy is going into the concrete.

3. Air reaches its max temperature from sources other than the concrete. This is TMax. Concrete still lags.

4. Air stays at its TMax temperature from non-concrete sources long enough for concrete to catch up to equilibrium with the air. Concrete is not warming the air as the concrete catches up.

5. Concrete catches up to the air. Air and concrete at equilibrium with each other at TMax(air) and TMax(concrete). Neither is causing the other to warm.

6. Both air and concrete temperatures do not change. The concrete cannot increase the temperature of the air, by definition of equilibrium.

7. Sun is setting, so air starts to cool. Concrete no longer in equilibrium with air, so concrete bleeds heat into the air, reducing the air's cooling. Nonetheless, the air does not rise above TMax(air), because it is getting heat from the concrete only while the air is below TMax(air).

That (I believe) is how a "heat sink" would affect the air temperature. It is impossible for the heat sink to increase the TMax the air would have in the absence of the heat sink.
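The step-by-step argument above can be checked with a toy model, under loudly stated assumptions: the air is heated by an outside daily sine wave, the slab exchanges heat only with the air, and all rate constants (per hour) are made up for illustration. This is a hypothetical lumped model, not a model of any real station.

```python
import math


def air_temperature_range(with_slab, mean_temp=15.0, drive_amp=10.0, k_air=1.0,
                          k_couple=0.5, k_slab=0.2, dt=0.01, days=10):
    """Return (min, max) air temperature over the final simulated day.

    Hypothetical lumped model, rates per hour:
      air:  dT_air/dt  = k_air * (drive - T_air) + k_couple * (T_slab - T_air)
      slab: dT_slab/dt = k_slab * (T_air - T_slab)
    The slab exchanges heat only with the air; k_slab < k_couple means the slab
    has more thermal mass than the air it touches.
    """
    steps_per_day = round(24 / dt)
    t_air = t_slab = mean_temp  # start in equilibrium, as in step 1
    last_day = []
    for step in range(days * steps_per_day):
        hours = step * dt
        # Heating from sources other than the slab, as a daily sine wave.
        drive = mean_temp + drive_amp * math.sin(2 * math.pi * hours / 24)
        d_air = k_air * (drive - t_air)
        d_slab = 0.0
        if with_slab:
            d_air += k_couple * (t_slab - t_air)
            d_slab = k_slab * (t_air - t_slab)
        t_air += dt * d_air  # explicit Euler step
        t_slab += dt * d_slab
        if step >= (days - 1) * steps_per_day:  # keep only the final day
            last_day.append(t_air)
    return min(last_day), max(last_day)


low_slab, high_slab = air_temperature_range(with_slab=True)
low_bare, high_bare = air_temperature_range(with_slab=False)
print(f"without slab: Tmin {low_bare:.2f}, Tmax {high_bare:.2f}")
print(f"with slab:    Tmin {low_slab:.2f}, Tmax {high_slab:.2f}")
```

In this setup the passive slab only damps the daily cycle: Tmax goes down and Tmin goes up, while the daily mean stays the same. A slab that is also warmed directly by the sun is the separate "extra energy source" case and is not modelled here.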

Maybe what you are grasping for as a mechanism is not a "heat sink" per se but an extra energy source such as concrete that is being warmed by the Sun's rays so the concrete's temperature rises above its air-equilibrium temperature, and the concrete dumps that energy into the air after the air reaches its max temperature from other heat sources.

But that "extra energy source" mechanism cannot amplify a cooling trend even within one day. And it cannot amplify a warming trend beyond one day. So still you are left with no physical mechanism.

Steve Bloom said...

VV: "Maybe it is not fully fair, but accurate reporting builds trust that the work itself was executed accurately."

But it should be very clear to anyone reading the preceding exchanges that the project's main objective is simply to keep the ball in the air for as long as possible, thus the planned interminable string of follow-up studies. The only real question I have at this point is why in the world n-g is willing to continue being associated with this business. OTOH maybe it's just that it's helpful for the TX State Climatologist to have some denier camouflage.

A few other questions occurred to me just now, Victor, and excuse me if they've already been addressed somehow since it's been a few years since I've made any attempt to follow this issue carefully:

Assuming for the sake of argument that heat sink effect is as claimed, rather than the admittedly onerous approach of trying to measure that effect directly, what about examining the records for the difference between cloudy and sunny days at different times of year? Similarly, what about the effect of snow-covered ground? And windy vs. still days (accounting for direction from the heat sink)?

Although, is it really onerous? What about sticking some IR sensors on some stations? IIRC CRN has those, presumably for a similar purpose, recalling also each CRN station has a paired USHCN station that could be looked at.

Finally, re EJ's claim that he thinks the (land-only, presumably) surface temp trends are more like the satellite data than is currently thought, what's the physics say? E.g. should mixing of air masses in the lower troposphere be expected to even out the difference between over land and ocean surface temps?

Evan Jones said...

It would be interesting to do a similar study for the MMTS corrections and see if the pairwise homogenization algorithm would be able to correct them just as well.

Funny you should say that. Dr. Menne said the exact same thing -- and I think he had a gleam in his eye when he said it.

Tom Dayton. Must crash. But I will review and reply. Also to other stuff.

PaulS said...

Victor,

Those numbers do not fit to the numbers mentioned in your reference.

The -0.10 and +0.025 numbers come from text in Menne et al. 2010:

The lack of very small magnitude shifts in Figure 5 is a consequence of adjusting only those shifts that were statistically significant according to the pairwise comparison procedure. However, the average of all unadjusted MMTS transitions is about −0.1°C for maximum temperature series and about +0.025°C for minimum temperature series.


In context I think this means those numbers represent the average change identified for documented LiG-MMTS transitions, including those which did not pass statistical significance. Menne et al. 2009 notes that only 40% of conversions produced statistically significant offsets.

Kevin O'Neill said...

Evan you've repeatedly cited the numbers from Menne and PaulS provides the exact text: "However, the average of all unadjusted MMTS transitions is about −0.1°C for maximum temperature series and about +0.025°C for minimum temperature series."

But you ignore Menne when he says: "As a result, the overall effect of the MMTS instrument change at all affected sites is substantially less than both the Quayle et al. (1991) and Hubbard and Lin (2006) estimates. However, the average effect of the statistically significant changes (−0.52°C for maximum temperatures and +0.37°C for minimum temperatures) is close to Hubbard and Lin’s (2006) results for sites with no coincident station move."

The first set of numbers (the ones you're using) are based on *all* unadjusted MMTS. The second set of numbers (much larger deltas) are specifically from those that did not have a coincident station move. Now, isn't your unperturbed subset of stations better represented by the 2nd set of numbers?

In addition, by applying an offset you've 'smeared' the temporal and spatial resolution. The MMTS system has a temperature-dependent systematic bias. That's just a physical fact of the Dale/Vishay thermistor used. Applying the same offset regardless of the local climate norm or date of conversion means your regional and monthly data are skewed. Depending on the individual dates of conversion you might not even have decent decadal resolution.

Lastly, you have never mentioned comparison with USCRN. USHCN is actually running cooler than USCRN (at the moment). If your data also disagrees with USCRN, why? How does your heatsink hypothesis apply to comparisons with USCRN?

Also, you must have looked at your comparison using data through 2014. Why have you never mentioned the results using the most recent year's data?

MartinM said...

PaulS: In context I think this means those numbers represent the average change identified for documented LiG-MMTS transitions, including those which did not pass statistical significance.

I don't think so. If only those shifts which were statistically significant were adjusted, then the unadjusted transitions of the next sentence are surely the non-significant ones only, not the whole lot. That interpretation fits with Figure 5; the quoted values of -0.1 °C and +0.025 °C fit nicely in the empty areas where the non-significant shifts live.

Furthermore, Menne et al. 2010 goes on to say "The adjustments for the impact of the MMTS on maximum temperature series in the USHCN version 2 data set are therefore somewhat inadequate...the 'under‐adjustment' in maximum temperatures is a consequence of using site-specific adjustments for the MMTS in the version 2 release as opposed to a network‐wide, fixed adjustment as in version 1"

I think what they're saying here is that the pairwise algorithm fails to correct for MMTS biases which exist but are too small to be detected with significance. On the other hand, if the quoted figures are the average of all adjustments, I have no idea what the above sentence is supposed to mean. Finally, the figures in Menne et al. 2009 - 40% of shifts significant, with mean significant adjustments of -0.52°C and +0.37°C - are hard to reconcile with those of the 2010 paper if the quoted figures are for all adjustments, but work just fine if they're for non-significant only.

In short, it seems to me that Watts and co have underestimated their MMTS corrections by...well, I think the technical term is 'lots'.
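MartinM's reconciliation argument is simple arithmetic. A sketch, assuming the 40% significance fraction from Menne et al. 2009 applies equally to Tmax and Tmin and that both papers describe the same population of MMTS transitions (the names below are illustrative):

```python
SIG_FRACTION = 0.40                      # share of conversions with significant offsets
SIG_MEAN = {"tmax": -0.52, "tmin": 0.37}  # mean significant shift (degC)
QUOTED = {"tmax": -0.10, "tmin": 0.025}   # the -0.1 / +0.025 figures from the 2010 paper


def implied_nonsig_mean(var):
    """Reading A: quoted value is the mean of ALL transitions; solve for the rest."""
    return (QUOTED[var] - SIG_FRACTION * SIG_MEAN[var]) / (1 - SIG_FRACTION)


def implied_overall_mean(var):
    """Reading B: quoted value is the mean of the NON-significant transitions only."""
    return SIG_FRACTION * SIG_MEAN[var] + (1 - SIG_FRACTION) * QUOTED[var]


for var in ("tmax", "tmin"):
    print(var, round(implied_nonsig_mean(var), 3), round(implied_overall_mean(var), 3))
```

Under reading A the non-significant transitions would have to average about +0.18°C for Tmax and about -0.2°C for Tmin, the opposite sign of the significant ones, which is hard to believe. Under reading B the network-wide averages come out near -0.27°C and +0.16°C.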

Kevin O'Neill said...

MartinM - Yes :)

The Quayle adjustment was used in version 1; that was an offset of 0.4C. Hubbard & Lin did their side-by-side field study and came up with a temperature-dependent adjustment that had two parts: MMTS Bias I and MMTS Bias II. If we ignore (possibly at our own peril) Bias II for the moment, the Bias I adjustment would be 0.32C at -10C and 0.57C at 25C. The average for all stations is probably a few hundredths higher than Quayle's 0.4C offset.
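To illustrate why a temperature-dependent correction differs from a fixed offset, here is a sketch that assumes Bias I varies linearly between the two values quoted above (0.32C at -10C, 0.57C at 25C). The linear form is an assumption for illustration, not the published Hubbard and Lin formula, and Bias II is ignored as in the comment.

```python
QUAYLE_OFFSET = 0.4  # fixed version 1 offset (degC)


def mmts_bias_i(temp_c):
    """Hypothetical Bias I correction (degC) at air temperature temp_c (degC).

    Assumes straight-line interpolation between the two quoted points.
    """
    t_cold, b_cold = -10.0, 0.32
    t_warm, b_warm = 25.0, 0.57
    return b_cold + (temp_c - t_cold) * (b_warm - b_cold) / (t_warm - t_cold)


for month_mean in (-10.0, 0.0, 12.0, 25.0):
    corr = mmts_bias_i(month_mean)
    print(f"{month_mean:6.1f}C: temperature-dependent {corr:.3f}C, "
          f"fixed {QUAYLE_OFFSET:.3f}C, difference {corr - QUAYLE_OFFSET:+.3f}C")
```

Under this assumption the fixed 0.4C offset under-corrects warm months and over-corrects cold ones (the two agree near 1C), so a single offset shifts monthly and regional values differently depending on the local climate.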

So what Watts et al have done is not only understated the adjustment, they've degraded ("smeared") the daily, monthly, and seasonal resolution by ignoring the temperature dependence. Likewise by ignoring the temperature dependence they've degraded the spatial resolution - since there is a wide variation in local climate normals by location.

Menne believed that homogenization was a better and likely more easily calculable result. Watts et al apparently believe they don't have to do either. They reduce the Quayle offset and the Menne statistically significant changes by 75% - then wonder why they have a result that's different (and unphysical).

Evan Jones said...

But you ignore Menne when he says: "As a result, the overall effect of the MMTS instrument change at all affected sites is substantially less than both the Quayle et al. (1991) and Hubbard and Lin (2006) estimates. However, the average effect of the statistically significant changes (−0.52°C for maximum temperatures and +0.37°C for minimum temperatures) is close to Hubbard and Lin’s (2006) results for sites with no coincident station move."

What I do is use the figures he actually gives. Jumps only.

I will be providing the work I did in basic Excel. You can therefore easily see how I did it and substitute any other method you choose. This is not an end product. It is intended as a tool to be revised and expanded so anyone can use any dataset they like (e.g., USHCN TOBs-adjusted), bin any data they choose and adjust it in any way they like. A tool for analyzing the USHCN.

I am also saying that everyone (including myself) is adjusting the wrong thing. We are adjusting MMTS upward. Even Zeke points out that most scientists think MMTS is a better measurement than LiG. We should be adjusting CRS trend (sic) lower, not jumping MMTS offset (sic) higher. (One arguably might make a case for both. But calibration should have handled that.) But for Menne, this is just another one of those Let Homogenization Handle It moments. Which it did.

And, come to think of it, that would be another systematic error in the data causing another homogenization-bomb. It is on a smaller scale than microsite, but it could easily make a 0.02C/d difference, even more if using other metrics. That's low, but not nothing. Both the CRS issue and the microsite issue involve the wrong set of stations being adjusted in the wrong direction for the wrong reason.

The result of that would be a "warming of the past", and the 20th century warming graph would look more like Haddy2 than Haddy3\4. IIRC, the former wasn't homogenized (not that it was without pairwise).

Menne believed that homogenization was a better and likely more easily calculable result.

You said it, I didn't. Push button, get results you like. Well, okay, we both said it. (But you weren't the one rolling his eyes at the time.) I think he is pushing the wrong buttons.

Watts et al apparently believe they don't have to do either. They reduce the Quayle offset and the Menne statistically significant changes by 75% - then wonder why they have a result that's different (and unphysical).

Let us be very generous and say the Menne jumps are legit. Even the full MMTS-adjusted USHCN NOAA data shows only ~0.02C/d -- after pairwise. So if MMTS is believed to be a better thermometer than a CRS, the only "unphysical" thing about this is that we are adjusting MMTS rather than CRS.

But addressing that would upset the applecart all the way back to 1880. The whole pre-MMTS shebang. In the wrong direction.

So what Watts et al have done is not only understated the adjustment, they've degraded ("smeared") the daily, monthly, and seasonal resolution by ignoring the temperature dependence. Likewise by ignoring the temperature dependence they've degraded the spatial resolution - since there is a wide variation in local climate normals by location.

Bottom line: We have applied an adjustment to the wrong set of stations for the wrong reason. The objection to that appears to be that it should have been a bigger and better adjustment. But it's not a big adjustment in any case.

Evan Jones said...


Someone has a weather station in a parking lot. Noticing their error, they move the station to a field, creating a great big cooling-bias inhomogeneity. Watts comes along, and seeing the station correctly set up says: this station is sited correctly, and therefore the raw data will provide a reliable trend estimate.

Well, that was 2012; this is now. Curators don't move the stations themselves; that is handled by NOAA.

So I check the metadata. If there is a localized move and we don't know where it was previously, we drop the station (an overwhelming majority of cases). If we do know the previous location, in your above example, we notice that the station rating will have changed and we drop the station.

If we can, we contact the curator, but that is no longer an easy task, as NOAA, upon learning of the project, promptly removed all the curator names and addresses from its metadata claiming we might harass the curators. I never heard one complaint, and the curators were always delighted to talk about their stations and to learn they were in charge of tracking an important historical data series (they hadn't previously been told they were HCN). I never, ever told them they had a poorly sited station or otherwise impugned their efforts. I thanked them for their patriotic civic-mindedness and listened with interest to their stories. I was on the phone with some for more than an hour at a time.

On the other hand, let's look at the results. Back in 2012, when we did not account for the metadata, our results for both well and poorly sited station trends were lower, but the disparity between well and poorly sited stations remained. So the disparity did not magically appear when we dropped all those stations.

The argument back in 2012 was that since the rural stations were more likely to be subject to TOBS and well sited stations were more likely to be rural, accounting for the metadata would wipe out the disparity.

But, as siting at non-urban stations averages worse than at urban ones, that did not happen.

So the metadata issue, while a relevant and legitimate concern (one that caused us to drop over half our initial sample), did not remove the well/poorly sited station disparity. Furthermore, the metadata is surprisingly complete, so we are not relying on gaps to lend support to our hypothesis.

Evan Jones said...

Evan Jones: "Even if you doubled our MMTS correction, there would be no material difference in our results. It would affect the Class 3\4\5s more than it would affect the Class 1\2s, anyway."

It would make the trend stronger. That is important.


Yes. But not on a scale comparable to the disparities we are finding relating to siting. Besides, the problems appear to be not with the MMTS, but with the CRS units. CRS is what needs to be adjusted. And that would make the trend weaker rather than stronger. That is important.

Evan Jones said...

Evan Jones: "You can theorize what the effect would be all you like. But the results are there and they are quite stark. Bottom line, the well sited station trends are being adjusted upward to match those of the poorly sited."

No, it does not. After all these years I am sure you know better than that.


It's the observation. We have the raw data for the well sited stations. We remove those with incongruent metadata. The well sited station trends (raw+MMTS) are much lower than the figure to which they are adjusted. Poorly sited stations receive very little adjustment. Bottom line.

The reason for this appears to be that the large majority of stations are poorly sited (a systematic effect), thus throwing off the intended result of homogenization. If the situation were reversed, with well sited stations in the majority, homogenization would work as intended. But they are not.

And as I indicated above, this did not appear out of nowhere when we accounted for the metadata. It was there, all along, whether accounting for metadata or not. So the disparity between the well and poorly sited station trends does not appear to be a result of unreliable metadata.

Evan Jones said...

What do you mean with "triviality"? Naturally there is a thermal inertia. This, however, dampens temperature changes (it does not amplify them contrary to your claims) and it causes a time lag of the temperature maximum (as you rightly claim, but of which I fail to see the relevance for your argument).

It is interesting. The opposing arguments have a common theme. Let us step back a bit. Each side thinks the effects are, in effect, opposite to what the other thinks.

1.) I think heat sink reduces residual heat by absorbing it away from the sensor. I think it adds a concentration and re-emits it towards the sensor.

2.) You think the quality of the metadata is a potentially large skew in our poorly/well sited station disparity. I think it is a potentially small skew, and this is shown even when all metadata considerations are ignored.

3.) You think MMTS trend should be adjusted up (by whatever means) and that CRS should not be adjusted. I think MMTS should not be adjusted (other than incidentally) and that CRS trend must be adjusted down.

4.) You think that the adjusted data causing all classes to be the same confirms the success of homogenization. I think it highlights a systematic error in the current application of homogenization (something that could be fixed fairly easily).

Those are very interesting points of disagreement.

Li D said...

First thought. What is this? A high school project? Show it to people before giving it to the teacher? Blokes are supposed to BE the experts in this niche. That's why they are writing the paper thingie. Asking for help off the internet. Jeeez...

Second thought. If Watts and co want help, a bloody good start might be articulating an H0 (a null hypothesis). Then people know what the deal is.

Kevin O'Neill said...

USHCN tracks USCRN.
USCRN is more accurate and has no siting problems.

Evan's answer to this? Crickets chirping.

Victor Venema said...

Steve Bloom: "Assuming for the sake of argument that heat sink effect is as claimed, rather than the admittedly onerous approach of trying to measure that effect directly, what about examining the records for the difference between cloudy and sunny days at different times of year? Similarly, what about the effect of snow-covered ground? And windy vs. still days (accounting for direction from the heat sink)?"

The most direct way to estimate where the heat sinks are largest would be to study where the daily cycle or the annual cycle is smallest (except for a direct measurement, which we do not have in this case). This can also be influenced by the instrumentation. The examples you mention are likely better suited to study how large the radiation error of the thermometer is than to study the heat sink itself.

The annual cycle estimate is likely the most accurate one and a heat sink that is able to change a trend over a decade should have a strong influence on the annual cycle (which is not seen).
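One simple way to put a number on "the annual cycle" is the amplitude of the first harmonic of a station's 12-month climatology; a heat sink strong enough to change decadal trends would be expected to show up as a damped amplitude relative to nearby stations. A sketch, with a hypothetical function name and synthetic climatologies:

```python
import math


def annual_cycle_amplitude(monthly_means):
    """Amplitude (degC) of the first harmonic of a monthly climatology (Jan..Dec)."""
    n = len(monthly_means)
    a = 2 / n * sum(x * math.cos(2 * math.pi * k / n) for k, x in enumerate(monthly_means))
    b = 2 / n * sum(x * math.sin(2 * math.pi * k / n) for k, x in enumerate(monthly_means))
    return math.hypot(a, b)


# Synthetic climatologies peaking in July (index 6): a neighbour and a damped candidate.
neighbour = [10 + 12 * math.cos(2 * math.pi * (k - 6) / 12) for k in range(12)]
candidate = [10 + 11 * math.cos(2 * math.pi * (k - 6) / 12) for k in range(12)]
print(round(annual_cycle_amplitude(neighbour), 2), round(annual_cycle_amplitude(candidate), 2))
```

Comparing the ratio of such amplitudes between a station and its neighbours, across siting classes, would be one way to look for the damping that thermal inertia implies.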

Tom Dayton said...

Evan, you wrote "1.) I think heat sink reduces residual heat by absorbing it away from the sensor. I think it adds a concentration and re-emits it towards the sensor."

What do you mean by "residual" heat? What do you mean by "a concentration" of heat? I hope you merely described poorly what I explained step by step, in my earlier comment, about how a heat sink works.

I'm afraid that instead you mean the new Star Wars VII movie's planet destroyer that sucks energy from the Sun and then blasts it all out when triggered.

Victor Venema said...

Evan, Katy needs you: Eeeevvaaaaaannnnnn, where aaaaaare you?

PaulS: " The -0.10 and +0.025 numbers come from text in Menne et al. 2010 "

Ah, thanks. 2009 was mentioned first, thus I had assumed the numbers came from there and 2010 provided the same numbers or additional information.


Evan Jones: "Even if you doubled our MMTS correction, there would be no material difference in our results. It would affect the Class 3\4\5s more than it would affect the Class 1\2s, anyway."

May I suspect that you draw this conclusion based on the percentage of breaks due to the MMTS transition in the classes 3\4\5s and 1\2s? Because you do not seem to have studied the size of the temperature change due to the transition. That would have required a comparison with a neighbouring station, which is the work of the devil for your peers.

However, in this case the important question would be whether the size of the MMTS correction depends on these siting classes.

As Kevin quotes Menne et al.: "As a result, the overall effect of the MMTS instrument change at all affected sites is substantially less than both the Quayle et al. (1991) and Hubbard and Lin (2006) estimates. However, the average effect of the statistically significant changes (−0.52°C for maximum temperatures and +0.37°C for minimum temperatures) is close to Hubbard and Lin’s (2006) results for sites with no coincident station move."

The cooling effect from the change in the instrument alone, studied by Hubbard and Lin, is larger than the cooling effect observed from the cooling change in the instrument combined with the warming change in siting. It seems likely that stations that are currently badly sited have experienced a larger warming change due to the siting and need a larger adjustment.

Relative homogenization would do so. The newly introduced fixed adjustments of Watts et al. (2015) do not. The same would go for the temperature dependence of the difference between a CRS and MMTS mentioned by Kevin.

Kevin O'Neill said...

Evan writes: "We are adjusting MMTS upward. Even Zeke points out that most scientists think MMTS is a better measurement than LiG. We should be adjusting CRS trend (sic) lower, not jumping MMTS offset (sic) higher."

Then how do you explain Hubbard & Lin and their co-located USCRN sensors and MMTS? The USCRN sensors are inherently more accurate than either LIGs or the Dale/Vishay thermistors. They showed that the MMTS raw data has to be adjusted upwards to match USCRN co-located readings.

Are you now claiming that MMTS is more accurate than USCRN also?

This is probably the 7th time in various threads that I've asked you to explain your results in light of USCRN. You haven't, yet. Perhaps you can't. I don't know because you simply don't/won't address it.

BTW - do you have the dates of conversion for all the MMTS? Is that in the metadata?

Brandon R. Gates said...

Evan Jones,

Thank you for your response:

What does not add up here is that Menne appears to have noticed that the main issue of CRS-MMTS is not an offset issue, it is a trend divergence over time. So he uses a pairwise over 15 years to bring the MMTS in line with the MMTS when he should be doing the exact opposite.

Let me restate to make sure I correctly understand your argument. I think you mean that Menne brings MMTS in line with CRS using a pairwise trend adjustment, which introduces a warm-trending adjustment to MMTS when in fact he should be applying a cool-trending adjustment to CRS and leaving MMTS alone. Going back to your statement over at Dr. Curry's ...

We do this by applying the Menne (2009) offset jump to MMTS stations at the point of conversion (0.10c to Tmax, -0.025 to Tmin, and the average of the two to Tmean). We do not use pairwise thereafter: We like to let thermometers do their own thing, inasmuch as is consistent with accuracy.

... you are saying that the adjustment should be a breakpoint adjustment to CRS in the form of a one-time offset, the net result of which is cooling. Which result is obviously the opposite of the warming adjustment proposed by Menne. [1]

One place (there are others) I know I'm still stuck is here: "We do not use pairwise thereafter ..." because it is not clear to me that you're using pairwise comparison at all, except perhaps implicitly by way of the 0.10c to Tmax, -0.025 to Tmin offset.

I apologize if you have clarified elsewhere and I have missed it.

------------

[1] As PaulS and now others note, the correct reference is Menne (2010), not (2009).
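Evan's procedure as quoted (a one-time offset at the date of conversion, no pairwise comparison afterwards) can be sketched as follows. The offsets are the ones in his quote; the flat 30-year series and the conversion years are hypothetical, and the trend is a plain least-squares fit.

```python
TMAX_OFFSET = 0.10    # added to Tmax from the conversion onward (degC)
TMIN_OFFSET = -0.025  # added to Tmin from the conversion onward (degC)
TMEAN_OFFSET = (TMAX_OFFSET + TMIN_OFFSET) / 2


def apply_conversion_offset(years, values, conversion_year, offset):
    """Add a one-time offset to every value from the conversion year onward."""
    return [v + offset if y >= conversion_year else v for y, v in zip(years, values)]


def ols_trend_per_decade(years, values):
    """Ordinary least-squares slope, expressed in degC per decade."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    den = sum((x - mean_x) ** 2 for x in years)
    return 10 * num / den


years = list(range(1979, 2009))  # the 1979-2008 study period
flat = [0.0] * len(years)        # hypothetical station with no real trend
for conversion in (1985, 1994, 2003):
    adjusted = apply_conversion_offset(years, flat, conversion, TMEAN_OFFSET)
    print(conversion, round(ols_trend_per_decade(years, adjusted), 4))
```

On a series with no real trend, the step alone creates a small positive trend, and its size depends on when the conversion happened (largest for a mid-period conversion). That is why both the size of the offset and the conversion dates matter for a 1979-2008 trend comparison.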

Victor Venema said...

MartinM: "I think what they're saying here is that the pairwise algorithm fails to correct for MMTS biases which exist but are too small to be detected with significance."

Kevin O'Neill: "MartinM - Yes :)"

I would read it the same way. That these smaller numbers used by Watts et al. (2015) are for the breaks that could not be detected. These numbers are thus too small.

Menne et al. (2010), Bulletin of the American Meteorological Society (free):
The lack of very small magnitude shifts in Figure 5 is a consequence of adjusting only those shifts that were statistically significant according to the pairwise comparison procedure. However, the average of all unadjusted MMTS transitions is about −0.1°C for maximum temperature series and about +0.025°C for minimum temperature series. The adjustments for the impact of the MMTS on maximum temperature series in the USHCN version 2 data set are therefore somewhat inadequate, as reflected in Figures 2g and 3g. In fact, contrary to there being a positive (warm) bias as might be suggested by the exposure conditions at MMTS sites, there appears to be a residual, artificial negative bias in adjusted maximum temperatures (and little to no residual bias in adjusted minimum temperatures). In short, the “underadjustment” in maximum temperatures is a consequence of using site‐specific adjustments for the MMTS in the version 2 release as opposed to a network‐wide, fixed adjustment as in version 1 [Quayle et al., 1991]. ...

Furthermore, if you want to study the influence of siting, then you should only correct for the change in instrumentation using parallel measurements such as Lin and Hubbard. The additional change in the quality of siting (which you include if you use the values of Menne et al. (2009) from relative homogenization) is not supposed to have happened in the dataset of Watts et al. (2015). They claim there were no changes in their "unperturbed" data.

Kevin O'Neill said...

Victor: "Furthermore, if you want to study the influence of siting, then you should only correct for the change in instrumentation using parallel measurements such as Lin and Hubbard."

Yes :)

Watts et al have the perfect opportunity to enhance the data for those stations they've isolated as 'unperturbed.' All they need to do is find the date for conversion for MMTS and apply the Hubbard & Lin Bias adjustments. There should be no confounding factors to interfere with adjusting for the sensor change.

Given that the day-to-day variation for a single station is multiple degrees C, that the diurnal variation is many degrees C, and that the conversion adjustment is on average 0.4C - trying to simply look at a temperature record and identify the station's date of conversion to MMTS is basically impossible. One needs to know the date of conversion or use pairwise to try and find it.
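A sketch of why the difference series helps, under made-up assumptions: two nearby stations share the same weather noise (standard deviation 2C), each has a little independent noise (0.2C), and the candidate gets a -0.4C step halfway through. Subtracting the neighbour removes the shared weather, so the step stands out. All parameters and names are hypothetical.

```python
import random
from statistics import mean, stdev


def simulate_pair(n=240, break_at=120, step=-0.4, shared_sd=2.0, local_sd=0.2, seed=42):
    """Synthetic candidate and neighbour series sharing the same weather noise."""
    rng = random.Random(seed)
    candidate, neighbour = [], []
    for i in range(n):
        weather = rng.gauss(0.0, shared_sd)  # variability common to both stations
        jump = step if i >= break_at else 0.0  # e.g. a sensor change at the candidate
        candidate.append(weather + rng.gauss(0.0, local_sd) + jump)
        neighbour.append(weather + rng.gauss(0.0, local_sd))
    return candidate, neighbour


candidate, neighbour = simulate_pair()
diff = [c - n for c, n in zip(candidate, neighbour)]
estimated_step = mean(diff[120:]) - mean(diff[:120])
print("noise in raw series:", round(stdev(candidate), 2))
print("noise in difference series:", round(stdev(diff), 2))
print("estimated step:", round(estimated_step, 2))
```

With these numbers the raw series is far too noisy to spot a 0.4C step by eye, while the step estimated from the difference series lands close to the value put in.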

For poorly sited stations the MMTS conversion should still be done, but I haven't figured out if this will actually affect the overall statistics or not. It could be that the other errors are so large that the MMTS correction will get 'lost' in the larger uncertainties. Conversely it may simply increase the warming starting at the date of conversion. It's clear that MMTS Bias I increases the temperature, but Bias II might negate that somewhat.

I stated sometime ago that this was likely to be an "own goal" situation. Watts et al are well on their way to proving (or at least drawing attention to the fact) that the PHA algorithm underestimates the MMTS adjustment.



Evan Jones said...

May I suspect that you draw this conclusion based on the percentage of breaks due to the MMTS transition in the classes 3\4\5s and 1\2s? Because you do not seem to have studied the size of the temperature change due to the transition. That would have required a comparison with a neighbouring station, which is the work of the devil for your peers.

Augh! Damned am I. Led astray by the Red Baron, himself! Curse you and recurse you! I confess, though it is too late for divine pity. Sinners everywhere, take heed of my example lest ye be forever lost.


Seriously, though, I made an incomplete pairwise comparison with like-rated stations at the point of MMTS conversion. The problem, really, is that there are not enough Class 1\2s, and the regional trends differ, so pairwise is more like distant-cousinwise.

At any rate, by the time Excel bombed out on me and refused to handle it, I had results within ~0.01C/d of what I am using now. So I am not moved by claims of huge swing.

The problem, really, is how to handle it. If one considers CRS to be more reliable, then one would adjust the MMTS. If one considers MMTS more reliable, one would adjust the CRS.

As I said earlier, it is possible that both an offset and a trend adjustment apply. I think it is obvious that trendwise, it is the CRS that need the adjusting. But I do not discount the possibility that MMTS might deserve an upward offset bump.

What I do is leave the trends alone and add the offset bump to MMTS, thus raising the trend of the MMTS. Menne goes further and uses pairwise comparisons over the years on either side of the conversion to further increase the MMTS trend. I think that, trendwise, MMTS should not be adjusted (unless extraneous reasons appear). I think CRS is what needs the trend adjusting, regardless of whether an offset is added to MMTS.

Victor Venema said...

The PHA corrections are likely an underestimate. Homogenization methods improve trends, but do not fully remove biases.

Whether the Hubbard and Lin corrections are the right ones should, however, be studied. That will also depend on the local climate and on the condition of the CRS. We would need more parallel measurements spread over the USA to be sure of that. A nice project for a volunteer force à la surface stations. A few years of measurements already provides a wealth of information.

http://www.surfacetemperatures.org/databank/parallel_measurements

Evan Jones said...

To be clear:

Pairwise is often necessary. I know this.

I want to follow up this paper with an inclusion of the stations with only partial data. For example, if a station moved in 1995, I want to include the known part of the data.

But problems arise. First, I have to baseline the new start point, because if I don't, the previous trend effects still apply, and that is exactly what I am trying to avoid. That requires pairwise comparison. There is also the issue of record length: arguably, trends are better determined if missing data are infilled (as anomalies), so that the entire length of the series is accounted for before converting to anomalies.

After that, you can drop all the infill and use what you have. But to get the right figures in the available slots, one needs the infill.
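A minimal sketch of the infill-then-drop idea, assuming a complete reference series (for instance a composite of like-rated neighbours) is available for the same years. The function name and the single mean-offset infill are illustrative assumptions, not the procedure the authors have settled on.

```python
import statistics
from typing import List, Optional


def anomalies_with_temporary_infill(
    station: List[Optional[float]], reference: List[float]
) -> List[Optional[float]]:
    """Hypothetical sketch: `station` has gaps (None); `reference` is a complete
    series for the same years. Infill, take the baseline, then drop the infill."""
    # 1. Mean offset between station and reference where the station reports.
    offset = statistics.fmean(s - r for s, r in zip(station, reference) if s is not None)
    # 2. Temporary infill: the reference value shifted by that offset.
    filled = [s if s is not None else r + offset for s, r in zip(station, reference)]
    # 3. Baseline over the full period, so missing early years don't bias it.
    baseline = statistics.fmean(filled)
    # 4. Anomalies for the reported years only; infilled slots go back to None.
    return [None if s is None else s - baseline for s in station]


reference = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]      # warming reference (made-up)
station = [None, None, None, 13.0, 14.0, 15.0]  # station reports only the last 3 years
print(anomalies_with_temporary_infill(station, reference))
```

Here the temporary infill puts the baseline at 12.5 rather than the 14.0 one would get from the reported years alone, so the anomalies come out as 0.5, 1.5, 2.5 instead of -1, 0, 1. The infilled values themselves never appear in the output.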

Pairwising has more than one approach. One can do "5-nearest" or some equivalent. One can do "within the radius of x". One can do it "regionally" (i.e., with other stations within a region, regardless of distance).

Obviously, pairwising must be done 1\2s to 1\2s and 3\4\5s to 3\4\5s. The biggest problem is that there are fewer Class 1\2s (plenty of datapoints for the others).

--The X-Nearest method gets enough samples, but the distances of some of those compared don't pass the laugh test.
--The Radius-X method avoids the problems of X-Nearest, but may not have enough samples.
--The Regional method has both of the above problems, but to a lesser degree. It also excludes a station that is close by but lies in a different region. It has one large advantage, though: regions are selected for their unique terrain, and, indeed, regions warm at differing rates. So it makes sense to pair a station in the prairie WNC region with other WNC stations, and not with the desert SW, even if a SW station is closer, as there is a strong trend difference between the two regions.

One might try a fusion method and do either of the first two options (radius or nearest), but also normalize for region.

All of the above methods have strengths, and all have weaknesses. The greater the station density (and the better the distribution), the smaller the problems.
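The three neighbour-selection rules, plus one simple reading of the fusion idea, can be sketched as below. The `Station` record, the made-up coordinates, and the 100 km radius are assumptions for illustration; only like-rated stations (Class 1\2 with 1\2, 3\4\5 with 3\4\5) are ever paired. `fused` restricts the radius search to the target's region, which is only one possible interpretation; "normalizing for region" could instead mean subtracting regional mean trends.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Station:
    sid: str
    lat: float
    lon: float
    region: str        # climate-region label, e.g. "WNC" or "SW"
    well_sited: bool   # True for Class 1\2, False for Class 3\4\5


def distance_km(a: Station, b: Station) -> float:
    """Great-circle (haversine) distance, Earth radius 6371 km."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def like_rated(target: Station, pool: List[Station]) -> List[Station]:
    """Other stations in the same siting group as the target."""
    return [s for s in pool if s.sid != target.sid and s.well_sited == target.well_sited]


def x_nearest(target, pool, x):
    return sorted(like_rated(target, pool), key=lambda s: distance_km(target, s))[:x]


def within_radius(target, pool, radius_km):
    return [s for s in like_rated(target, pool) if distance_km(target, s) <= radius_km]


def same_region(target, pool):
    return [s for s in like_rated(target, pool) if s.region == target.region]


def fused(target, pool, radius_km):
    """One reading of the fusion idea: radius search limited to the target's region."""
    return [s for s in within_radius(target, pool, radius_km) if s.region == target.region]


# Made-up toy network (not real USHCN stations).
pool = [
    Station("T", 40.0, -100.0, "WNC", True),
    Station("A", 40.5, -100.0, "WNC", True),
    Station("B", 39.8, -100.2, "SW", True),
    Station("C", 45.0, -90.0, "WNC", True),
    Station("D", 40.1, -100.0, "WNC", False),  # close, but not like-rated
]
target = pool[0]
for name, picked in [
    ("x-nearest (x=2)", x_nearest(target, pool, 2)),
    ("radius 100 km", within_radius(target, pool, 100.0)),
    ("regional", same_region(target, pool)),
    ("fused", fused(target, pool, 100.0)),
]:
    print(f"{name:16s} {[s.sid for s in picked]}")
```

In this toy network the nearest like-rated neighbour (B) lies in a different region, the regional rule pulls in a station roughly 1,000 km away (C), and the fused rule keeps only A, which mirrors the trade-offs listed above.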

That's my wargame-designer/developer hat's take. (What did I leave out?)

Also, J N-G has no problems with pairwise. He even ran an experimental infill on a previous incarnation of our station list (results are not much affected).

We will have to pairwise in order to do what we will need to do in followup of this paper. A larger number of samples, even if partial, will go a long way. I will also be looking closely at how this affects CRS, as the gridded average is higher than the non-gridded for Class 1\2s. If I am going to boldly apply corrections to CRS, I don't want to get it too high.

I would even argue that, in a philosophical sense, we adjust, de facto, by dropping, and we "krige", de facto, by regionalized gridding. So even in the current paper, there is no escape. But at least we use a metric that requires only a minimum of adjustment to data.

Tom Dayton said...

Evan, here is your opportunity to be a real scientist, by following any of the suggestions folks are giving for you to use the data you have plus gathering some more.

But to do that, you must abandon your life's mission to show global warming is not happening / is small, along with your fixation on a model of physics (heat sinks, heat sources,...) that would flunk you out of a high school physics course.

Imagine how much more rewarding it would be to be part of the real scientific community! I'm totally serious about this. But it requires you drastically adjust your attitude, and probably piss off the fake skeptic crew.

Evan Jones said...

A nice project for a volunteer force à la surface stations. A few years of measurements already provides a wealth of information.

One could always just do a pairwise comparison at the point of conversion -- with other MMTS units. That, and that alone, will show the true jump. Apples to apples.

Then examine the orangey CRS trends ...

I think I like it. Sign me up.

Evan Jones said...

I would read it the same way. That these smaller numbers used by Watts et al. (2015) are for the breaks that could not be detected. These numbers are thus too small.

IIRC, Menne (2009) indicates that only ~40% of the conversions showed a significant jump. But we apply the indicated offset jump to 100% of conversions. So if we are under, it's not because of that -- it's because we didn't mess with trend (via a 15-year pairwise) the way Menne does.

Evan Jones said...

They showed that the MMTS raw data has to be adjusted upwards to match USCRN co-located readings.

Are you now claiming that MMTS is more accurate than USCRN also?

Not at all. Just that MMTS are more accurate than CRS, and that Menne uses that comparison. PRTs read round the clock, though, and that gives a different mean than max-min. Is this accounted for?

Besides, as I say, using my Excel data sheets, you can simply replace the Menne offset with H&L or whatever you wish. The idea here is not to establish finalities, but to encourage a work in progress and continuing discussion.

Victor Venema said...

Evan Jones: "IIRC, Menne (2009) indicates that only ~40% of the conversions showed a significant jump. But we apply the indicated offset jump to 100% of conversions. So if we are under, it's not because of that -- it's because we didn't mess with trend (via a 15-year pairwise) the way Menne does."

I hope you do "mess" with the trend; when you correct an inhomogeneity, the trend changes. The only way not to change the trend would be to apply the correction both before and after the break, but that is useless, because then the non-climatic change is still there.
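A small numerical check of this point, using an ordinary least-squares trend on a made-up 30-year series with a true trend of 0.02 C/yr and a -0.3 C non-climatic step from year 15 onwards (all numbers are assumptions for illustration).

```python
def ols_slope(y):
    """Ordinary least-squares slope of y against time steps 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


# Made-up series: 30 years, true trend 0.02 C/yr, and a -0.3 C
# non-climatic step (e.g. an instrument change) from year 15 onwards.
years, break_year, step = 30, 15, -0.3
raw = [0.02 * t + (step if t >= break_year else 0.0) for t in range(years)]

# Correcting only the part after the break removes the step ...
corrected = [v - step if t >= break_year else v for t, v in enumerate(raw)]
# ... whereas applying the same correction on both sides changes nothing.
both_sides = [v - step for v in raw]

print(f"raw trend:                {ols_slope(raw):.4f} C/yr")
print(f"break corrected:          {ols_slope(corrected):.4f} C/yr")
print(f"same shift on both sides: {ols_slope(both_sides):.4f} C/yr")
```

Removing the step restores the true trend of 0.02 C/yr, while shifting both segments by the same amount leaves the biased raw trend untouched, because a constant added to a whole series has no effect on its least-squares slope.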

The size of the corrections you are now applying seems to be for those 40% not statistically significant breaks. As you say you need something for all breaks. The statistically significant breaks are larger.

A further problem: these average corrections mentioned by Menne et al. are for the instrumental change and the location change together. You claim not to have a change in siting. Thus these numbers are not the ones you need and are most likely too small.


Everyone, please refer to papers as, e.g. Menne et al. (YYYY), or the second time Menne et al. That is how one refers to a paper. Menne is a person; a highly respected member of the homogenization community, who has contributed enormously. We cannot look into his head, we only know what is in the paper. This paper is not written by Menne alone, it was internally peer reviewed by NOAA, it was independently peer reviewed by the journal, it has been approvingly cited by many scientists. Thus a paper is part of science and not just one person. Furthermore, it is more pleasant not to attack persons, especially for those that like flowery language.

Tom Dayton said...

Evan, your fixations on your hypotheses and assumptions are seriously interfering with your ability to do the science. I'm not picking on you! This is a typical (universal?) problem that students have at least once--including me (I remember it well). Graduate school with strong mentoring does a good job of getting students past that problem, and after once or twice they learn to recognize by themselves when they have gotten too attached. Actually, I'm oversimplifying. Science is craft, and mentoring plus experience are needed to learn a good balance between abandoning an idea too soon and abandoning it too late. Mentoring shortens the learning time. You are getting mentoring from many folks on this and other blogs. It would be a shame for you to waste that resource and in so doing waste this opportunity to do real and good science.

Evan Jones said...

Let me restate to make sure I correctly understand your argument. I think you mean that Menne brings MMTS in line with CRS using a pairwise trend adjustment, which introduces a warm-trending adjustment to MMTS when in fact he should be applying a cool-trending adjustment to CRS and leaving MMTS alone. Going back to your statement over at Dr. Curry's ...

Yes. I think NOAA has gone in the incorrect direction on this. But we do not adjust CRS in our current paper.

It is, of course, possible that MMTS needs adjustment as well. But CRS Tmax is what is popping out here. Even CRS Tmin is well up over MMTS for the unperturbed Class 1\2 set.

... you are saying that the adjustment should be a breakpoint adjustment to CRS in the form of a one-time offset, the net result of which is cooling. Which result is obviously the opposite of the warming adjustment proposed by Menne. [1]

No. It is possible that conversion might result in an upward offset, but one should compare the jumps with other MMTS stations. Apples to apples.

Anyway, if you apply a downward offset to CRS, that will have the exact same effect on trend as applying an upward offset to MMTS.
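The equivalence holds exactly for an ordinary least-squares trend (assuming that is the trend estimator in question). Write y_t for a station series, t_0 for the conversion year, D_t for an indicator that is 1 from t_0 onwards and 0 before, and d for the offset:

```latex
\begin{aligned}
y^{(a)}_t &= y_t + d\,D_t && \text{(MMTS raised by } d\text{)}\\
y^{(b)}_t &= y_t - d\,(1 - D_t) = y^{(a)}_t - d && \text{(CRS lowered by } d\text{)}\\
\hat\beta(y) &= \frac{\sum_t (t - \bar t)\,(y_t - \bar y)}{\sum_t (t - \bar t)^2}\\
y^{(b)}_t - \bar y^{(b)} &= y^{(a)}_t - \bar y^{(a)}
  \;\Longrightarrow\; \hat\beta\big(y^{(b)}\big) = \hat\beta\big(y^{(a)}\big)
\end{aligned}
```

The two adjusted series differ only by the constant d, which changes the absolute level of the record but not its trend.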

Actually, I am saying something a lot more radical: I am saying that, offset adjustment aside, CRS needs a trend adjustment. A downward one. As in "warming the past" all the way back to 1880. CRS is trend-biased by the box itself, which is a heat sink.

One place (there are others) I know I'm still stuck is here: "We do not use pairwise thereafter ..." because it is not clear to me that you're using pairwise comparison at all, except perhaps implicitly by way of the +0.10C Tmax and -0.025C Tmin offsets.

Yes. Your inference is correct.

Evan Jones said...

Asking for help off the internet. Jeeez.


Please. You do us an injustice. Try: handwaving, entreating, begging, pleading ...

(We got this strange notion that we can learn from other people, especially from those with whom we disagree.)

Evan Jones said...

Evan, your fixations on your hypotheses and assumptions are seriously interfering with your ability to do the science. I'm not picking on you! This is a typical (universal?) problem that students have at least once--including me (I remember it well).

Yes. But I am trying to get it right.


Graduate school with strong mentoring does a good job of getting students past that problem, and after once or twice they learn to recognize by themselves when they have gotten too attached.

I know. Pete and repeat. (M.A., US History, Columbia University.)

Actually, I'm oversimplifying. Science is craft, and mentoring plus experience are needed to learn a good balance between abandoning an idea too soon and abandoning it too late.

With the data showing what it shows, I am not inclined to abandon it. If it turns out there is a systematic problem or a confirmation-biased method, and the results are different, then I will be so inclined.

Mentoring shortens the learning time. You are getting mentoring from many folks on this and other blogs. It would be a shame for you to waste that resource and in so doing waste this opportunity to do real and good science.

I know full well how valuable it is. I'm not wasting any of it. I hope you all know how grateful I am.

Evan Jones said...

But it should be very clear to anyone reading the preceding exchanges that the project's main objective is simply to keep the ball in the air for as long as possible, thus the planned interminable string of follow-up studies.

Sheesh. It wouldn't be because we are getting valuable reactions to our findings, and have more materials and all sorts of questions to pursue more (unfunded) study. Not to mention being really, really interested in it. That would be way too simple.

The only real question I have at this point is why in the world n-g is willing to continue being associated with this business. OTOH maybe it's just that it's helpful for the TX State Climatologist to have some denier camouflage.

Why don't you ask him?

Assuming for the sake of argument that heat sink effect is as claimed, rather than the admittedly onerous approach of trying to measure that effect directly, what about examining the records for the difference between cloudy and sunny days at different times of year? Similarly, what about the effect of snow-covered ground? And windy vs. still days (accounting for direction from the heat sink)?

All interesting possibilities.

Although, is it really onerous? What about sticking some IR sensors on some stations? IIRC CRN has those, presumably for a similar purpose, recalling also each CRN station has a paired USHCN station that could be looked at.

Why not? But we would need about 20 more years of CRN data to get meaningful results. And if the trend remained relatively flat through the current PDO, one wouldn't learn much.

Finally, re EJ's claim that he thinks the (land-only, presumably) surface temp trends are more like the satellite data than is currently thought, what's the physics say? E.g. should mixing of air masses in the lower troposphere be expected to even out the difference between over land and ocean surface temps?

Over land, Klotzbach (also Christy) says that LT trends should be 10% to 40% higher than LST (depending on latitude). We track just slightly under UAH6 and a little more under RSS over the time period (CONUS).

Kevin O'Neill said...

Evan writes: "The problem is, really, how to handle it. If one considers CRS to be more reliable, then one would adjust the MMTS. If one considers MMTS more reliable, one would adjust the CRS."

No, this is a fundamental problem with understanding. If you want accuracy, then you want the USCRN PRTs. You can (and should) apply the Hubbard & Lin MMTS Bias I and MMTS Bias II adjustments to every MMTS conversion. It has *nothing* to do with CRS. CRS is not part of the equation. It is based on the physical response of the thermistor used in MMTS and differing responses to temperature, radiation, and wind.

Yes, the TOBS is taken into account with USCRN. If we agree the USCRN is most accurate, then why aren't you applying the correct adjustment to the MMTS to match the USCRN PRTs?


Your repeated statement that *we* can do it is nonsensical. You're writing the paper. You should be doing it - unless you've already decided what result you want and don't want to veer away from it.

You've also *still* not addressed the issue that USHCN matches USCRN. Do your results? If you make a significant change to USHCN doesn't that mean you'll be forcing it to *disagree* with USCRN?

Do you know the dates of the MMTS conversions? The metadata I've looked at does not include it.

Evan Jones said...

USHCN tracks USCRN.
USCRN is more accurate and has no siting problems.

Evan's answer to this? Crickets chirping.


Asked and answered. CRN has been online only since 2005. There has been an essentially flat overall trend since then. A heat sink magnifies a trend. If there is no trend to magnify, there will be no divergence between CRN and COOP.

This does not falsify our hypothesis, it supports it.

Brandon R. Gates said...

Evan Jones,

... we do not adjust CRS in our current paper ... It is, of course, possible that MMTS needs adjustment as well. But CRS Tmax is what is popping out here. Even CRS Tmin is well up over MMTS for the unperturbed Class 1\2 set.

It's my lay opinion that the potential for unhandled warming bias in CRS is the most plausible mechanism you et al. have laid out.

It is possible that conversion might result in an upward offset, but one should compare the jumps with other MMTS stations. Apples to apples.

I'm confused again, but will sleep on that. Re: apples to apples, you've probably been asked "why no gridding when computing trends?" more times than you care to count; perhaps you could point me to a previous answer.

Anyway, if you apply a downward offset to CRS, that will have the exact same effect on trend as applying an upward offset to MMTS.

Perfectly clear.

Actually, I am saying something a lot more radical: I am saying that, offset adjustment aside, CRS needs a trend adjustment. A downward one. As in "warming the past" all the way back to 1880.

I would like to see you guys pull that off and have it be ... not correct ... but demonstrably less wrong than what is presently being done. I think that would be not only scientifically interesting, but politically as well. I also think it's a very tall order.

Your inference is correct.

First time today according to someone who is not me. I look forward to reading the paper and getting my grubby paws on the data. Cheers.

Kevin O'Neill said...

Evan - compared to the USCRN the CRS read too high and the MMTS read too low. This is what the literature has pointed out.

But USHCN currently matches USCRN quite well. So to maintain the match, if one set of stations is lowered, then another set needs to be equally raised.

Now, your claim is that set xxx should be lowered. Which of the remaining then need to be raised? Do you see the problem? Isn't it *more* likely that the set you've isolated needs to be raised and many of the others (CRS) lowered?

This is the logical conundrum that I've seen all along. What you're proposing is the opposite of what logic and the scientific literature tells us needs to be done.
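The arithmetic behind this argument can be sketched under a simplifying assumption: that the network-wide trend is a fixed weighted mean of one subset's trend and the rest's trend. The 20% weight and the trend values in the example are hypothetical, not taken from the paper.

```python
def compensating_change(weight_subset: float, subset_change: float) -> float:
    """Trend change the rest of the network needs so that the weighted network
    trend is unchanged after the subset's trend changes by subset_change.
    Assumes network = w * subset + (1 - w) * rest, with a fixed weight w."""
    if not 0.0 < weight_subset < 1.0:
        raise ValueError("weight_subset must be strictly between 0 and 1")
    return -weight_subset * subset_change / (1.0 - weight_subset)


def network_trend(weight_subset: float, subset_trend: float, rest_trend: float) -> float:
    return weight_subset * subset_trend + (1.0 - weight_subset) * rest_trend


# Hypothetical: a subset carrying 20% of the weight is lowered by 0.1 C/decade.
w, subset, rest = 0.2, 0.30, 0.20
raise_rest = compensating_change(w, -0.1)
print(f"rest of network must change by {raise_rest:+.3f} C/decade")
print(f"network trend before: {network_trend(w, subset, rest):.3f}")
print(f"network trend after:  {network_trend(w, subset - 0.1, rest + raise_rest):.3f}")
```

The smaller the lowered subset's share of the weight, the smaller the compensating change needed elsewhere, which is why the size of the isolated set matters to this argument.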

Evan Jones said...

I hope you do "mess" with the trend; when you correct an inhomogeneity, the trend changes.

Yes.

The only way not to change the trend would be to apply the correction both before and after the break, but that is useless, because then the non-climatic change is still there.

Yes.

We apply Menne's offset to all MMTS stations (even though around half show no significant jump). That offset (+0.0375 to Tmean) does affect trend.
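Where the +0.0375 comes from, and roughly how much a step of that size moves a 30-year least-squares trend. Tmean is (Tmax + Tmin)/2, so the Tmax offset of +0.10 and the Tmin offset of -0.025 average to +0.0375. The conversion year used below (1994, index 15 of 1979-2008) is a hypothetical choice; the effect is largest for a mid-period conversion and shrinks towards either end of the period.

```python
def tmean_offset(tmax_offset: float, tmin_offset: float) -> float:
    """Tmean = (Tmax + Tmin) / 2, so its offset is the mean of the two offsets."""
    return (tmax_offset + tmin_offset) / 2.0


def ols_slope(y):
    """Least-squares slope of y against time steps 0..n-1 (units per step)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def step_trend_change(n_years: int, conversion_index: int, offset: float) -> float:
    """Trend (per decade) that adding `offset` from conversion_index onwards
    contributes to an annual series of n_years values."""
    step_only = [offset if t >= conversion_index else 0.0 for t in range(n_years)]
    return 10.0 * ols_slope(step_only)


offset = tmean_offset(0.10, -0.025)
print(f"Tmean offset: {offset:+.4f} C")
# Hypothetical conversion in 1994, i.e. index 15 of the 30 years 1979-2008.
print(f"trend change: {step_trend_change(30, 15, offset):+.4f} C/decade")
```

Because least-squares trends are linear in the data, this same trend change is added to any station series that receives the step, whatever its underlying trend.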

But Menne speaks of a 15-year pairwise, and that amounts to a direct trend adjustment. That's what I mean when I say "messing with trend". Those results are a legitimate finding, but they are applied to adjust the wrong set of stations. That creates a not-too-large, but systematic data error.

Evan Jones said...

I'm afraid that instead you mean the new Star Wars VII movie's planet destroyer that sucks energy from the Sun and then blasts it all out when triggered.

Of course not. Only a very early prototype test version, back when it was intended by the republic as a green energy program.

Evan Jones said...

You forgot to mention a reason why. If the screen does not change from a CRS to an MMTS, but stays a CRS over the entire period, that removes one known cooling bias in the US temperature record.

If there are cooling and warming biases, they should be treated separately if at all practicable. TOBS is a cooling bias. Some of the moves (esp. to airports in the 1950s) are a cooling bias. Both of these are offset breakpoint issues. If metadata is not available or is unreliable, it is inferred.

Well, the past has been cooled before. I won't argue the necessity. But here is an equally compelling reason to warm it back some, not as an offset, but in the form of a gradual trend reduction.

I think we would wind up with a result looking a little like Haddy2.

That makes the trends of stations that are still a CRS more reliable.

I don't think Tmax is reliable, prima facie. Even Tmin trend is higher than MMTS.

Steve Bloom said...

Thanks, Victor. I did mean at the location of the sensor. Obviously, fully characterizing a heat sink would require more than that.

Asked n-g about this years ago, EJ, but got an obtuse answer.

"But we would need about 20 more years of CRN data to get meaningful results."

Given the CRN pairing with high-quality USHCN stations (part of the siting criteria) and that this is a double-check rather than a determination of trend based on CRN-only, I would say it's much more a matter of not liking the obvious (and meaningful) results of such a comparison. Your pal Christy was extremely happy to noisily tout an alleged satellite data discrepancy based on less data than that, note, and AFAIK the length of the record was never a major point of criticism. And consider ARGO data, considered highly valuable by all despite the shortness of the record.

That comparison could have been done at the very start of the project and I was among those who pointed it out, and indeed it was done then, at least informally (the details of all of this are fading in my memory as it's been years since I focused on it). Even the minimal CRN data available at the time was sufficient for a double-check.

Evan Jones said...

[UPDATE. I forgot to mention the obvious: After homogenization the trends Watts et al. (2015) computed are nearly the same for all five siting categories, just as they were for Watts et al. (2012) and the published study Fall et al. Thus for the data used by climatologists, the homogenized data, the siting quality does not matter. Just like before, they did not study homogenization algorithms and thus cannot draw any conclusions about them, but unfortunately they do.]

Well, I didn't forget to mention it. It's one of our main arguments.

--Before homogenization, the well sited station trends are much lower than poorly sited.
--After homogenization, the poorly sited station trends are essentially unchanged.
--After homogenization, trends are nearly the same for all five siting categories.

Thus for the data used by climatologists, the homogenized data, the siting quality is of critical importance, and demonstrates what happens when a systematic error is present in the data when it is homogenized.

You described the effects of homogenization in an earlier comment, and they were what we expected, what we found, and what we have been saying.

Victor Venema said...

Come on, Evan Jones, you are smarter than that.

Your own results show that the "perturbed" stations have a smaller trend than the "unperturbed" stations. In other words that the inhomogeneities cause a cooling trend bias in the USA.

You know that there are two main reasons for this, the Time of Observation bias and the MMTS transition, which both are reasons why raw observations are cooler now than they were in the past. Something that needs to be corrected for because we are interested in the change of the climate and not in the change in the way temperature was measured.

Thus, you can only agree with me that the removal of these inhomogeneities has to result in stronger temperature trends in the USA. That the trend becomes stronger after homogenization is normal. Do you agree with this?

Evan Jones said...

Yes, the TOBS is taken into account with USCRN.

I hope not. With hourly records there is no need.


If we agree the USCRN is most accurate, then why aren't you applying the correct adjustment to the MMTS to match the USCRN PRTs?

We definitely agree CRN is the best sited. I have heard vague rumors of a few Class 3s, but the ten or so I have looked at are all Class 1 by a large margin.

But I am not so sure those jumps are valid. When I looked at pairwise earlier on, like Menne, I saw no such evidence of jumps anywhere near that size. I will have to do a reasonable pairwise during followup.

For now, we are basically responding to Menne. But with the Excel sheets I will be providing, just drop in whatever MMTS adjustments you like (cut crude or station by station). H&L or whatever.

Evan Jones said...

Your own results show that the "perturbed" stations have a smaller trend than the "unperturbed" stations. In other words that the inhomogeneities cause a cooling trend bias in the USA.

For the Class 1\2s, certainly. Perturbed Class 1\2s clock in at 0.126C/d, while unperturbed show 0.204.

But I notice that the entirely raw 1218-station USHCN is 0.218C/d. That is even a little bit higher than the unperturbed Class 1\2s.

The reasons for both of the above:

--Major Bias, perturbed Class 1\2s: TOBS.
Note: Microsite is "adjusted" for by dropping the perturbed stations.
Result: An inferred -0.078C/d TOBS effect on CONUS (1979-2008).

--Major Bias, unperturbed Class 1\2s: None.
Note: Unperturbed Class 3\4\5 trend is 0.319C/d
Result: An inferred +0.115C/d Microsite effect on CONUS (1979-2008).

--Major Bias, entire USHCN (unadjusted): TOBS and Microsite.
Note: One is a cooling bias, the other a warming bias. They effectively cancel each other out. So the result is accurate (though only by chance).
Result: Adjustment from 0.218 (already a little over the level of the Class 1\2 unperturbed set) to 0.324C/d.

One might say the homogenized result is more precise, yet less accurate.

You know that there are two main reasons for this, the Time of Observation bias and the MMTS transition, which both are reasons why raw observations are cooler now than they were in the past. Something that needs to be corrected for because we are interested in the change of the climate and not in the change in the way temperature was measured.

Yes. (TOBS being the biggie.)

Thus, you can only agree with me that the removal of these inhomogeneities has to result in stronger temperature trends in the USA. That the trend becomes stronger after homogenization is normal. Do you agree with this?

A complex but interesting question.

If an accident victim has four potentially fatal injuries, treating only two of them will not suffice. (That may even make things worse.)

TOBS, moves, and equipment conversion require adjustment, yes. But there is also evidence that an adjustment is needed for microsite. It is a problem on the scale of TOBS.

And there is also the issue of a need for CRS adjustment going all the way back to the 19th century (also heat-sink related and probably exceeding the conversion offset for MMTS).

Treating only two of the symptoms will result in a dead patient. Even "deader" than if you left him lying there.

But if you treat all four (TOBS, Microsite, MMTS offset, CRS trend), then the patient can survive and prosper.

As it is, there are four biases: two of them cooling (TOBS and MMTS offset), and two of them warming (Microsite, CRS trend). Only the two cooling biases are accounted for; neither of the two warming biases is.

If the entire USHCN is left unadjusted, the results are much the same as Class 1\2 Unperturbed (0.204C/d vs. 0.218C/d). With the adjustments you apply, there is a large trend increase (0.324C/d). So after only a partial adjustment, you are further away from the Unperturbed Class 1\2s than you were before adjusting.

Bottom line: When homogenized, the Class 1\2s (both perturbed and unperturbed) are adjusted way up (because they are few). The poorly sited stations are hardly adjusted at all (because they are many).

What should be happening -- in terms of TOBS and Microsite -- is:
--The Class 1\2 unperturbed trends (0.204) should not be adjusted (except for incidentals).
--The perturbed Class 1\2 trends (0.126C/d) should be adjusted ~0.078C/d warmer, net.
--The unperturbed Class 3\4\5 trends (0.319C/d) should be adjusted ~0.115C/d cooler, net.
--The perturbed Class 3\4\5 trends (0.190C/d) should be adjusted ~0.014C/d warmer, net.

And that is not what is happening.
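The arithmetic in this comment can be reproduced directly from the quoted trends (C/decade, 1979-2008). This only restates the comment's own premise, that the unperturbed Class 1\2 trend is the target every group should be brought to; it does not test that premise.

```python
# Trends in C/decade for 1979-2008, as quoted in the comment above.
TRENDS = {
    "Class 1\\2 unperturbed": 0.204,
    "Class 1\\2 perturbed": 0.126,
    "Class 3\\4\\5 unperturbed": 0.319,
    "Class 3\\4\\5 perturbed": 0.190,
}
REFERENCE = "Class 1\\2 unperturbed"  # the comment's premise: this group is the target


def net_adjustments(trends: dict, reference: str) -> dict:
    """Net trend adjustment that would bring each group to the reference group's
    trend (positive = warmer, negative = cooler)."""
    target = trends[reference]
    return {group: target - trend for group, trend in trends.items()}


for group, adj in net_adjustments(TRENDS, REFERENCE).items():
    print(f"{group:26s} {TRENDS[group]:.3f} -> net adjustment {adj:+.3f}")

# How far the raw and homogenized USHCN trends sit from that reference.
raw_ushcn, homogenized_ushcn = 0.218, 0.324
print(f"raw USHCN:         {raw_ushcn - TRENDS[REFERENCE]:+.3f} from reference")
print(f"homogenized USHCN: {homogenized_ushcn - TRENDS[REFERENCE]:+.3f} from reference")
```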

Evan Jones said...

It's my lay opinion that the potential for unhandled warming bias in CRS is the most plausible mechanism you et al. have laid out.

Thank you. It would probably result in a downtick in overall 20th-century LST trend by ~0.1C. Not so big. (Not so small, either.)

Science is very good at narrowing things down to a limited number of horses. But, having done that, there is a risk of putting your money on the wrong one.

Now if you take a step further, you will consider that what is going on with the CRS may be related to heat sink and is therefore part and parcel of the microsite issue. Which in turn is merely an amplifier of what the ground is already doing. Very generic. It's all one patch, expressions of the same effect popping up in the proximate vicinity and the equipment.


I'm confused again, but will sleep on that. Re: apples to apples, you've probably been asked "why no gridding when computing trends?" more times than you care to count; perhaps you could point me to a previous answer.

What I mean is that if one wants to find out how much an MMTS conversion jump is, one is best served by doing a pairwise comparison at the point of conversion with other MMTS units already in place. Not with CRS units. Apples to apples.

This works fine if there are a lot of other MMTS units around. But it will obviously be a problem for an early conversion. So an adjustment is unfortunately necessary: CRS must be adjusted to conform with MMTS and then pairwise applied for the offset jump. Apples to pseudo-apples is the best we can do.
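A sketch of the apples-to-apples jump estimate: difference the converting station against the mean of neighbours that were already MMTS, then compare the mean difference in a window on each side of the conversion. The function name, the 10-step window, and the synthetic series (a shared 0.02 C/yr signal and a planted -0.05 C step) are assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def conversion_jump(candidate: Sequence[float], mmts_neighbours: List[Sequence[float]],
                    conversion_index: int, window: int) -> float:
    """Step at an MMTS conversion, estimated against neighbours that were already
    MMTS: mean(candidate - neighbour mean) in `window` steps after the conversion
    minus the same mean in `window` steps before it."""
    reference = [statistics.fmean(values) for values in zip(*mmts_neighbours)]
    difference = [c - r for c, r in zip(candidate, reference)]
    before = difference[max(0, conversion_index - window):conversion_index]
    after = difference[conversion_index:conversion_index + window]
    return statistics.fmean(after) - statistics.fmean(before)


# Synthetic example: shared signal of 0.02 C/yr, two MMTS neighbours offset
# by +/-0.1 C, and a -0.05 C step planted in the candidate at index 15.
signal = [0.02 * t for t in range(30)]
neighbours = [[s + 0.1 for s in signal], [s - 0.1 for s in signal]]
candidate = [s + (-0.05 if t >= 15 else 0.0) for t, s in enumerate(signal)]
print(f"estimated jump: {conversion_jump(candidate, neighbours, 15, 10):+.3f} C")
```

Because the neighbours carry the same underlying signal but no break of their own, the shared trend cancels in the difference and only the planted step remains.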

But look at the oranges separately insofar as possible.

Actually, I am saying something a lot more radical: I am saying that, offset adjustment aside, CRS needs a trend adjustment. A downward one. As in "warming the past" all the way back to 1880.

I would like to see you guys pull that off and have it be ... not correct ... but demonstrably less wrong than what is presently being done. I think that would be not only scientifically interesting, but politically as well. I also think it's a very tall order.

I appreciate that. If one is generally optimistic, it's easy. But if one is pessimistic, one is inherently torn between the desire to be wrong and to be right. The desire to be right is strong and powerful (and should be).

It is a tall order. And that's what follow-up is for. There are any number of outstanding issues. There is the physics to consider (in terms of the equations; we do descriptions just fine). I'd like to work in partial records (requires pairwise for anomaly start points). I'd like to use that to do a roll-your-own MMTS offset evaluation (also requiring pairwise). I want to consider both a Microsite and a CRS adjustment. I'd like to look at a refined, numeric upgrade of the Leroy method, keyed to trend rather than offset (and maybe that, too). Vague dreams of a world tour via the GHCN express. Etc., etc.

Your inference is correct.

First time today according to someone who is not me. I look forward to reading the paper and getting my grubby paws on the data. Cheers.


And I look forward to your finding at least one error or point of dispute. We live and learn. A most pleasant post. Happy New Year.

Evan Jones said...

Evan - compared to the USCRN the CRS read too high and the MMTS read too low. This is what the literature has pointed out.

That is an offset consideration. Could be right, too. As it is, we do add an offset to MMTS.


But USHCN currently matches USCRN quite well. So to maintain the match, if one set of stations is lowered, then another set needs to be equally raised.

This is more of a trend issue. Since CRN came online in 2005, there has been an essentially flat trend. In the case of a flat trend, neither microsite nor the CRS box will cause a divergence. Therefore COOP trend would track CRN trend well. The only reason they diverge from 1979 to 2008 is that there was an overall warming trend over that period.
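The hypothesis as stated can be written as a toy model in which a poorly sited station exaggerates changes from its starting value by a factor (1 + a). The model form and the value a = 0.5 are assumptions for illustration, not a measured heat-sink effect. Under this model the station-minus-true trend difference is a times the true trend, so it vanishes when the true trend is flat.

```python
def ols_slope(y):
    """Least-squares slope of y against time steps 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def amplified_station(true_series, amplification):
    """Toy model of the stated hypothesis: the station exaggerates changes from
    its starting value by a factor (1 + amplification)."""
    start = true_series[0]
    return [start + (1.0 + amplification) * (v - start) for v in true_series]


def divergence(true_series, amplification):
    """Station trend minus true trend, per time step."""
    return ols_slope(amplified_station(true_series, amplification)) - ols_slope(true_series)


warming = [15.0 + 0.02 * t for t in range(30)]  # hypothetical warming period
flat = [15.0] * 10                              # hypothetical flat period
print(f"divergence, warming period: {divergence(warming, 0.5):+.4f} C/yr")
print(f"divergence, flat period:    {divergence(flat, 0.5):+.4f} C/yr")
```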


Now, your claim is that set xxx should be lowered. Which of the remaining then need to be raised? Do you see the problem? Isn't it *more* likely that the set you've isolated needs to be raised and many of the others (CRS) lowered?

Since 2005, pretty much nothing should be either raised or lowered. Regardless of how much offset is added to MMTS, a trend adjustment must be applied to all CRS boxes going back to 1880. That would mean a "warming of the past" for CRS prior to the "pause", but no adjustment after the onset of the pause because in a flat trend, there would be no trend adjustment.

It's not a contradiction.

To be clear:

1.) Arguably, an offset adjustment for MMTS must be applied: a jump (a variable one, depending on pairwise comparison), and an upward one. (But I'll be doing a pairwise comparison to check the amount.)

2.) I am also saying a trend adjustment for CRS needs to be applied. A gradual reduction of trend.
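To make the distinction between the two kinds of adjustment concrete, here is a minimal Python sketch. It assumes a plain annual series and uses made-up function names (`offset_adjust`, `trend_adjust`) and numbers; it is not NOAA's procedure or the one used by Watts et al., only an illustration of a step change versus a gradual trend change that leaves the recent record alone.

```python
# Hypothetical sketch: two kinds of adjustment applied to an annual series.
# Function names and numbers are illustrative, not from NOAA or Watts et al.


def ols_slope(y):
    """Ordinary least-squares slope of y against its index 0..n-1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxy = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    sxx = sum((i - xbar) ** 2 for i in range(n))
    return sxy / sxx


def offset_adjust(series, conversion_index, delta):
    """Step (offset) adjustment: add delta to every value from the conversion on."""
    return [v + delta if i >= conversion_index else v for i, v in enumerate(series)]


def trend_adjust(series, rate, end_index):
    """Gradual (trend) adjustment: reduce the trend by `rate` per step before
    end_index by raising earlier values; values from end_index on are untouched."""
    return [v + rate * (end_index - i) if i < end_index else v
            for i, v in enumerate(series)]


# Usage: 30 years of a made-up series warming 0.02 C/yr.
series = [0.02 * i for i in range(30)]
mmts_like = offset_adjust(series, conversion_index=10, delta=0.1)
crs_like = trend_adjust(series, rate=0.005, end_index=30)
print(f"original trend {ols_slope(series):.4f}, trend-adjusted {ols_slope(crs_like):.4f} C/yr")
```

A step shifts the level from the conversion onward and changes the fitted trend only as a side effect; the trend adjustment changes the slope directly, here by exactly the chosen rate.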

This is the logical conundrum that I've seen all along. What you're proposing is the opposite of what logic and the scientific literature tells us needs to be done.

It isn't illogical, if you look at it. If the MMTS offset is found to be spuriously low, and the calibrators were all wearing their hats to one side, then apply the offset. If the CRS trend is found to be too high, reduce the CRS trend, taking it back to the onset of the CRS record.

The literature does not advocate that. But I think logic does.

Evan Jones said...

Interestingly, it did not escape H&L's notice that CRS needs the hairy eyeball.


Likewise, our conclusion suggests that the LIG temperature records prior to the MMTS also need further investigation because most climate researchers considered the MMTS more accurate than the LIG records in the cotton-region shelter due to possible better ventilation and better solar radiation shielding afforded by the MMTS (Quayle et al. 1991; Wendland and Armstrong 1993).

Brandon R. Gates said...

Evan Jones,

A most pleasant post. Happy New Year.

Same to you. Did you have any comment on arithmetic averaging for trends vs. gridded averaging in the name of apples to apples comparisons?
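For readers unfamiliar with the distinction Brandon raises: an arithmetic average weights every station equally, so dense clusters dominate, while a gridded average first averages within grid cells and then across cells. A minimal sketch, assuming hypothetical station coordinates and trends, a 2.5° cell size, and equal cell weights unless the optional cos(latitude) weight is switched on:

```python
import math
from collections import defaultdict


def arithmetic_mean(stations):
    """Every station counts equally."""
    return sum(s["trend"] for s in stations) / len(stations)


def gridded_mean(stations, cell_deg=2.5, area_weight=False):
    """Average stations within each lat/lon cell first, then average the cells.
    With area_weight=True, cells are weighted by cos(latitude of cell centre)."""
    cells = defaultdict(list)
    for s in stations:
        key = (math.floor(s["lat"] / cell_deg), math.floor(s["lon"] / cell_deg))
        cells[key].append(s["trend"])
    total = weights = 0.0
    for (ilat, _ilon), trends in cells.items():
        w = math.cos(math.radians((ilat + 0.5) * cell_deg)) if area_weight else 1.0
        total += w * sum(trends) / len(trends)
        weights += w
    return total / weights


# Hypothetical stations: three clustered in one cell, one isolated (C/decade).
stations = [
    {"lat": 35.1, "lon": -92.1, "trend": 0.30},
    {"lat": 35.6, "lon": -91.5, "trend": 0.30},
    {"lat": 36.0, "lon": -90.4, "trend": 0.30},
    {"lat": 40.2, "lon": -105.0, "trend": 0.10},
]
print(arithmetic_mean(stations), gridded_mean(stations))
```

With three clustered stations at 0.30 and one isolated station at 0.10 (invented numbers), the arithmetic mean is 0.25 while the gridded mean is 0.20, so the two approaches are not interchangeable when station density is uneven.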

Steve Bloom said...

"Since CRN came online in 2005, there is an essentially flat trend."

Ahahaha. No. Lots of year-to-year variability even with cherry-picked end points. So much data, so much for you to ignore.

Kevin O'Neill said...

KTO: "Yes, the TOBS is taken into account with USCRN."

Evan Jones replies: "I hope not. With hourly records there is no need."

Evan - if we want to compare two different measurements we want to be sure we're comparing apples to apples. Again you display a fundamental misunderstanding. For technological and historical reasons, mean daily temperature over land has been taken to be the average of the daily maximum and minimum temperature measurements.

We can/could use every USCRN temperature recorded during the day to compute an average - but then this result is *not* suitable for meaningful comparison to other temperature datasets. It's comparing apples to oranges. It does not give confidence or credibility that you can make such simple mistakes. This is the whole point of TOBS corrected measurements - to provide consistency in method so that we can derive accurate trends over time.
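A small illustration of why (Tmax + Tmin)/2 and the average of all readings are different quantities. The diurnal profile below (flat nights, a sine-shaped daytime hump) is an invented assumption, not USCRN data; any asymmetric daily cycle gives the same qualitative result.

```python
import math


def hourly_profile(night=10.0, amplitude=10.0):
    """Hypothetical diurnal cycle (C): flat nights, a sine-shaped hump from 06 to 18 h."""
    temps = []
    for h in range(24):
        bump = math.sin(math.pi * (h - 6) / 12) if 6 <= h <= 18 else 0.0
        temps.append(night + amplitude * bump)
    return temps


temps = hourly_profile()
midrange = (max(temps) + min(temps)) / 2   # the traditional (Tmax + Tmin) / 2
full_mean = sum(temps) / len(temps)        # average of every hourly reading
print(f"midrange {midrange:.2f} C, mean of all readings {full_mean:.2f} C")
```

For this profile the midrange is 15 °C while the all-readings mean is about 13.2 °C, so mixing the two definitions would introduce an offset that has nothing to do with climate.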

BTW, USCRN actually samples at a 5-second rate and the data output is 5-minute averages of the samples.

The differences between using Tmin and Tmax versus an arithmetic mean of all daily data recorded are explored in Sampling Biases in Datasets of Historical Mean Air Temperature over Land, Wang 2014 (doi:10.1038/srep04637).

Evan Jones said...

Again you display a fundamental misunderstanding. For technological and historical reasons, mean daily temperature over land has been taken to be the average of the daily maximum and minimum temperature measurements.

Yes, I said so before (either here or on JC). That is what I have seen in the USHCN and GHCN. (But much data, e.g., ASOS, is recorded hourly; I have seen examples.)

By apples-to-apples, I mean that what I want to do is compare MMTS conversion jumps with other MMTS units to find out what that jump really is (net). MMTS units warm slower. That is directly combated by NOAA, which uses years of pairwise comparison to bring MMTS in line with CRS, rather than the other way around, which is how they should be doing it. (Even H&L say MMTS beats LiG.)

This is the whole point of TOBS corrected measurements - to provide consistency in method so that we can derive accurate trends over time.

Sure. But no one is questioning TOBS (yet). I largely bypass the issue by dropping stations with TOBS flips, removing ~90% of the bias. I am leery of the amount of the NOAA adjustment (I think it is in the correct direction, but too large), so I prefer to avoid it, but if you apply the residual TOBS data to our unperturbed set, the results are only a little different and still significant.

Come to think of it, has anyone looked at whether AM-only and PM-only stations show the same trends (assuming no TOBS change, of course)? If Tmax and Tmin are not rising at the same rates, perhaps not. Might be worth a closer pass.

Evan Jones said...

Ahahaha. No. Lots of year-to-year variability even with cherry-picked end points. So much data, so much for you to ignore.

There is always year-to-year variability and a whole slew of intermittent variables, as well. That's why longer trends are needed. That's why a 30-year trend is a good standard for ferreting out the cumulative effects of a minor year-to-year change over time. Like Microsite. Like AGW, for that matter.

We also show subsets (1979-1998 and 1999-2008), the earlier and later portions of our full time series. They support our hypothesis, but as they are shorter, they are less reliable. The former demonstrates that our 30-year results are not an artifact, and the latter shows the poorly sited stations cooling faster than the well sited ones. But the amounts are more questionable than for the 30-year set.

Second of all, we do not cherry-pick our start or end points.

We began this study with 1977-2006, and those results were more favorable to our hypothesis than the ones we switched to (1979-2008), especially for Fall et al. But we made the change so we could compare data with the satellites. That was before I even started looking at Leroy (2010). It seems to me that carefully looking for 4-year stretches where the data do not diverge may be more in the land of the pick. I think a 10-year stretch is a Meager Minimum, and the trend must be pronounced. And the amounts will be an issue, even then.
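The point about record length can be quantified under a simple assumption. For an ordinary least-squares trend fitted to n annual values with independent noise of standard deviation σ, the standard error of the slope is σ/√(n(n²-1)/12). The sketch below assumes σ = 0.5 °C (an illustrative value, not from the study) and ignores autocorrelation, which in real temperature series makes the uncertainty larger.

```python
import math


def trend_standard_error(n_years, sigma):
    """Standard error (C/yr) of an OLS trend through n_years annual values
    whose deviations from the line are independent with standard deviation sigma."""
    sxx = n_years * (n_years ** 2 - 1) / 12   # sum of squared time deviations
    return sigma / math.sqrt(sxx)


SIGMA = 0.5  # assumed year-to-year scatter in C, illustrative only
for n in (10, 20, 30):
    se = 10 * trend_standard_error(n, SIGMA)   # per decade
    print(f"{n:2d} years: trend uncertainty about +/-{se:.2f} C/decade (1 sigma)")
```

Under these assumptions a 10-year trend is roughly five times less certain than a 30-year one.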

PaulS said...

Evan,

It seems like you don't appreciate or haven't properly considered the implications of your own hypothesis. For CONUS, inter-annual variability is about the same size as the 30-year trend. If there is some "heat sink" object which is amplifying the actual change in absolute temperature as much as 0.3°C relative to a 0.6°C warming, then that effect should likewise show up due to inter-annual variations. But there is no correlation at all between inter-annual temperature variance and bias variance.

The reason why you need longer timescales to get some idea about the magnitude of the AGW component of temperature is because of natural inter-annual and decadal scale variability obscuring the signal. That's not a relevant consideration for your "heat sink" proposition, since the cause of temperature change is unimportant. That it isn't relevant can be seen empirically by the small variance in the divergence (see the yellow plot on Victor's update).

There is no reason why your "heat sink" effect wouldn't show up in inter-annual changes and it doesn't.
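PaulS's argument can be checked on synthetic data. The sketch assumes hypothetical annual CONUS anomalies (0.02 °C/yr plus 0.5 °C noise) and compares two candidate explanations for a 0.3 °C divergence over 30 years: an amplification of every temperature anomaly by a factor k = 0.5 (the "heat sink" reading) and a smooth, weather-independent drift. Only the amplification leaves year-to-year wiggles in the detrended difference series, and those wiggles track the weather exactly.

```python
import random


def detrend(y):
    """Residuals of y after removing its OLS linear fit against index."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    slope = sum((i - xbar) * (v - ybar) for i, v in enumerate(y)) / sxx
    return [v - (ybar + slope * (i - xbar)) for i, v in enumerate(y)]


def std(y):
    m = sum(y) / len(y)
    return (sum((v - m) ** 2 for v in y) / (len(y) - 1)) ** 0.5


def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


rng = random.Random(1)
years = 30
# Hypothetical annual anomalies: 0.6 C of warming over 30 years plus weather noise.
anomalies = [0.02 * i + rng.gauss(0.0, 0.5) for i in range(years)]

k = 0.5  # amplification: 0.3 C extra on 0.6 C of warming
amplified_diff = [k * a for a in anomalies]     # poorly minus well sited, "heat sink"
drift_diff = [0.01 * i for i in range(years)]   # poorly minus well sited, smooth drift

amp_resid = detrend(amplified_diff)
drift_resid = detrend(drift_diff)
print(f"amplification: residual sd {std(amp_resid):.3f}, "
      f"corr with weather {corr(amp_resid, detrend(anomalies)):.3f}")
print(f"drift: residual sd {std(drift_resid):.3f}")
```

So the smoothness of the observed difference series is the diagnostic PaulS is pointing to: amplification predicts a noisy, weather-correlated divergence, drift predicts a smooth one.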

You also still haven't addressed a point made numerous times, that the divergence opens up between about 1981-1996 yet the temperature trend is flat over this period.

I think it would be best if you dropped this "heat sink" stuff - it's demonstrably wrong. Essentially what this is all about is your and Watts's belief that homogenisation is wrong. Showing that to be the case will require actually getting to grips with the mechanics of homogenisation algorithms, testing them against various situations and somehow demonstrating that they make spurious adjustments.

Victor Venema said...

Paul, that is a really good way of stating the problem. That the difference time series is so smooth rules out any explanation that responds to the weather, such as The Heatsink.

Evan, you seem to be so sure nothing has changed for your "unperturbed" stations, is that the typical overconfidence found on mitigation sceptical blogs or do you have photos for each of these stations in 1979?

If there were a gradual inhomogeneity that makes Cotton Region Shelter warm faster than Automatic Weather Stations (MMTS), we would know that from parallel measurements. For example, the parallel measurements between USCRN stations and nearby MMTS and CRS stations from the normal US network (COOP).

You cannot start again with your heat sink as amplifier, the heat capacity of a Cotton Region Shelter is even smaller than that of a building. (And again it would dampen variability, not amplify it.) Furthermore, many such parallel measurements have also been made during periods in which the temperature was rising faster and a trend difference of a whopping 1°C per century would have been noticed, but was never reported.

PaulS said...

MartinM

I don't think so. If only those shifts which were statistically significant were adjusted, then the unadjusted transitions of the next sentence are surely the non-significant ones only, not the whole lot.

Yes, I think your interpretation fits much better.

One clear red flag concerning the adjustment performed by Watts et al. is that the "unperturbed" MMTS average Tmin trend is double Tmax trend despite reports of increasing surface solar radiation during the study period.

This is exactly what we would expect to see if MMTS conversion adjustment is inadequate.

Kevin O'Neill said...

I created a 1-year synthetic daily mean temperature series approximating the average annual temperature variation one might find in a location similar to Little Rock, Arkansas. I then added a fixed trend of 0.01C per year and a day-to-day normally distributed random variation of 1.1C. I then extended the series over a 30 year period.

Next I created a 2nd series that used the first as the starting point, but *subtracted* the MMTS Bias I adjustment using the coefficients of Hubbard & Lin, 2004. This essentially simulates raw MMTS data that directly correlates to the 1st series per Hubbard & Lin 2004 (though ignoring MMTS Bias II).

I then ran a Monte Carlo simulation that randomly chose Jan 1 of a year from 1983 to 2000 as the conversion to MMTS, and spliced the data prior to the conversion from series one with the data after the conversion from series two.

To be clear, the first series had a fixed trend of 0.01C per year. The second series had the same trend, but the mean was lower because of the MMTS Bias I reverse adjustment. The MMTS Bias I adjustment averaged 0.26C over an entire year, but because of the temperature dependency it ranged from 0.23C to 0.32C.

The effect on trend depended strongly on the conversion date:
If the conversion year to MMTS was 1983, the uncorrected MMTS readings reduce the overall trend by 60%.
If the conversion year to MMTS was 1985, the uncorrected MMTS readings reduce the overall trend by 85%.
If the conversion year to MMTS was 1987, the uncorrected MMTS readings reduce the overall trend by 100%.
If the conversion year to MMTS was between 1988 and 2000, the uncorrected MMTS readings turn a positive trend into a negative trend, with the largest effect on trend occurring with a conversion to MMTS in 1995 (-30%). I did not look at conversion dates after 2000.
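Kevin's splice experiment can be reproduced in a stripped-down form. This sketch rests on simplifying assumptions: monthly rather than daily values, no seasonal cycle or weather noise, and a constant 0.26 °C offset (the yearly average quoted above) in place of the temperature-dependent Hubbard & Lin coefficients. It splices the true series before the conversion date with the offset series after it and refits the trend.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


START, END = 1979, 2008
TREND = 0.01   # C per year, as in the experiment above
BIAS = 0.26    # assumed constant MMTS offset in C (H&L's Bias I varies with temperature)

times = [START + m / 12 for m in range((END - START + 1) * 12)]   # monthly, in years
true_series = [TREND * (t - START) for t in times]                # series one, noise-free
raw_mmts = [v - BIAS for v in true_series]                         # series two


def spliced_trend(conversion_year):
    """Trend of series one before Jan 1 of conversion_year and series two after."""
    spliced = [a if t < conversion_year else b
               for t, a, b in zip(times, true_series, raw_mmts)]
    return ols_slope(times, spliced)


for year in (1983, 1985, 1987, 1995, 2000):
    reduction = 1 - spliced_trend(year) / TREND
    print(f"conversion {year}: trend reduced by {100 * reduction:.0f}%")
```

Because everything is noise-free, the reductions come out close to the figures above (about 60% for 1983 and about 100% for 1987, with a 1995 conversion giving a negative trend).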

I do not find these results at all surprising. Unless one fully adjusts for the MMTS systemic bias the effect on trend is pronounced. The full trend is not recovered unless the full adjustment is made. This is a simple mathematical tautology. What it does show is the magnitude of the possible effect on trend by underadjusting for the MMTS bias.



Victor Venema said...

Kevin, nice and expected result. (What was your 30-year period? I had expected the largest effect of the break in the middle.)

Kevin O'Neill said...

Victor - To make a usable comparison to Watts et al's poster, I imagined the 1st year as 1979 and the 30th year as 2008. Since it's just synthetic data, the year numbers are arbitrary and the trend was chosen just so the percentage errors would be obvious. I maintain the analogy by restricting the first full possible MMTS conversion year to 1983. Thus every spliced series contained the first 4 years (1979 - 1982) of original data.

It wasn't intuitive (to me at least) which conversion year would have the largest effect; it turned out to be 1995, i.e., a spliced series with 1979 through 1994 as original data and 1995 through 2008 as raw MMTS data. I suspect the size of the fixed trend in relation to the adjustment magnitude is a factor in how many years until the largest negative impact on the true trend is seen. That's really just a WAG, though :)

I'll rerun it tomorrow with a couple of different fixed trends to see if this is true or not.

I was a bit surprised that daily trends - even over a 30-year period - are all over the place, but monthly trends match the fixed trend very well despite the wide variation in daily trends. Daily trends easily varied at least -0.05C/year to +0.05C/year (+/- 500% of true trend), but monthly trends typically showed less than 5% error.

At the same time, I knew intellectually that simple changepoint analysis could never detect an MMTS magnitude break by looking at daily data - the day-to-day natural variation is too large (and I used a very conservative number for that variation here). But even knowing this intellectually, it still surprised me how little chance one had of detecting a changepoint based on a single station's daily data. Maybe it would have a better chance with monthly means.

I proved the power of homogenization to myself many years ago using Hansen and Lebedeff's "Global trends of measured surface air temperature," 1981. I've never questioned it since. This little experiment simply served to reinforce the need for homogenization (vis a vis MMTS conversion) unless one has complete data on date of conversion *and* is willing to use it to perform the complete MMTS Bias I and II adjustments.
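To see why relative homogenization makes breaks visible that a single station's record hides, here is a toy example. Two hypothetical stations share the same regional weather; one has a -0.26 °C step in January 1995. A simple SNHT-style search (maximising the between-segment sum of squares) is run on the difference series and on the candidate alone. This illustrates the principle only; it is not NOAA's Pairwise Homogenization Algorithm, and the noise levels are assumptions.

```python
import random


def best_break(series, min_segment=12):
    """Index k that best splits `series` into two segments with different means,
    found by maximising the between-segment sum of squares (an SNHT-style search)."""
    n = len(series)
    overall = sum(series) / n
    best_k, best_score = None, float("-inf")
    for k in range(min_segment, n - min_segment + 1):
        m1 = sum(series[:k]) / k
        m2 = sum(series[k:]) / (n - k)
        score = k * (m1 - overall) ** 2 + (n - k) * (m2 - overall) ** 2
        if score > best_score:
            best_k, best_score = k, score
    return best_k


rng = random.Random(7)
n, true_break, step = 360, 192, -0.26      # monthly 1979-2008, step in Jan 1995
regional = [rng.gauss(0.0, 1.0) for _ in range(n)]          # shared weather
candidate = [r + rng.gauss(0.0, 0.03) + (step if i >= true_break else 0.0)
             for i, r in enumerate(regional)]
neighbour = [r + rng.gauss(0.0, 0.03) for r in regional]

difference = [c - b for c, b in zip(candidate, neighbour)]
print("break found in difference series at index", best_break(difference))
print("break found in candidate alone at index  ", best_break(candidate))
```

Differencing removes the shared weather, so the step stands out against the small local noise; in the single-station series it is buried in the much larger regional variability.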


Kevin O'Neill said...

Victor - short answer to your question: Considering the first 4 years always have the original data, the middle is really 1995; the same year that had the largest deleterious effect on the true trend. So your supposition was correct :)

MartinM said...

The trend of a step function is at a maximum when the step occurs exactly in the middle. And because linear regression is...well, linear, the trend of the sum of two series is equal to the sum of the two trends. So yes, assuming my maths isn't even rustier than I thought, the maximum effect of conversion should occur at the mid-point.
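MartinM's point can be written out. For n equally spaced time steps t_i = i (i = 0, ..., n-1) and a step of size d starting at index k, the least-squares slope of the step alone, its maximum, and the linearity he mentions are:

```latex
s_i = \begin{cases} 0 & i < k \\ d & i \ge k \end{cases}
\qquad
\hat\beta_{\mathrm{step}}
  = \frac{\sum_{i=0}^{n-1} (t_i - \bar t)\, s_i}{\sum_{i=0}^{n-1} (t_i - \bar t)^2}
  = \frac{d\,k\,(n-k)/2}{n\,(n^2-1)/12}
  = \frac{6\,d\,k\,(n-k)}{n\,(n^2-1)}

\max_k \hat\beta_{\mathrm{step}}
  = \hat\beta_{\mathrm{step}}\Big|_{k=n/2}
  = \frac{3\,d\,n}{2\,(n^2-1)} \approx \frac{3\,d}{2\,n}

\hat\beta_{\mathrm{spliced}} = \hat\beta_{\mathrm{true}} + \hat\beta_{\mathrm{step}}
```

Expressed per year for a record T years long, the maximum is about 3d/(2T). With d = -0.26 °C and T = 30 years that is about -0.013 °C/yr, which exceeds the synthetic 0.01 °C/yr trend and leaves roughly -0.003 °C/yr, i.e. about -30%, consistent with Kevin's result.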

I'd be curious to see what happens if you apply Evan's -0.1 °C/ +0.025 °C correction to your synthetic data, Kevin.

Kevin O'Neill said...

Martin - I did run a few series with an offset of 0.125C and the result was not encouraging. Not surprising since the average MMTS Bias I adjustment was 0.26C.

PaulS said...

Just went through USHCN stations for California and Nevada using the Berkeley site and can find no stations without multiple documented changes which would support the low 0.04 K/decade trend in raw data indicated by Watts et al. for that region. The only low trend station which might fit the bill as minimally perturbed is Santa Cruz (only one TOBS change), which has about a zero trend over 1979-2008. However, the trend in the final adjusted product for this station is also about zero - it isn't representative of the whole region, just this coastal zone.

dhogaza said...

PaulS:

"The only low trend station which might fit the bill as minimally perturbed is Santa Cruz (only one TOBS change), which has about a zero trend over 1979-2008."

And Santa Cruz sits right on Monterey Bay with all the climatic implications one might imagine.

Victor Venema said...

PaulS, wonderful work. That confirms that Watts and colleagues should have a look whether the data they call "unperturbed" is actually unperturbed.

Evan Jones said...

I will reply to the above, but I am running some numbers. I will be back in a "while", however long that is.

JN-G has just vetted the unperturbed list looking for jumps and we have skinned out a few. Note that he has run the trends independently, using his own methods, and we match very closely.

(He originally had -0.695C/d for Region 9 trend, but my results were +0.040C/d. That was because he was not anomalizing the data. When he did so, our results matched.)
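One way skipping the anomaly step can produce a very different trend: if calendar months are missing unevenly, the seasonal cycle leaks into the fitted slope. The sketch assumes an invented, trend-free seasonal cycle with January and February missing in the last five years; the monthly-anomaly version subtracts each calendar month's own mean first. It is illustrative only and says nothing about why JN-G's specific number differed.

```python
import math


def ols_slope(points):
    """OLS slope of value against time for (t, value) pairs."""
    n = len(points)
    tbar = sum(t for t, _ in points) / n
    vbar = sum(v for _, v in points) / n
    sxy = sum((t - tbar) * (v - vbar) for t, v in points)
    sxx = sum((t - tbar) ** 2 for t, _ in points)
    return sxy / sxx


def monthly_anomalies(points):
    """Subtract each calendar month's mean, computed from the months present."""
    by_month = {}
    for t, v in points:
        by_month.setdefault(t % 12, []).append(v)
    climatology = {m: sum(vs) / len(vs) for m, vs in by_month.items()}
    return [(t, v - climatology[t % 12]) for t, v in points]


# 30 years of hypothetical monthly means (t = months since Jan 1979):
# a seasonal cycle with its minimum in January and no trend at all.
series = [(t, 10 - 10 * math.cos(2 * math.pi * (t % 12) / 12)) for t in range(360)]
# Assume January and February are missing in the last five years.
gappy = [(t, v) for t, v in series if not (t >= 300 and t % 12 in (0, 1))]

raw_trend = 120 * ols_slope(gappy)                           # C per decade
anomaly_trend = 120 * ols_slope(monthly_anomalies(gappy))    # C per decade
print(f"raw: {raw_trend:+.2f} C/decade, anomalies: {anomaly_trend:+.2f} C/decade")
```

The raw fit shows a spurious warming of roughly half a degree per decade; the anomaly fit is flat, as it should be for a trend-free series.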

Evan Jones said...

PaulS: I think it would be best if you dropped this "heat sink" stuff - it's demonstrably wrong. Essentially what this is all about is your and Watts's belief that homogenisation is wrong. Showing that to be the case will require actually getting to grips with the mechanics of homogenisation algorithms, testing them against various situations and somehow demonstrating that they make spurious adjustments.

Homogenization is necessary, at least on the GHCN level. Other than aesthetics, my beef is not that homogenization, per se, is wrong, but that if a systematic error is present in the data, homogenization will yield inaccurate results. If a systematic error is not present, then homogenization will work as intended.

No one here, I think, would disagree with that. The question we are discussing is whether or not Watts et al. has identified a systematic error, and if there is a systematic error, is it or is it not correctly attributed.

Currently, we bin by demonstrably good vs. bad microsite. A statistically significant divergence appears. That is what needs to be explained.

Evan Jones said...

The only low trend station which might fit the bill as minimally perturbed is Santa Cruz (only one TOBS change), which has about a zero trend over 1979-2008.

Santa Cruz is not included in our unperturbed set (because of said 1987 TOBS flip).

If you cannot find any Region 9 (West) low-trend stations, perhaps it is because you are looking at fully adjusted rather than raw or TOB-only data. That is the only explanation I can suggest. (My results and JN-G's results are the same.)

Victor Venema said...

Evan Jones: "if a systematic error is present in the data, homogenization will yield inaccurate results. If a systematic error is not present, then homogenization will work as intended. No one here, I think, would disagree with that. "

Well. Actually. I disagree with that.

The reason to do homogenization is to guard against systematic trend errors. If they are there, homogenization will reduce them.

The USA happens to have quite a large trend bias in the raw data and homogenization consequently makes a large adjustment to the trend.

The USA has a time of observation bias problem, which now seems to be accepted by the WUWT community. You can correct this bias with explicit corrections based on simulating the effect using hourly measurements. And if you do not do this, the relative homogenization method of NOAA takes care of this systematic trend error.
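The hourly-simulation idea can be illustrated with a toy model. The sketch assumes synthetic hourly temperatures (a persistent day-to-day anomaly plus a fixed diurnal cycle peaking at 15:00) and a max/min thermometer read and reset once a day. With an afternoon reset, a hot afternoon can be counted as the maximum of two consecutive observation days. All parameter values are invented for illustration; this is not NOAA's TOB model.

```python
import math
import random


def simulate_hourly(n_days, seed=0, amplitude=5.0, phi=0.7, shock_sd=2.0):
    """Hypothetical hourly temperatures (C): an AR(1) day-to-day anomaly plus a
    fixed sinusoidal diurnal cycle with its peak at 15:00 and minimum at 03:00."""
    rng = random.Random(seed)
    temps, anomaly = [], 0.0
    for _ in range(n_days):
        anomaly = phi * anomaly + rng.gauss(0.0, shock_sd)
        for h in range(24):
            temps.append(anomaly + amplitude * math.cos(2 * math.pi * (h - 15) / 24))
    return temps


def mean_daily_midrange(temps, reset_hour):
    """Mean of (Tmax + Tmin) / 2 for a max/min thermometer read and reset daily at
    reset_hour: each observation covers the 24 hours ending at that reading."""
    n_days = len(temps) // 24
    values = []
    for d in range(1, n_days):
        end = d * 24 + reset_hour + 1          # slice end, so the reset hour is included
        window = temps[end - 24:end]
        values.append((max(window) + min(window)) / 2)
    return sum(values) / len(values)


temps = simulate_hourly(3650)
am = mean_daily_midrange(temps, reset_hour=7)
pm = mean_daily_midrange(temps, reset_hour=17)
print(f"afternoon minus morning observer: {pm - am:+.2f} C")
```

In this setup the afternoon observer's mean comes out warmer than the morning observer's, which is the sign of the classic bias; a switch from afternoon to morning readings would therefore introduce an artificial cooling step.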

This immediately shows that in a dense network like the USA homogenization works pretty well in removing trend biases. If you mean with "inaccurate" that homogenization is not perfect, I agree with that. Everything has uncertainties. Certainty can only be found in religion.

Evan Jones said...

Kevin O'Neill.

Let us consider UAH and RSS trends (minus the 10% for surface as per Klotzbach, 2009).

-- This matches our unperturbed Class 1/2 subset trends very well (amazingly well?).
-- The unperturbed Class 1/2 subset includes both MMTS and CRS stations.
-- CRS trends are higher than MMTS trends.
-- Conclusion: For that subset, MMTS trends are too low and CRS trends are too high.
-- Solution: Adjust MMTS trends up and CRS trends down.

Evan Jones said...

The reason to do homogenization is to guard against systematic trend errors. If they are there, homogenization will reduce them.

Agreed.

The USA happens to have quite a large trend bias in the raw data and homogenization consequently makes a large adjustment to the trend.

Agreed (microsite considerations aside).

The USA has a time of observation bias problem, which now seems to be accepted by the WUWT community. You can correct this bias with explicit corrections based on simulating the effect using hourly measurements. And if you do not do this, the relative homogenization method of NOAA takes care of this systematic trend error.

Agreed.

This immediately shows that in a dense network like the USA homogenization works pretty well in removing trend biases. If you mean with "inaccurate" that homogenization is not perfect, I agree with that. Everything has uncertainties. Certainty can only be found in religion.

Agreed. However, if a station net is dense enough, one may choose to drop biased stations and still maintain adequate coverage. That applies to USHCN. But GHCN, not so much -- homogenization is vital to maintain what coverage you have, despite decreasing efficiency due to said poor coverage. For large geographical areas of GHCN, any station that can be "saved" must be saved if at all possible. For the USHCN (and other areas of heavy coverage), this is not as important a consideration.

Now if you include microsite bias in your considerations and adjust for that using pairwise to well sited stations (including poorly sited stations after having been adjusted for microsite, either observed or inferred), then homogenization will perform as intended.

When homogenizing, however, there is always a danger that one is failing to account for a significant systematic error, and these errors can be difficult to identify. That is what I think has happened here. It therefore needs further correction to operate accurately. For homogenization to operate as intended, all major biases must be accounted for. (For that matter, they must be accounted for even if not homogenizing.)

In a dense network, dropping "bad" stations essentially does the same thing that homogenization does -- correct for major biases. It remains, as always, to correctly identify such biases. Dropping stations and not accounting for microsite yields similar results to homogenization and not accounting for microsite.

In a philosophical sense, dropping "bad" stations is as much an "adjustment" as homogenization, and of the same basic magnitude.

Evan Jones said...

Further comment:

Well. Actually. I disagree with that.

The reason to do homogenization is to guard against systematic trend errors. If they are there, homogenization will reduce them.


Unless identified, only if they affect a minority of sample or create large, identifiable jumps (such as TOBS).

If a gradual bias affects a majority of sample, then the bias will be treated as if it were not a bias. In such a case, unbiased stations will be corrected to conform with biased stations, which in turn will not be corrected. That is when Kindly Uncle H becomes the H-bomb.

(I must go back to my data. I will return, as time permits, to reply to any further comments. My thanks again for your input and that of the others here.)

MartinM said...

Victor: Well. Actually. I disagree with that.

I agree with your disagreement. In the absence of systematic errors, homogenisation is useful, but shouldn't have any real effect on global or regional trends, so long as we have enough data to work with. Systematic errors are what move homogenisation from 'nice to do if you have the time' to 'extremely important'.

Evan, it seems to me that a lot of questions about your work would be easier to clear up if we had access to your station list. Are you willing to share it at this point?

Evan Jones said...

I agree with your disagreement. In the absence of systematic errors, homogenisation is useful, but shouldn't have any real effect on global or regional trends, so long as we have enough data to work with.

Agree so far.

Systematic errors are what move homogenisation from 'nice to do if you have the time' to 'extremely important'.

Yes. Or some equivalent mechanism (we drop instead because with the good US station density we can afford to, much to the same general effect). But if a spurious gradual trend issue is affecting the majority, then homog brings the "clean" minority into line with that majority rather than the other way around.

If one homogenizes the data, one must account for that. It can, but currently does not. Identifying and accounting for any such biases is a key to the outcome of homog. (Same goes for any method including BEST methods and our "dropping" method, of course.)

Evan, it seems to me that a lot of questions about your work would be easier to clear up if we had access to your station list. Are you willing to share it at this point?

I can't do that until publication. Twice we shared preliminary data and twice we had cause to regret it. We are currently fining down the station list (with JN-G pushing and me pulling). I can tell you that our unperturbed list shows much the same trend for raw and TOBS-adjusted data, which goes a long way towards supporting the general accuracy of USHCN metadata.

When we do publish, all data and methods will be made available for full replication and review. I have made a set of Excel sheets that are easy to work with and make it easy to alter the inputs for alternative interpretations.

Yes, the ratings list is key. We'll be back for lots more discussion when we release. I look forward to the discussion.

Victor Venema said...

PaulS, for Watts et al. (2015), stations that only have an MMTS transition count as "unperturbed". How much work would it be for you to see how many stations are "unperturbed" by this definition? Information on this transition Watts et al likely took from the HOMR metadata database.

The definition of "unperturbed" is naturally wrong (at least as long as the correction applied is much too small), but it would still be interesting to see if they would have any stations in their dataset had they detected all inhomogeneities.

Evan Jones said...

Information on this transition Watts et al likely took from the HOMR metadata database.

Yes. (Plus some curator info.)

The definition of "unperturbed" is naturally wrong (at least as long as the correction applied is much too small), but it would still be interesting to see if they would have any stations in their dataset had they detected all inhomogeneities.

That is an interesting argument. You are assuming that a station has bad metadata by definition if the trend is low.

If that were the case, TOB data on our unperturbed set would likely be as high as the homogenized data. However, it isn't. It is within 0.02 of raw. And the divergence between well and poorly sited stations remains roughly the same for both the perturbed and unperturbed sets. That suggests that the HOMR metadata is good.

Victor Venema said...

Evan Jones, that "curator info" provided additional breaks in itself shows that the HOMR metadata is not perfect. How well do you think the current curator still knows what happened to the station in the 1980s? I think you know the answer, that is why you did not start in 1900.

If the "curator info" provided a sufficiently large number of additional breaks, you could make two sets of metadata (HOMR and HOMR+curator); the difference in metadata quality would then allow you to study how sensitive your result is to missing metadata.

I do not understand much of the second part of your answer. What have NOAA's tested and validated TOB corrections to do with your newly invented and too small MMTS corrections?

But I would say that I am "assuming" that if the metadata is not perfect, your methodology has problems. Maybe you can show that this problem is sufficiently small for the conclusions you are drawing, but there is a potential problem.

My argument you cite actually was that if you only corrected a tiny part of a break (like you do for the MMTS breaks) that there is still a break after correction. I would not call that "unperturbed".

PaulS said...

Victor,

I'm using the Berkeley Earth interpretation of metadata (presumably HOMR) as depicted on individual station pages. They use four classifications of continuity break - 1) Station Move, 2) Record Gap, 3) TOBS change, 4) Empirical break.

MMTS conversions seem to be classed as station moves. My classification of unperturbed therefore allows for one station move, although for a given station it may or may not be MMTS conversion. I also allow stations with record gap discontinuities to be included, though this may be lenient since there's typically a reason why a station goes silent and then returns - i.e. something has changed at the site.

I've put together a spreadsheet showing all stations in California and Nevada with raw and final 1979-2008 trends, calculated from monthly anomalies using the Berkeley data for both. On the line of each station are noted station discontinuities according to the Berkeley interpretation of metadata. Empirical breaks are not included since I assume these are undocumented.

At the bottom of the spreadsheet I have put together a subset of "minimally perturbed" stations, according to my classification in the second paragraph above. You may note there are actually a couple of stations with negative raw trends, whereas I indicated earlier that there were no such low trends in unperturbed stations. In my previous scan through I didn't include these because they don't report over the full period - one ends in 1997, the other 2002. I assumed Watts et al. would reject such stations for this study, but I include them here for completeness.
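The two screening rules discussed here can be written as simple filters. The sketch uses a hypothetical `Station` record with made-up event labels; `unperturbed_strict` follows the description of Watts et al. (2015) given above (only an MMTS transition allowed), and `minimally_perturbed` follows PaulS's looser rule (at most one move, gaps allowed, no TOBS change, full 1979-2008 coverage). Neither is the authors' actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """Hypothetical station record; `events` holds documented discontinuities,
    using made-up labels: "mmts", "move", "tobs", "gap"."""
    name: str
    first_year: int
    last_year: int
    events: list = field(default_factory=list)


def unperturbed_strict(s):
    """Nothing documented except (at most) the MMTS conversion."""
    return all(e == "mmts" for e in s.events)


def minimally_perturbed(s, start=1979, end=2008, require_full_period=True):
    """PaulS-style screen: no TOBS change, at most one move (the MMTS conversion
    is counted as a move, as Berkeley appears to do), record gaps tolerated."""
    if require_full_period and (s.first_year > start or s.last_year < end):
        return False
    if "tobs" in s.events:
        return False
    moves = sum(1 for e in s.events if e in ("move", "mmts"))
    return moves <= 1


stations = [
    Station("A", 1950, 2014),
    Station("B", 1950, 2014, ["mmts"]),
    Station("C", 1950, 2014, ["move", "gap"]),
    Station("D", 1950, 2014, ["tobs"]),
    Station("E", 1950, 2014, ["move", "mmts"]),
    Station("F", 1950, 1997),
]
for s in stations:
    print(s.name, unperturbed_strict(s), minimally_perturbed(s))
```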

Kevin O'Neill said...

Evan - I have no idea why we would consider satellite temperatures in this discussion - none at all. We're supposedly interested in surface temperatures. The popular RSS and UAH TLT satellite datasets have a multitude of unique problems, do not measure surface temperatures, and cannot be compared to surface temperatures without making *a lot* of assumptions.

Otherwise, adjusting MMTS up is definitely a start, but your adjustment - at least as you've laid it out so far - is woefully short of the actual bias. Do the actual Hubbard & Lin recommended MMTS Bias adjustments and you'll see that for yourself.

As I pointed out upthread, for a climate normal similar to Little Rock, Arkansas, the Bias I adjustment averages 0.286C, varying by time of year from 0.23C to 0.32C.

Evan Jones said...

Evan Jones, that "curator info" provided additional breaks in itself shows that the HOMR metadata is not perfect. How well do you think the current curator still knows what happened to the station in the 1980s? I think you know the answer, that is why you did not start in 1900.

I didn't say curator info provided breaks. And I know USHCN metadata is not perfect. Some curators are more reliable than others. They tend to be punctilious and have good memories of their stations.

And start in 1900? Durn tootin' I know the answer. USHCN metadata is crap before 1950 and doesn't get much better until the 1970s. The only way we could possibly do this the way we are doing it (i.e., dropping) is during a period of good metadata.

I do not understand much of the second part of your answer. What have NOAA's tested and validated TOB corrections to do with your newly invented and too small MMTS corrections?

I have looked at the MMTS data. It is too low. I have looked at the CRS data. It is too high. The net is right about where I have it: under 10% below the UAH/RSS average (a bit higher than Klotzbach, 2009). The logical approach is to adjust both MMTS and CRS to that.

But I have gone about it the wrong way; my errors are canceling each other out. I added a jump. But, upon careful reading, the jump was not a jump at all. Having looked at the data, there is not a jump, but a prolonged divergence. So Menne is trend-adjusting and not jump-adjusting at all, and he's using CRS as touchpoint.

Very well, we will trend-adjust, too. And when we are done, we'll have some uncomfortable suggestions for the 1880 - 1980 record. Like I say, those guys have correctly identified a problem and are diligently at work making it worse.
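The distinction drawn here between a jump (a one-time step) and a prolonged divergence (a gradual trend) can be checked on a difference series by asking which simple model fits it better. The sketch below is a hypothetical illustration, not Menne's pairwise algorithm: it fits a straight line and a best single-step model by least squares and reports whichever leaves the smaller residual. A real test would also penalize the step model for its extra break-point parameter.

```python
def ols_fit(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_step(ys):
    """Best single step: two constant segments, each at least 2 points long.
    Returns (break_index, residual sum of squares)."""
    best_k, best_sse = None, float("inf")
    for k in range(2, len(ys) - 1):
        left, right = ys[:k], ys[k:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


def classify_divergence(diff):
    """Label a difference series (e.g. hypothetical MMTS minus CRS anomalies,
    one value per year) as 'gradual' or 'step', whichever fits better."""
    xs = list(range(len(diff)))
    _, slope, sse_trend = ols_fit(xs, diff)
    k, sse_step = best_step(diff)
    kind = "gradual" if sse_trend < sse_step else "step"
    return kind, slope, k
```

A steady ramp comes back as "gradual" with its slope; a clean offset comes back as "step" with the index of the break.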

Evan Jones said...

Evan - I have no idea why we would consider satellite temperatures in this discussion - none at all.

There is every reason. They are active throughout the study period. They have full and complete coverage. They also (unlike CRU) never let their old raw data degrade and document their changes. No gaps, no TOBS issues, no moves, no siting issues (other than correctable drift).

I'd rather use CRN, sure, but that net wasn't operational until 2005.


The popular RSS and UAH TLT satellite datasets have a multitude of unique problems,

As opposed to surface stations? And most of those sat. problems are far north and far south of CONUS, which is a sweet spot for sats. There are two major metrics that use different methods and instruments. Yet they track very much the same. They also track radiosonde trends to 98% confidence.

do not measure surface temperatures, and cannot be compared to surface temperatures without making *a lot* of assumptions.

Assumptions as per Klotzbach and others (Christy has worked on this). Annual satellite data has a bit less range than surface. So one must address trend rather than offset. Over seas, sats exceed surface by as much as 40%. Over land, by 10%. There's your baseline.
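The baseline described here is a ratio: if satellite lower-troposphere (TLT) trends over land run about 10% above surface trends, a surface trend can be inferred by dividing by that amplification factor. A minimal sketch, assuming the factor is a single constant over the region. The 1.1 value is the one stated in this comment; 0.9 is a purely hypothetical value below 1, included to show how sensitive the result is to that assumption. The 0.114 C/decade input is the global UAH v6.0 LT trend quoted elsewhere in this thread, used here only as an illustrative number, not as a CONUS value.

```python
def implied_surface_trend(tlt_trend, amplification):
    """Surface trend implied by a TLT trend, assuming
    tlt_trend = amplification * surface_trend (both in C/decade)."""
    if amplification <= 0:
        raise ValueError("amplification factor must be positive")
    return tlt_trend / amplification


# Illustrative input only: global UAH v6.0 LT trend, C/decade.
for amp in (1.1, 0.9):
    surface = implied_surface_trend(0.114, amp)
    print(f"amplification {amp}: implied surface trend {surface:.3f} C/decade")
```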

All of the above is something you can at least hang your hat on.

The alternatives are Menne and L&H.

The L&H-04 study involved two pairs of instruments over one year. It dealt solely in offsets. But the USHCN shows no offset jump whatever, it is a continual divergence. Interesting, but no sale. If CRN went back to 1979, or at least 1986, maybe we could get somewhere. But it don't so we cain't.

Menne takes a more applicable but misdirected approach. He does a pairwise with CRS and then goes out of his way to mention that the adjustments do not quite bring MMTS to the level of CRS, so maybe he should by all rights be adjusting them some more. He's going after the kid playing with chalk on the blackboard while ignoring the brawl in the back of the class.

I find MMTS to be tracking ~0.40C/decade too low (for stations at least 50% of their tenure as MMTS). But CRS, choking on its own personal heat sink, is severely overtracking by the much larger amount of 0.11C/d.

Fix both.
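A trend adjustment of the kind proposed here adds a linear ramp, rather than a single offset, to each instrument's record. The sketch below is hypothetical: it anchors the ramp at zero in the first year of the study period (an assumption; anchoring at the end would give identical trends but different absolute levels) and uses the rates given in this thread, +0.04C/decade for MMTS (the corrected figure) and -0.11C/decade for CRS. Whether trend adjustments like these are appropriate at all, as opposed to the step bias adjustments recommended by Hubbard & Lin, is exactly what is in dispute above.

```python
def trend_adjust(anomalies, rate_per_decade, start_year):
    """Add a linear ramp to annual anomalies.

    anomalies: dict mapping year -> anomaly (C)
    rate_per_decade: ramp slope in C/decade (positive raises later years)
    start_year: year at which the ramp is zero
    """
    return {
        year: value + rate_per_decade * (year - start_year) / 10.0
        for year, value in anomalies.items()
    }


# Hypothetical flat records over the 1979-2008 study period.
flat = {year: 0.0 for year in range(1979, 2009)}
mmts_adjusted = trend_adjust(flat, +0.04, 1979)   # MMTS tracking too low
crs_adjusted = trend_adjust(flat, -0.11, 1979)    # CRS tracking too high
print(round(mmts_adjusted[2008], 3), round(crs_adjusted[2008], 3))
```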

Victor Venema said...

MJX, you may mean well, but I prefer not to publish your comment. It is rather unfriendly and does not contain arguments (for or) against the Watts et al. (2015) manuscript.

Rest assured that I am not buying any of the conclusions of this work and have a hard time believing that Evan Jones does, but humans are capable of extraordinary feats when their identity is challenged.

Evan Jones said...

Thanks, VeeV, but don't fret MJX. He follows me about and says what he says. But I've learned from him, too, as well as from others who criticize this project.

I find MMTS to be tracking ~0.40C/decade too low

Correction: Make that 0.04 per decade too low.

Rest assured that I am not buying any of the conclusions of this work and have a hard time believing that Evan Jones does, but humans are capable of extraordinary feats when their identity is challenged.

Well, I do buy it and furthermore we are essentially the ones doing the challenging.

JN-G is buying it, too, and he was as skeptical of it as you were when we started out, especially After the Fall (2011, that is). He ran it all using his own independent method and his initial results were lower than mine (when he anomalized, our results came together). He experimented with infill, and we have both run the numbers using TOBS-adjusted data. All of that comes quite close to what we have found.

We are getting a difference that is statistically significant at better than the 99% level. Even if the metadata and the rating system are not perfect, that is a loud alarm bell that something gradual and systematic is amiss.
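One conventional way to put a significance level on a difference like this is to regress the difference series (poorly sited minus well sited, one value per year) on time and compare the slope with its standard error. The sketch below is an illustration, not the team's actual test: it assumes independent residuals, whereas real annual temperature differences are serially correlated, which would widen the standard error. With 30 years (28 degrees of freedom), |t| above about 2.76 corresponds to two-sided significance at the 99% level.

```python
import math


def trend_t_stat(diff):
    """OLS slope (units per time step) of diff against 0..n-1 and its t statistic.
    Assumes independent residuals (no autocorrelation correction)."""
    n = len(diff)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx = (n - 1) / 2
    my = sum(diff) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    slope = sum((x - mx) * (y - my) for x, y in enumerate(diff)) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in enumerate(diff))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical: 0.01 C/year divergence plus alternating +/-0.05 C noise.
diff = [0.01 * i + (0.05 if i % 2 else -0.05) for i in range(30)]
slope, t = trend_t_stat(diff)
print(f"slope {slope:.4f} C/year, t = {t:.1f}")
```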

If it is not microsite, then it is something that happens to coincide with microsite past the point of significance. For that matter, homogenization is far from perfect, as well: with an essentially good sample, it improves the results; if the sample is permeated by a factor causing gradual divergence, it leads the results astray unless keyed correctly to the problem. It can't be MMTS, as the stations with poor microsite are impacted by that more than the well sited sample.

On the other hand, you have dragged me over (sputtering and protesting all the way) to the POV that homogenization is a necessary tool for what you are trying to do. I still say yuk, ick, and other comments, but that wouldn't be the first time I found myself forced to accept the potential benefits of something I didn't like. And the potential benefits are there (as are potential pitfalls).

I have "homogenized", in an informal fashion, much data when designing Blue Vs Gray and other "simulation" games. But it is old hat that a game designer with perspective tends to josh at the "realism" many players "find" in these games. They are, after all, not only models, but models strictly constrained by the "fun factor".

But this is science and statistical analysis and goes far beyond what one is trying to achieve with a game.

Evan Jones said...

I just noticed this:

The reason these weaknesses were not relevant for Fall et al was that, as you mention, the conclusions were not based on the weakest part of the study.

Heh. Metadata (or in our case, not accounting for it at all) was not relevant for Fall et al.?

I "project" that if we had had strong Tmean results, the critics would have been howling from the rafters on account of those very same "weaknesses" -- and well they should have done, at that. And youse guys are doing precisely that, producing what is -- for me, anyway -- a fascinating discussion. We get to know how our opponents think, what makes them tick, get focused on criticisms that we should have been more focused on in the first place.

And, besides, I wonder ...

We used over 1000 stations in Fall et al (and Watts prerelease 2012). I will be running a Leroy (1999) version using our current "cleaner" dataset. And if the siting differences pop out at this late date, I will be having one very damn big scientific horse-laugh. Because it was at the urging of Doc. Con, you, and other critics who pointed out that these issues needed to be addressed, not merely mentioned (as they were in the 2012 paper) and put off for followup.

So if those changes alone wind up producing results that support the hypothesis better than Fall et al., it will be an irony, indeed.

You could naturally draw similar conclusions and add that the new Leroy classification seems to work better than the old one.

Leroy (1999) accounted only for distance from heat sink. Leroy (2010) accounts for both distance and area. (The latter is more complex and somewhat Byzantine. When you try it, you'll see.) Leroy (2010) is an improvement, but obviously not the final answer. I'll be taking a crack at it, myself, I think.

Note that wind is a variable. Winds in any particular area do not necessarily prevail in the same strength and direction from year to year. If winds blow from the sink towards the sensor, the effect will likely be increased. If next year they blow away from the sensor, the effect would be diminished. To say nothing of the seasonal variability of wind, thus affecting HSE effect on sensors during different seasons.

Therefore, just as temperature trends (a gradual effect) require at the very least 10 (and preferably 30) years of data to note the effect, the same applies to heat sink, which is a smaller effect than the warming itself, in any case. Shorter periods than that are subject to too much Variable Variability, as it were, and are therefore suspect and all too often subject to the perils of the cherrypick.

I could argue (wrongly), as do some, that the 1947-1952 period "proves" CO2 has no effect. I could argue (also wrongly) that the 1994-1998 period proves CO2 is about to fry us in our beds. So you need a longer trend. Our shortest period is 10 years (at the end of our study period) and the next is the first 20 years (from the start of our study period). That gives one 30-year and one 20-year period of warming, and one 10-year period of cooling (strong, sustained cooling periods are hard to find thanks to AGW).

Oh, and by the way, I think the cooling period from 1947 to 1976 is exaggerated both due to overall poor microsite and the wooden boxes of the CRS units. Our hypothesis works both ways. What goes up must come down.

Victor Venema said...

Fall, Watts et al. showed that after homogenization there were no signs any more that siting quality affected trend estimates, like Watts et al. (2015). Fall et al. did not make the baseless claim to have shown that homogenization does not work. If Watts et al. would draw about the same conclusions as Fall et al. no one would complain, but your press release made claims that are not supported.

I have the feeling this discussion is becoming a heat sink and if you have the emotional need to call me "Doc. Con" then maybe this is a good moment to close this discussion. That kind of language may also provide the reader with an indication why you have trouble reasoning in a logical fashion. Maybe it is a good idea to take a longer WUWT break to allow yourself to have a fresh less biased look at your data.

Evan Jones said...

Dear me, no, Victor, I meant Dr. Connolley, not you, and his treatment of me (and yours) has been open and honest, and you both have raised legitimate concerns.

Fall, Watts et al. showed that after homogenization there were no signs any more that siting quality affected trend estimates, like Watts et al. (2015). Fall et al. did not make the baseless claim to have shown that homogenization does not work. If Watts et al. would draw about the same conclusions as Fall et al. no one would complain, but your press release made claims that are not supported.

Fall et al., using a non-metadata-vetted set and Leroy (1999), showed that well and poorly sited stations had roughly the same trend using raw data, as well as for homogenized data. So no issue at that point.

In W-15 (16?), using Leroy (2010), the problem arises that well and poorly sited stations showed a strong divergence, while homogenization causes the well sited station trends to be equal to the poorly sited stations, not the other way around.

That is the issue here.

MJX said...

When you get all your data in order and we can then determine the complete picture of your research, perhaps then it may be a proper time to continue this discussion.
After all, we need not tally the comments in the thousands, do we now?
Dr. Venema, you are correct that my post was most unfriendly and, to be honest, I did not expect it to be posted. One comment I should point out that was incorrect was Evan declaring he learned from me. I am 100% certain he did not grasp one concept I put forth.
I agree with your comment
That kind of language may also provide the reader with an indication why you have trouble reasoning in a logical fashion. Maybe it is a good idea to take a longer WUWT break to allow yourself to have a fresh less biased look at your data.
...and believe me, Evan is a biased Lukewarmer.
Take care, and Dr. Venema, you need not post this at all, or comment.

PaulS said...

Evan,

Shorter periods than that are subject to too much Variable Variability, as it were, and are therefore suspect and all too often subject to the perils of the cherrypick.

But it's already been pointed out that the divergence does not occur with much variability.

If it is not microsite, then it is something that happens to coincide with microsite past the point of significance.

Based on the details of your classification I would suggest that one thing you've done is graded stations in terms of absolute temperature relative to the "true" local average. That is, Class 1 and 2 stations probably have relatively cool absolute readings, Classes 3,4,5 relatively warm absolute readings.

The consequence is that relocations which occurred on the timeline of Class 1 or 2 stations likely involved moving from warmer to cooler conditions. This includes MMTS conversions, which in practice often seem to have involved relocation. If left without appropriate homogenisation, trends will tend to bias low. For Classes 3,4,5 moves are about as likely to be going warmer as cooler.

Evan Jones said...

Your concerns are valid, but they have been addressed since the 2012 pre-release:

But it's already been pointed out that the divergence does not occur with much variability.

Neither do temperatures themselves. The effect widens over time. Like temperatures.
E.g., one would not argue that CO2 forcing stopped from 1998 to 2008 just because the data showed cooling.

VeeV (in Sou's blog) suggested that the divergence was likely a result of jumps and the graphs of the two sets would have many different wiggles. But they don't. They show the same shape with a divergence over time to 1998 as the poorly sited stations warm faster, and a reconvergence from 1999 - 2008 as the poorly sited stations cool faster.

Based on the details of your classification I would suggest that one thing you've done is graded stations in terms of absolute temperature relative to the "true" local average.

A station's rating is based entirely on its physical surrounding with no reference whatever to its data. I use anomalies only and I anomalize each station to its own readings, not to those of other stations. Very generic, very self-contained.

That is, Class 1 and 2 stations probably have relatively cool absolute readings, Classes 3,4,5 relatively warm absolute readings.

Actually, I don't even know. (I agree that Class 1\2 probably have lower absolute temps than Class 3\4\5, but I have not measured that yet.)

A warmer-absolute station with a low trend is treated the same as a cooler-absolute station with the same low trend. So offset does not show up whatever, only trend difference.
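The point about offsets can be shown directly: if each station is anomalized against its own mean over a common baseline, a constant difference in absolute temperature cancels out and only the trend survives. A minimal sketch, with hypothetical station values and a hypothetical `anomalize` helper:

```python
def anomalize(series, base_years):
    """Subtract the station's own mean over base_years from every value.

    series: dict mapping year -> annual mean temperature (C)
    """
    base = [series[year] for year in base_years]
    baseline = sum(base) / len(base)
    return {year: value - baseline for year, value in series.items()}


years = range(1979, 2009)
# Hypothetical stations: same 0.02 C/year trend, 3 C apart in absolute level.
warm_site = {y: 15.0 + 0.02 * (y - 1979) for y in years}
cool_site = {y: 12.0 + 0.02 * (y - 1979) for y in years}

warm_anom = anomalize(warm_site, years)
cool_anom = anomalize(cool_site, years)
max_gap = max(abs(warm_anom[y] - cool_anom[y]) for y in years)
print(f"largest anomaly difference: {max_gap:.1e} C")
```

The two anomaly series agree to within floating-point rounding, so a ranking built on them sees only trend differences.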

The consequence is that relocations which occurred on the timeline of Class 1 or 2 stations likely involved moving from warmer to cooler conditions.

All stations that have moved from an unknown location to a known location are dropped. Any station that has a localized move from a known location to another known location is dropped if the rating changed as a result of the move.

This includes MMTS conversions, which in practice often seem to have involved relocation.

If there was a move from an unknown location (the vast majority of cases), the station is dropped. Therefore any MMTS conversions as a result are moot, as those stations are considered perturbed and have no impact on our unperturbed dataset.
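The two drop rules can be written as a filter over station histories. Everything in the sketch below is hypothetical: the `Station` and `Move` record layout, the field names, and the choice to ignore moves outside the 1979-2008 study period are assumptions for illustration, not the project's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Move:
    year: int
    from_known_location: bool       # was the pre-move site documented?
    rating_before: Optional[int]    # Leroy class before the move; None if unknown
    rating_after: int               # Leroy class after the move


@dataclass
class Station:
    station_id: str
    rating: int                     # current Leroy (2010) class, 1..5
    moves: List[Move] = field(default_factory=list)


def is_unperturbed(station: Station, start: int = 1979, end: int = 2008) -> bool:
    """Apply the two drop rules to moves inside [start, end]."""
    for move in station.moves:
        if not start <= move.year <= end:
            continue  # assumption: moves outside the study period are ignored
        if not move.from_known_location:
            return False  # move from an undocumented site: drop
        if move.rating_before != move.rating_after:
            return False  # documented move that changed (or lost) the rating: drop
    return True


stations = [
    Station("A", 2, [Move(1990, True, 2, 2)]),        # local move, same class: keep
    Station("B", 2, [Move(1990, False, None, 2)]),    # from unknown site: drop
    Station("C", 4, [Move(1995, True, 2, 4)]),        # class changed: drop
    Station("D", 1),                                  # no moves: keep
]
kept = [s.station_id for s in stations if is_unperturbed(s)]
print(kept)
```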

I'll be dealing with partial records in followup (as that requires like-rated pairwise to establish the startpoint) in order to increase regional coverage. I would not be surprised if, as a result, Region 9 (West) will have a higher Class 1\2 trend and region 6 (South) will have a lower Class 1\2 trend. To be seen.

If left without appropriate homogenisation, trends will tend to bias low. For Classes 3,4,5 moves are about as likely to be going warmer as cooler.

Only if the rating at the end of the study period changed (or is unknown) at some earlier point of the study period (in which case, the station is dropped). Not if it was the same class throughout the study period.

Moves do not appear to have had much effect on our 2012 results. TOBS was the biggie. Dropping the stations with significant TOBS changes raised the trends of both Class 1\2 and 3\4\5 stations. That brought them closer to the homogenized data than our 2012 results, but the divergence between well and poorly sited stations was essentially unaffected (even increased a little) since 2012.

Evan Jones said...

When you get all your data in order and we can then determine the complete picture of your research, perhaps then it may be a proper time to continue this discussion.

Yes. Indeed.

One comment I should point out that was incorrect was Evan declaring he learned from me. I am 100% certain he did not grasp one concept I put forth.

You would be surprised.

I agree with your comment
That kind of language may also provide the reader with an indication why you have trouble reasoning in a logical fashion. Maybe it is a good idea to take a longer WUWT break to allow yourself to have a fresh less biased look at your data.


The problem here is that when he posted that, VeeV thought I was accusing him of being a con-man. But I wasn't. And I don't think he is or has been in any way. I was referring to Dr. Connolley (and do not consider him to be a con-man, either).

...and believe me, Evan is a biased Lukewarmer.

Yes. We all have our biases. All of us. It requires constant vigilance to avoid bias when conducting research.

The agents of Screwtape sit on our shoulder and whisper temptations in our ears. What methods to pursue (Anomalize? Grid? How?). Where to look. When to stop looking. After all, there are only so many hours in a lifetime. Our weapon against all this is scientific method and the stark knowledge that our work will be reviewed by those who do not agree with us.

Victor Venema said...

I thought it was obvious, but maybe it needs to be said: it is no better to call Dr. Connolley "doc con" just because the pun fits better; you are still calling him a con.

Evan Jones said...

I thought it was obvious, but maybe it needs to be said: it is no better to call Dr. Connolley "doc con" just because the pun fits better; you are still calling him a con.

Therefore I will not do so again. I wasn't thinking in those terms. Really. For the record, I do not think he is a con, con-man, or any other con-notation thereof. He has been an opponent of the hypothesis we are putting forward, but he has been entirely straight with me.

Like yours, his suggestions and criticisms have proven valuable to me and to the team. We appreciate them and are grateful for them. As a result, we have come a long way since 2012 (and indeed our results are somewhat less strong than they once were).

Kevin O'Neill said...

Evan - I think you totally misunderstand the satellite temperature series. I realize Dr Christy is a coauthor, but that may just skew your perceptions.

Kevin Cowtan has a new article up: Surface Temperatures or Satellite Brightness. The article was reviewed for Kevin by Dr Carl Mears of RSS. You should read it.

You might also wish to read Nick Stokes' current post at Moyhu: Satellite temperature readings diverge from surface and each other

Sou at HotWhopper and Tamino at OpenMind have both recently addressed some of the issues as well.

Your apparent absolute faith in these sets might be misplaced - coauthor Christy notwithstanding.

Evan Jones said...

Okay, I read both of them, and there was nothing new in them I could determine. No gold speck I had missed. The flow chart was very interesting. But I'm not sure I get your point.

There have been major corrections to sat data since its inception. We know that and we know why. We now have two separate metrics, one run by skeptics, one by those more on the activist side.

FWIW, I find RSS is the more historically steady of the two (Dr. Christy's being a coauthor of mine notwithstanding). UAH initially failed to correct for drift, resulting in cooling when it should have shown warming. UAH then corrected for drift to higher than RSS and eventually, after much consideration, recorrected, bringing it closely in line with RSS.

But now the two sat metrics track each other very closely, even though they do not use the same dataset (the major surface metrics use essentially the same stations). The sondes, also using a different set of data, track the sats at 98% confidence.

Surface metrics use essentially the same dataset and the same methods (GISS, NOAA, and Haddy all having appended the K-15 approach to SST). No surprise they track each other fairly well.

We also know sats do not measure surface directly, but atmosphere. However, LT measurements are directly associated with surface (Klotzbach et al.) and found to have a trend ~10% higher over surface.

That flatly contradicts the surface metrics. However, when examining CONUS, we find that once the microsite bias is removed from the surface metrics, surface falls right into line with both sats and sondes -- within 10%, just as K09 indicates it should.

Comments on internal and external uncertainty are not strangers to me. I have noted them often. I found the part regarding error bars to be most interesting.

Yes, sats have wider external error bars than surface. I have also noticed that homogenized data has far lower error bars than non-homogenized surface data. That shows up in -- all -- my calculations. That's what happens when you take all your "outliers" and adjust them to the majority -- obviously.

The problem here is that, in broad terms, the outlying minority is well sited and the mainstream majority is poorly sited. Homogenization does not average, but identifies what it considers to be a minority outlier and adjusts it to conform with the majority group (doing so after TOBS adjustment, note).

So an internal error is increased while the external bars are decreased. Just the sort of problem Cowtan is bringing up, but the conclusions are quite different. Cowtan fails to say why the surface error bars are so small, but I know, having seen the effect close up and in my face.

That is both the strength and weakness of homogenization. If the majority of the sample is good, then your bad outliers are brought into line, thus improving accuracy and reducing the external error bars. But if the minority of the sample is good and the majority is bad (viz. microsite), then homogenization brings the good minority into line with the bad majority, also reducing external error bars -- all the while suffering from a systematic structural error.

When a systematic error of this nature is uncovered, it has to be addressed by the homog algorithm. Currently it is not. I suggest it should be. Only once it is will homogenization perform as intended. Until then, all it does is cement a systematic error in place and yield even worse overall results than non-homogenized, straight-average data.

Kevin O'Neill said...

Evan - "surface falls right into line with both sats and sondes -- within 10%, just as K09 indicates it should."

Umm, what data are you using to claim satellites fall right into line with radiosondes? In the comments at Nick Stokes' blog, Olof posted a graph comparing RATPAC and UAH. The difference is significant.

Tamino has shown the same comparison and changepoint analysis indicates *something* happened with the satellites circa 2000. Which happens to be coincident with the changeover from MSU to AMSU.

Evan Jones said...

Roy indicates that sonde and sat data track pretty well using a RAOBCORE/RATPAC average.

Specifically, we see from Fig. 7 that application of the old and new LT weighting functions to the radiosonde trend profiles (average of the RAOBCORE and RATPAC trend profiles, 1979-2014) leads to almost identical trends (+0.11 C/decade) between the new and old LT. These trends are a good match to our new satellite-based LT trend, +0.114 C/decade.

http://www.drroyspencer.com/2015/04/version-6-0-of-the-uah-temperature-dataset-released-new-lt-trend-0-11-cdecade/

Evan Jones said...

The factors I am considering are (roughly):

Class 3\4\5 > Class 1\2
Homogenized > Class 1\2
RSS/UAH = Class 1\2

Therefore RSS/UAH is more likely to be correct than Homogenized.

I have already discussed the fact that and the reasons why Homogenization adjusts 1\2 to the level of 3\4\5.

Evan Jones said...

VeeV, here is what I am saying regarding your initial (and, I think, primary) point.

RU = Raw, Unperturbed
H = Homogenized
1\2 = Well sited
3\4\5 = Poorly sited

In a nutshell:

H-1\2 = H-3\4\5
H-All = RU-3\4\5
RU-1\2 < RU-3\4\5

Therefore RU-1\2 is more likely to be correct than H-1\2.

You are concentrating on the first step, but are not addressing the second and third points. I would be interested in your direct comments on that.

SUMMARY OF ISSUES:

Support:
-- Results are statistically significant.
-- All 9 Regions show at least some more warming for RU-3\4\5 than for RU-1\2. The odds of that happening randomly are 1 in 512, which is, in and of itself, statistically significant.
-- Both warming and cooling trend exaggeration is identified.
-- Adequate station density. Our regional "gridding" gives RU-1\2 a higher result by ~0.03C/decade than non-gridded, so our results are not an artifact of gridding.
-- Metadata is very good (greatly improved since the inception of HOMR).
-- Data is anomalized. Non-anomalized results give us results even a little lower than Watts pre-release (2012). (Like I say, in many ways I have pushed hard against our hypothesis. Please believe me when I say I am really, truly trying to get this right.)
-- RSS/UAH = RU-1\2
-- Radiosondes = Class 1\2. But see above disputes of that by Kevin. Even if he is correct, our RU-1\2 results match radiosonde far better than surface.
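The 1-in-512 figure is the binomial probability that all 9 regions land in the predicted direction by chance, assuming each region is an independent 50/50 coin flip. That independence is an assumption: neighbouring regions share weather and stations, so the true chance probability is likely higher than this. A quick check, with a hypothetical helper `prob_at_least`:

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(prob_at_least(9, 9))   # all 9 regions in the predicted direction
print(prob_at_least(8, 9))   # at least 8 of 9, for comparison
```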

Challenges:
-- Lack of Heat Sink physics equations. Descriptions, crude examples, only (at this point).
-- Metadata is only "very good". (Missing factors arguably more likely to result in net warming than net cooling.)
-- Low duration of cooling example (10 yrs.).
-- Leroy, even the 2010 version is a bit of a meataxe. (Albeit a good one.)
-- A few ratings are close calls and will likely be disputed.
-- Less than perfect Google Earth.
-- Not considering other factors re. Leroy (shade, vegetation, slope).
-- TOBS is not 100% removed.
-- MMTS adjustments. Both amount and character. Determining the correct baseline for both an upward trend (sic) adjustment of MMTS Tmax and a downward trend adjustment of CRS Tmax. (Some Tmin, but not much.)

I'm sure I left out a few, but I have you-all to remind me.

Followup issues:
-- Bringing in a physicist.
-- Inclusion of partial data records (pairwising for startpoint baselines), thus improving coverage.
-- ASOS fiddles. (Menne, 2010, mentions that they make no adjustment to ASOS, but there are some Max/Min adjustments that could be applied. HO-83 dewpoint issues, etc.)
-- Refinement and quantification of Leroy (2010).
-- Looking at other factors (Altitude, prevailing wind, terrain type, representational coverage, other Leroy considerations, etc.).

PaulS said...

All stations that have moved from an unknown location to a known location are dropped. Any station that has a localized move from a known location to another known location is dropped if the rating changed as a result of the move.

I'm talking about changes which you haven't accounted for in the stations you've kept. Obviously for stations which are genuinely unperturbed this isn't an issue.

surface falls right into line with both sats and sondes -- within 10%, just as K09 indicates it should.

Expected amplification over mid-latitude continental areas is less than 1, so CONUS satellite TLT trends should be lower than surface. This effect is observable in inter-annual variations, which are notably larger for surface records compared to TLT.

MartinM said...

-- ASOS fiddles. (Menne, 2010, mentions that they make no adjustment to ASOS, but there are some Max/Min adjustments that could be applied. HO-83 dewpoint issues, etc.)

Uh, what? I can't find any such statement in Menne et al. 2010, and it flatly contradicts Menne et al. 2009, which gives values of -0.44 °C and -0.45 °C for the effect of transition in max and min temperatures respectively. If you have uncorrected ASOS stations in your dataset, you've got a huge systematic cooling bias lurking in there somewhere.
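Correcting for an instrument transition of this kind amounts to removing a step at the transition date. A minimal sketch, assuming annual values and a known transition year (both hypothetical here), using the -0.44 °C Tmax figure quoted above from Menne et al. 2009 as the estimated effect. Shifting the later segment, as done here, and shifting the earlier segment instead remove the same step and leave the same trend.

```python
def remove_step(series, transition_year, effect):
    """Remove an estimated instrument-transition step from annual values.

    series: dict mapping year -> temperature (C)
    effect: estimated change in readings caused by the transition (C);
            negative means the new instrument reads cooler.
    Values from transition_year onward are shifted by -effect.
    """
    return {
        year: value - effect if year >= transition_year else value
        for year, value in series.items()
    }


# Hypothetical Tmax record with an ASOS installation in 1992.
raw_tmax = {1990: 20.0, 1991: 20.0, 1992: 19.56, 1993: 19.56}
adjusted = remove_step(raw_tmax, 1992, -0.44)
print(adjusted)
```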

Victor Venema said...

Evan Jones: "You are concentrating on the first step, but are not addressing the second and third points. I would be interested in your direct comments on that."

If you mean with your second and third point your claim that homogenization does not work purely based on some trends being of similar size then I think I did respond to it. It is not even wrong.

Homogenization does not adjust the trend of the minority to the one of the majority. Suppose you have a network with a trend of 1°C per century and a change at most stations drops the temperature by 2°C. A clear majority of stations will then show cooling. After homogenization these jumps will have been removed and all stations will show warming; they are not adjusted to the cooling of the majority.
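The jump case can be sketched with a single candidate station and one neighbour: because the neighbour's change happens at a different time (or, here, not at all), the candidate's break stands out in the difference series and can be removed. This is a hypothetical toy, far simpler than NOAA's pairwise homogenization algorithm, using a 1 °C per century trend and a 2 °C drop as in the example above.

```python
def remove_largest_jump(candidate, reference):
    """Locate the best single break in candidate - reference (two constant
    segments, each at least 2 points) and shift the earlier part of the
    candidate so it lines up with the later part.
    Returns (break_index, adjusted_candidate)."""
    diff = [c - r for c, r in zip(candidate, reference)]
    best_k, best_sse, shift = None, float("inf"), 0.0
    for k in range(2, len(diff) - 1):
        left, right = diff[:k], diff[k:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((d - ml) ** 2 for d in left) + sum((d - mr) ** 2 for d in right)
        if sse < best_sse:
            best_k, best_sse, shift = k, sse, mr - ml
    adjusted = [c + shift if i < best_k else c for i, c in enumerate(candidate)]
    return best_k, adjusted


years = 100
climate = [0.01 * i for i in range(years)]               # 1 C per century
candidate = [t - (2.0 if i >= 50 else 0.0) for i, t in enumerate(climate)]
k, adjusted = remove_largest_jump(candidate, climate)
print(k, round(candidate[-1] - candidate[0], 3), round(adjusted[-1] - adjusted[0], 3))
```

Before adjustment the candidate ends about 1 °C lower than it started (apparent cooling); after removing the break it shows the underlying 1 °C per century warming again.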

The only scenario where this would go wrong, as I explained on WUWT before, would be if the majority of stations had a gradual inhomogeneity, rather than jumps. This gradual inhomogeneity would need to have the same period and size for all stations involved; otherwise we would still see in the difference time series that something is wrong. Gradual inhomogeneities exist, especially for urban stations due to increasing urbanization, but to claim that a majority of stations is affected in exactly the same magnitude is an extraordinary claim that is in no way supported by your evidence. In fact, I do not see how you could provide evidence for such a claim with your study set-up. That could be studied with parallel measurements and would most likely have been reported by scientists studying parallel measurements; stations are often relocated because the original location is no longer well sited.
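The condition spelled out here can be illustrated with difference series: a gradual inhomogeneity that is identical in timing and size at two stations cancels exactly in their difference, so a comparison-based method has nothing to detect, while a drift at only one station (or one of a different size) leaves a visible ramp. A hypothetical toy example, with an assumed 0.005 C/year drift:

```python
def difference_series(a, b):
    """Station-minus-neighbour difference, value by value."""
    return [x - y for x, y in zip(a, b)]


def max_abs(values):
    """Largest absolute value in a sequence."""
    return max(abs(v) for v in values)


n = 30
climate = [0.01 * i for i in range(n)]          # shared regional signal
drift = [0.005 * i for i in range(n)]           # hypothetical gradual bias

station_a = [c + d for c, d in zip(climate, drift)]
station_b = [c + d for c, d in zip(climate, drift)]   # identical drift
station_c = list(climate)                              # no drift

same_drift = max_abs(difference_series(station_a, station_b))
one_drift = max_abs(difference_series(station_a, station_c))
print(same_drift, round(one_drift, 3))
```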

Your list of problems to be fixed seems to mention most of the problems mentioned in the comments.

Evan Jones said...

The only scenario where this would go wrong, as I explained on WUWT before, would be if the majority of stations would have a gradual inhomogeneity, rather than jumps.

That is what is happening. The poorly sited majority is having a gradual inhomogeneity over time. That is what is going wrong. That is what needs to be fixed. Then homogenization will perform as intended.

I will answer the ASOS and sat. questions above, tomorrow.

I am also proposing to test our hypothesis regarding USHCN and USCRN comparison (2005-2014), which I have not run yet:


Since the majority of USHCN is poorly sited and the CRN is well sited, if the Watts hypotheses regarding siting and the effect of anomalization are valid, the following results should occur:

1.) Regardless of whether or not there is any trend, the differences between summer and winter temperatures should be greater for HCN than for CRN (2005-2014). This should occur not only as an overall result, but also in a large majority of years. A positive result would suggest heat sink effect in operation on an annual/seasonal basis as well as for longterm trends. (It should work for both.)
2.) If there is any overall trend from 2005-2014, the TOBS-adjusted HCN trend should be larger than the CRN trend.
3.) If there is any overall trend from 2005-2014, HCN homogenized data should magnify that trend over HCN TOBS-adjusted data, same as it does for the unperturbed 1979-2008 set.
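Predictions 1 to 3 could be scored along the lines of the sketch below. Everything in it is an assumption for illustration: monthly means stored as a dict keyed by (year, month), summer taken as June-August and winter as December-February (with December from the previous year), and trends fitted by ordinary least squares to annual values. The synthetic data only exercise the code; they are not HCN or CRN data.

```python
def seasonal_contrast(monthly, year):
    """Mean of Jun-Aug minus mean of Dec (previous year), Jan and Feb."""
    summer = [monthly[(year, m)] for m in (6, 7, 8)]
    winter = [monthly[(year - 1, 12)], monthly[(year, 1)], monthly[(year, 2)]]
    return sum(summer) / 3 - sum(winter) / 3


def years_with_larger_contrast(hcn, crn, years):
    """Prediction 1: count years where HCN's contrast exceeds CRN's."""
    return sum(1 for y in years if seasonal_contrast(hcn, y) > seasonal_contrast(crn, y))


def ols_slope(values):
    """Predictions 2 and 3: OLS slope per time step of an annual series."""
    n = len(values)
    mx = (n - 1) / 2
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    return sum((x - mx) * (y - my) for x, y in enumerate(values)) / sxx


def synthetic(amplitude):
    """Hypothetical monthly data: +amplitude in JJA, -amplitude in DJF, 0 otherwise."""
    sign = {6: 1, 7: 1, 8: 1, 12: -1, 1: -1, 2: -1}
    return {(y, m): amplitude * sign.get(m, 0)
            for y in range(2004, 2015) for m in range(1, 13)}


study_years = range(2005, 2015)
hcn, crn = synthetic(11.0), synthetic(10.0)
print(years_with_larger_contrast(hcn, crn, study_years), "of", len(study_years))
```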

Those are my predictions. We will see if our hypothesis has predicted correctly when I have set up the spreadsheets and run the numbers. I will know the results tomorrow night and will share them here, regardless of whether they support or falsify the hypothesis. Should be of interest, I think.
