
Wednesday, August 3, 2016


Climate model ensembles of opportunity and tuning



Listen to grumpy old men.

As a young cloud researcher at a large conference, enthusiastic about almost any topic, I went to a town-hall meeting on using a large number of climate model runs to study how well we know what we know. Or as scientists call this: using a climate model ensemble to study confidence/uncertainty intervals.

Using ensembles was still quite new. Climate Prediction dot Net had just started asking citizens to run climate models on their Personal Computers (old big iPads) to get the computer power to create large ensembles. Studies using just one climate model run were still very common. The weather predictions on the evening television news were still based on one weather prediction model run; they still showed highs, lows and fronts on static "weather maps".

During the questions, a grumpy old man spoke up. He was far from enthusiastic about this new stuff. I can still see a Statler or Waldorf angrily swinging his wooden walking stick in the air. He urged everyone, everyone, to be very careful and not to equate the ensemble with a sample from a probability distribution. The experts dutifully swore they were fully aware of this.

They likely were and still are. But now everyone uses ensembles. Often using them as if they sample the probability distribution.

Earlier I wrote about the problems that confusing model spread with uncertainty caused in the now mostly dead "hiatus" debate. That debate remains important: after the hiatus debate is before the hiatus debate. The new hiatus is already 4 months old.* And there are so many datasets to select a "hiatus" from.


Fyfe et al. (2013) compared the temperature trend from the CMIP ensemble (grey histogram) to observations (red), implicitly assuming that the model spread is the uncertainty. While the observed trend lies near the edge of the model spread, it is well within the uncertainty. The right panel is for a 20-year period: 1993–2012. The left panel starts in the cherry-picked large El Nino year 1998: 1998–2012.

This time I would like to explain better why the ensemble model spread is typically smaller than the confidence interval. The reasons also point to other questions where we need to pay attention: the difference matters when comparing long-term historical model runs with observations and could affect some climate change impact studies. For long-term projections and decadal climate prediction it is likely less relevant.

Reasons why model spread is not uncertainty

One climate model run is just one realisation. Reality has the same problem. But you can run a model multiple times. If you change the initial model fields just a little bit, the chaotic nature of atmospheric and oceanic flows means that a second run will show a different realisation. The highs, lows and fronts will move differently, the ocean surface is consequently warmed and cooled at different times and places, and internal modes such as El Nino will appear at different times. This chaotic behaviour is mainly found at the short time scales and is one reason for the spread of an ensemble. And it is one reason to expect that model spread is not uncertainty, because models focus on getting the long-term trend right and differ strongly when it comes to the internal variability.
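To make this first reason concrete, here is a minimal toy sketch in Python. It is not a climate model, just the classic chaotic Lorenz-63 system, but it shows how tiny perturbations of the initial state turn otherwise identical model runs into an ensemble of diverging realisations.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the chaotic Lorenz-63 system (a toy stand-in for the atmosphere)."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

rng = np.random.default_rng(0)
base_state = np.array([1.0, 1.0, 20.0])

# Initial-condition ensemble: the same model, the starting state perturbed by a tiny amount.
ensemble = [base_state + 1e-6 * rng.standard_normal(3) for _ in range(10)]

for _ in range(2000):  # integrate all members forward in time
    ensemble = [lorenz63_step(member) for member in ensemble]

spread = np.std([member[0] for member in ensemble])
print(f"Ensemble spread in x after 2000 steps: {spread:.2f}")  # the tiny perturbations have grown
```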

But that is just reason one. The modules of a climate model that simulate specific physical processes have parameters that are based on measurements or on more detailed models. We only know these parameters within some confidence interval. A normal climate model takes the best estimate of these parameters, but they could be anywhere within the confidence interval. To study how important these parameters are, special "perturbed physics" ensembles are created in which every model run has parameters that vary within the confidence interval.

Creating such an ensemble is difficult. Depending on the reason for the uncertainty in a parameter, it could make sense to keep its value constant, to continually change it within its confidence interval, or anything in between. It could make sense to keep the value constant over the entire Earth or to change it spatially, and again anything in between. The parameter, or how much it can fluctuate, may depend on the local weather or climate. It could be that when parameter X is high, parameter Y is also high (or low); these dependencies should also be taken into account. Finally, the distributions of the parameters also need to be realistic. Doing all of this for the large number of parameters in a climate model is a lot of work; typically only the most important ones are perturbed.
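A rough sketch of just the sampling step for such a perturbed-physics ensemble, with made-up parameter names, uncertainties and correlation rather than values from any real model: the draws stay within the confidence intervals and respect the assumed dependence between the two parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters with best estimates and standard uncertainties (illustrative only).
best = np.array([0.5, 2.0])   # "param_x", "param_y"
std = np.array([0.1, 0.4])
corr = 0.7                    # assume X and Y tend to be high together

cov = np.array([[std[0] ** 2, corr * std[0] * std[1]],
                [corr * std[0] * std[1], std[1] ** 2]])

n_members = 20
perturbed = rng.multivariate_normal(best, cov, size=n_members)

# Keep the draws inside a plausible range (here: within two standard deviations).
perturbed = np.clip(perturbed, best - 2 * std, best + 2 * std)

for i, (px, py) in enumerate(perturbed):
    print(f"member {i:2d}: param_x = {px:.3f}, param_y = {py:.3f}")
```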

You can generate an ensemble that has too much spread by perturbing the parameters too strongly (and by making the perturbations too persistent). Even if you do it optimally, the ensemble would still show too little spread, because not all physical processes are modelled; some are thought not to be important enough to justify the work and the computational resources. Part of this missing spread can be studied by making ensembles from many different models (multi-model ensembles), which are developed by different groups with different research questions and different ideas about what is important.

That is where the title comes in: ensembles of opportunity. These are ensembles of existing model runs that were not created to be an ensemble. The most important example is the ensemble of the Coupled Model Intercomparison Project (CMIP). This project coordinates the creation of a set of climate model runs for similar scenarios, so that the results of these models can be compared with each other. This ensemble automatically samples the chaotic flows and it is a multi-model ensemble, but it is not a perturbed-physics ensemble; these model runs all aim at the best possible reproduction of what happened. For this reason alone the spread of the CMIP ensemble is expected to be too low.

The term "ensembles of opportunity" is another example of the tendency of natural scientists to select neutral or generous terms to describe the work of colleagues. The term "makeshift ensemble" may be clearer.

Climate model tuning

The CMIP ensemble also has too little spread when it comes to the global mean temperature, because the models are partially tuned to it. There is an interesting, readable article on climate model tuning just out in BAMS**, which is intended for a general audience. Tuning has a large number of objectives, from getting the mean temperature right to the relationship between humidity and precipitation. There is also a section on tuning to the magnitude of warming over the last century. It states about the historical runs:
The amplitude of the 20th century warming depends primarily on the magnitude of the radiative forcing, the climate sensitivity, as well as the efficiency of ocean heat uptake. ...

Some modeling groups claim not to tune their models against 20th century warming, however, even for model developers it is difficult to ensure that this is absolutely true in practice because of the complexity and historical dimension of model development. ...

There is a broad spectrum of methods to improve model match to 20th century warming, ranging from simply choosing to no longer modify the value of a sensitive parameter when a match is already good for a given model, or selecting physical parameterizations that improve the match, to explicitly tuning either forcing or feedback both of which are uncertain and depend critically on tunable parameters (Murphy et al. 2004; Golaz et al. 2013). Model selection could, for instance, consist of choosing to include or leave out new processes, such as aerosol cloud interactions, to help the model better match the historical warming, or choosing to work on or replace a parameterization that is suspected of causing a perceived unrealistically low or high forcing or climate sensitivity.
Due to tuning, models that have a low climate sensitivity tend to have stronger forcings over the last century, and models with a high climate sensitivity a weaker forcing. The forcing due to greenhouse gasses does not vary much; that part is easy. The forcings due to small particles in the air (aerosols), which like CO2 stem from the burning of fossil fuels, are quite uncertain, and Kiehl (2007) showed that high-sensitivity models tend to have more cooling due to aerosols. For a more nuanced, updated story see Knutti (2008) and Forster et al. (2013).
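This trade-off can be illustrated with a zero-dimensional equilibrium estimate, warming ≈ S × F / F2x, where S is the climate sensitivity and F2x the forcing of a CO2 doubling. The numbers below are purely illustrative and ignore ocean heat uptake; they are not taken from Kiehl (2007).

```python
# Zero-dimensional equilibrium estimate: warming ~ S * F / F_2X
# (ocean heat uptake ignored; illustrative numbers, not Kiehl's).
F_2X = 3.7  # W/m2, forcing of a CO2 doubling

models = [
    {"name": "low sensitivity, strong forcing", "S": 2.0, "F": 2.2},
    {"name": "high sensitivity, weak forcing", "S": 4.0, "F": 1.1},
]

for model in models:
    warming = model["S"] * model["F"] / F_2X
    print(f"{model['name']}: warming of about {warming:.2f} degC")

# Both combinations give roughly the same 20th-century warming, which is why
# matching the observed trend constrains neither the sensitivity nor the forcing well.
```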


Kiehl (2007) found an inverse correlation between forcing and climate sensitivity. The main reason for the differences in forcing was the cooling by aerosols.

This "tuning" was initially not an explicit tuning of model parameters, but happened mostly because modellers keep working until the results look good. Look good compared to observations. Bjorn Stevens talks about this in an otherwise also recommendable Forecast episode.

Nowadays the tuning is often performed more formally and is an important part of studying climate models and understanding their uncertainties. The BAMS article proposes to collect information on tuning for the upcoming CMIP. In principle a good idea, but I do not think that is enough. In a simple example with only climate sensitivity and aerosol forcing, the groups with low sensitivity and strong forcing and the ones with high sensitivity and weak forcing are happy with their temperature trend and will report not to have tuned. But that choice also leads to too little ensemble spread, just like for the groups that did need to tune. Tuning makes it complicated to interpret the ensemble; it is no problem for a specific model run.

Given that we know the temperature increase, it is impossible not to get a tuned result. Furthermore, I mentioned above several additional reasons why the model spread is not the uncertainty, which complicate the interpretation of the ensemble in the same way. A solution could be to follow the work on perturbed-physics ensembles in ensemble weather prediction and to tune all models, but to tune them such that together they cover the full range of uncertainties that we estimate from the observations. This should at least cover the climate sensitivity and ocean heat uptake, but preferably also other climate characteristics that are important for climate impact and climate variability studies. Large modelling centres may be able to create such large ensembles by themselves; the others could coordinate their work in CMIP to make sure the full uncertainty range is covered.

Historical climate runs

Because the physics is not perturbed and especially because of the tuning, you would expect that the CMIP ensemble spread is too low for the global mean temperature increase. That the CMIP ensemble average fits well to the observed temperature increase shows that with reasonable physical choices we can understand why the temperature increased. It shows that known processes are sufficient to explain it. That it fits so accurately does not say much. I liked the title of an article by Reto Knutti (2008): "Why are climate models reproducing the observed global surface warming so well?" The title says it all.

Much more interesting for studying how good the models are, are spatial patterns and other observations. New datasets are greeted with much enthusiasm by modellers because they allow for a better comparison and are more likely to show new problems that need fixing and lead to a better understanding. Model results for the deep past, which the models are not tuned for, are also important tests.


That the CMIP ensemble mean fits to the observations is no reason to expect that the observations are reliable


When the observations peek out of this too-narrow CMIP ensemble spread, that is to be expected. If you want to make a case that our understanding does not fit the observations, you have to take the uncertainties into account, not the spread.
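A minimal numerical sketch of this point, with invented trend values: judged against the ensemble spread alone the observed trend looks very surprising, while combining the spread with an assumed observational and internal-variability uncertainty in quadrature makes it much less so.

```python
import numpy as np

# Invented illustrative numbers (degC per decade).
ensemble_trends = np.array([0.18, 0.21, 0.24, 0.20, 0.26, 0.22, 0.19, 0.25])
obs_trend = 0.11
obs_sigma = 0.06  # assumed observational + internal-variability uncertainty of the trend

ens_mean = ensemble_trends.mean()
ens_spread = ensemble_trends.std(ddof=1)

# Misleading comparison: ensemble spread only.
z_spread = (obs_trend - ens_mean) / ens_spread

# Better comparison: spread and observational uncertainty combined in quadrature.
z_total = (obs_trend - ens_mean) / np.sqrt(ens_spread ** 2 + obs_sigma ** 2)

print(f"z using spread only:          {z_spread:.1f}")  # looks like a large discrepancy
print(f"z using combined uncertainty: {z_total:.1f}")   # much less surprising
```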

Similarly, that the CMIP ensemble mean fits the observations is no reason to expect that the observations are reliable. Because of overconfidence in the data quality, many scientists also took the recent minimal deviations from the trend line too seriously. This finally stimulated more research into the accuracy of temperature trends, into inhomogeneities in the ERSST sea surface temperatures, and into the effect of coverage and how we blend sea, land and ice temperatures together. There are some more improvements under way.

Compared to the global warming of about 1°C up to now, these recent and upcoming corrections are large. Many of the problems could have been found long ago. It is 2016. It is about time to study this. If funding is an issue, we could maybe sacrifice some climate change impact studies on wine. Or on truffles. Or caviar. The quality of our data is the foundation of our science.

That the comparison of the CMIP ensemble average with the instrumental observations is so central to the public climate "debate" is rather ironic. Please take a walk in the forest. Look at all the different changes. The ones that go slower as well as the many that go faster than expected.

Maybe it is good to emphasise that for the attribution of climate change to human activities, the size of the historical temperature increase is not used. The attribution is made via correlations between the 3-dimensional spatial patterns of observations and models. By using correlations (rather than root mean square errors), the magnitude of the change in either the models or the observations is no longer important. Ribes (2016) is working on using the magnitude of the changes as well. This is difficult because of the inevitable tuning, which makes specifying the uncertainties very difficult.
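A small sketch, on random toy fields rather than a real fingerprint analysis, of why a correlation-based comparison does not depend on the magnitude of the change: scaling the modelled pattern up or down leaves the correlation untouched but changes the root mean square error.

```python
import numpy as np

rng = np.random.default_rng(1)

obs_pattern = rng.standard_normal((36, 72))  # toy "observed" change pattern on a lat-lon grid
model_pattern = obs_pattern + 0.3 * rng.standard_normal((36, 72))  # similar pattern plus noise

def pattern_corr(a, b):
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

for scale in (0.5, 1.0, 2.0):  # pretend the model under- or overestimates the magnitude
    scaled = scale * model_pattern
    print(f"scale {scale}: correlation = {pattern_corr(obs_pattern, scaled):.3f}, "
          f"RMSE = {rmse(obs_pattern, scaled):.3f}")

# The correlation is the same for every scaling; the RMSE is not.
```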

Climate change impact studies

Studying the impacts of climate change is hard. Whether dikes break depends not only on sea level rise, but also on the changes in storms. The maintenance of the dikes and the tides are important. It matters whether you have a functioning government that also takes care of problems that only become apparent when the catastrophe happens. I would not sleep well if I lived in an area where civil servants are not allowed to talk about climate change. Because of the additional unnecessary climate dangers, but especially because that is a clear sign of a dysfunctional government that does not prioritise protecting its people.

The too-narrow CMIP ensemble spread can lead to underestimates of climate change impacts, because typically the extra damages from stronger-than-expected changes are larger than the reduced damages from smaller changes. The uncertainty monster is not our friend. Admittedly, the effect of the uncertainties is rather modest. This is only important for those impacts we understand reasonably well already. The lack of variability can be partially solved in the statistical post-processing (bias correction and downscaling). This is not common yet, but Grenier et al. (2015) proposed a statistical method to make the natural variability more realistic.
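As a generic sketch of the post-processing idea, not the actual method of Grenier et al. (2015): inflate each member's deviations from the ensemble mean, a crude stand-in for its simulated variability, so that they match a target standard deviation estimated, for example, from observations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy ensemble: 10 members, 50 years of annual values with too little variability.
years = 50
members = np.linspace(0.0, 1.0, years) + 0.1 * rng.standard_normal((10, years))

target_sigma = 0.15  # desired variability, e.g. estimated from observations
ens_mean = members.mean(axis=0)

deviations = members - ens_mean  # each member's deviation from the ensemble mean
inflated = ens_mean + deviations * (target_sigma / deviations.std())

print(f"variability before: {deviations.std():.3f}")
print(f"variability after:  {(inflated - ens_mean).std():.3f}")  # now matches the target
```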

This problem will hopefully soon be solved when the research programs on decadal climate prediction mature. The changes over a decade due to greenhouse warming are modest, for decadal prediction we thus especially need to accurately predict the natural variability of the climate system. An important part of these studies is assessing whether and which changes can be predicted. As a consequence there is a strong focus on situation specific uncertainties and statistical post-processing to correct biases of the model ensemble in the means and in the uncertainties.

In the tropics decadal climate prediction works reasonably well and helps farmers and governments in their planning.



In the mid-latitudes, where most of the researchers live, it is frustratingly difficult to make decadal predictions. Still, even in that case we would have an ensemble that can be used as a sample of the probability distribution. That is important progress.

Since a lack of ensemble spread is a problem for historical runs, you might expect it to be a problem for projections for the rest of the century as well. This is probably not the case. The problem of tuning would be much reduced, because the influence of aerosols will be much smaller as the signal of greenhouse gasses becomes much more dominant. For long-term projections the main factor is that the climate sensitivity of the models needs to fit our understanding of the climate sensitivity from all studies. This fit is reasonable for the best estimate of the climate sensitivity, which we expect to be 3°C for a doubling of the CO2 concentration. I do not know how good the fit is for the spread in the climate sensitivity.

However, for long-term projections even the climate sensitivity is not that important. For the magnitude of the climatic changes in 2100 and for the impact of climate change in 2100, the main source of uncertainty is what we will do. As you can see in the figure below, the difference between a business-as-usual scenario and strong climate policies is 3 °C (6 °F). The uncertainties within these scenarios are relatively small. Thus the main question is whether and how aggressively we will act to combat climate change.





Related information

New article (September 2017): Gavin Schmidt et al.: Practice and philosophy of climate model tuning across six US modeling centers.

Discussion paper suggesting a path to solving the difference between model spread and uncertainty by James Annan and Julia Hargreaves: On the meaning of independence in climate science.

Is it time to freak out about the climate sensitivity estimates from energy budget models?

Fans of Judith Curry: the uncertainty monster is not your friend.

Are climate models running hot or observations running cold?

Forecast: Gavin Schmidt on the evolution, testing and discussion of climate models.

Forecast: Bjorn Stevens on the philosophy of climate modeling.

The Guardian: In a blind test, economists reject the notion of a global warming pause.

References

Forster, P.M., T. Andrews, P. Good, J.M. Gregory, L.S. Jackson, and M. Zelinka, 2013: Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models. Journal of Geophysical Research, 118, 1139–1150, doi: 10.1002/jgrd.50174.

Fyfe, John C., Nathan P. Gillett and Francis W. Zwiers, 2013: Overestimated global warming over the past 20 years. Nature Climate Change, 3, pp. 767–769, doi: 10.1038/nclimate1972.

Golaz, J.-C., L.W. Horowitz, and H. Levy II, 2013: Cloud tuning in a coupled climate model: Impact on 20th century warming. Geophysical Research Letters, 40, pp. 2246–2251, doi: 10.1002/grl.50232.

Grenier, Patrick, Diane Chaumont and Ramón de Elía, 2015: Statistical adjustment of simulated inter-annual variability in an investigation of short-term temperature trend distributions over Canada. EGU general meeting, Vienna, Austria.

Hourdin, Frederic, Thorsten Mauritsen, Andrew Gettelman, Jean-Christophe Golaz, Venkatramani Balaji, Qingyun Duan, Doris Folini, Duoying Ji, Daniel Klocke, Yun Qian, Florian Rauser, Cathrine Rio, Lorenzo Tomassini, Masahiro Watanabe, and Daniel Williamson, 2016: The art and science of climate model tuning. Bulletin of the American Meteorological Society, published online, doi: 10.1175/BAMS-D-15-00135.1.

Kiehl, J.T., 2007: Twentieth century climate model response and climate sensitivity. Geophysical Research Letters, 34, L22710, doi: 10.1029/2007GL031383.

Knutti, R., 2008: Why are climate models reproducing the observed global surface warming so well? Geophysical Research Letters, 35, L18704, doi: 10.1029/2008GL034932.

Murphy, J.M., D.M.H. Sexton, D.N. Barnett, G.S. Jones, M.J. Webb, M. Collins and D.A. Stainforth, 2004: Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430, pp. 768–772, doi: 10.1038/nature02771.

Ribes, A., 2016: Multi-model detection and attribution without linear regression. 13th International Meeting on Statistical Climatology, Canmore, Canada. Abstract below.

Rowlands, Daniel J., David J. Frame, Duncan Ackerley, Tolu Aina, Ben B. B. Booth, Carl Christensen, Matthew Collins, Nicholas Faull, Chris E. Forest, Benjamin S. Grandey, Edward Gryspeerdt, Eleanor J. Highwood, William J. Ingram, Sylvia Knight, Ana Lopez, Neil Massey, Frances McNamara, Nicolai Meinshausen, Claudio Piani, Suzanne M. Rosier, Benjamin M. Sanderson, Leonard A. Smith, Dáithí A. Stone, Milo Thurston, Kuniko Yamazaki, Y. Hiro Yamazaki & Myles R. Allen, 2012: Broad range of 2050 warming from an observationally constrained large climate model ensemble. Nature Geoscience, 5, pp. 256–260, doi: 10.1038/ngeo1430 (manuscript).


MULTI-MODEL DETECTION AND ATTRIBUTION WITHOUT LINEAR REGRESSION
Aurélien Ribes
Abstract. Conventional D&A statistical methods involve linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. As an alternative to this approach, we propose a new statistical model for detection and attribution based only on the additivity assumption. We introduce estimation and testing procedures based on likelihood maximization. As the possibility of misrepresented response magnitudes is removed in this revised statistical framework, it is important to take the climate modelling uncertainty into account. In this way, modelling uncertainty in the response magnitude and the response pattern is treated consistently. We show that climate modelling uncertainty can be accounted for easily in our approach. We then provide some discussion on how to practically estimate this source of uncertainty, and on the future challenges related to multi-model D&A in the framework of CMIP6/DAMIP.


* Because this is the internet, let me say that "The new hiatus is already 4 months old." is a joke.

** The BAMS article calls any way to estimate a parameter "tuning". I would personally only call it tuning if you optimize for emergent properties of the climate model. If you estimate a parameter based on observations or a specialized model, I would not call this tuning, but simply parameter estimation or parameterization development. Radiative transfer schemes use the assumption that adjacent layers of clouds are maximally overlapped and that if there is a clear layer between two cloud layers they are randomly overlapped. You could introduce two parameters that vary between maximum and random overlap for these two cases, but that is not done. You could call that an implicit parameter, which shows that distinguishing between parameter estimation and parameterization development is hard.

*** Photo at the top: Grumpy Tortoise Face by Eric Kilby, used under an Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) license.

Thursday, September 24, 2015

Model spread is not uncertainty #NWP #ClimatePrediction

Comparison of a large set of climate model runs (CMIP5) with several observational temperature estimates. The thick black line is the mean of all model runs. The grey region is its model spread. The dotted lines show the model mean and spread with new estimates of the climate forcings. The coloured lines are 5 different estimates of the global mean annual temperature from weather stations and sea surface temperature observations. Figures: Gavin Schmidt.

It seems as if 2015 and likely also 2016 will become very hot years. So hot that you no longer need statistics to see that there was no decrease in the rate of warming, you can easily see it by eye now. Maybe the graph also looks less deceptive now that the very prominent super El Nino year 1998 is clearly no longer the hottest.

The "debate" is therefore now shifting to the claim that "the models are running hot". This claim ignores the other main option: that the observations are running cold. Even assuming the observations to be perfect, it is not that relevant that some years the observed annual mean temperatures were close to lower edge of the spread of all the climate model runs (ensemble spread). See comparison shown at the top.

Now that this is no longer the case, it may be a neutral occasion to explain that the spread of all the climate model runs does not equal the uncertainty of these model runs. Because some scientists also seem to make this mistake, I thought this was worthy of a post. One hint is naturally that the words are different. That is for a reason.

Long long ago at a debate at the scientific conference EGU there was an older scientist who was really upset by ClimatePrediction.net, where the public can give their computer resources to produce a very large dataset with many different climate model runs with a range of settings for parameters we are uncertain about. He worried that the modeled distribution would be used as a statistical probability distribution. He was assured that everyone was well aware the model spread was not the uncertainty. But it seems he was right and this awareness has faded.



Ensemble weather prediction

It is easiest to explain this difference in the framework of ensemble weather prediction, rather than going to climate directly. Much more work has been done in this field (meteorology is bigger and decadal climate prediction has just started). Furthermore, daily weather predictions offer much more data to study how good the prediction was and how good the ensemble spread fits to the uncertainty.

While it is popular to complain about weather predictions, they are quite good and continually improving. The prediction for 3 days ahead is now as good as the prediction for the next day when I was young. If people really thought the weather prediction was bad, you have to wonder why they pay attention to it. I guess complaining about the weather and its predictions is just a safe conversation topic. Except when you stumble upon a meteorologist.

Part of the recent improvement of weather predictions is that not just one, but a large number of predictions are computed, which scientists call ensemble weather prediction. Not only is the mean of such an ensemble more accurate than just the single realization we used to have, the ensemble spread also gives you an idea of the uncertainty of the prediction.

Somewhere in the sunny middle of a large high-pressure system you can be quite confident that the prediction is right; errors in the position of the high are then not that important. If this is combined with a blocking situation, where the highs and lows do not move eastwards much, it may be possible to make very confident predictions many days in advance. If a front is approaching it becomes harder to tell well in advance whether it will pass your region or miss it. If the weather will be showery, it is very hard to tell where exactly the showers will hit.

Ensembles give information on how predictable the weather is, but they do not provide reliable quantitative information on the uncertainties. Typically the ensemble is overconfident: the ensemble spread is smaller than the real uncertainty. You can test this by comparing predictions with many observations. In the figure below you can read that when the raw model ensemble (black line) was 100% certain (forecast probability) that it would rain more than 1 mm/hr, it should only have been 50% sure. Or when 50% of the model ensemble showed rain, it rained in only 30% of such cases.


The "reliability diagram" for an ensemble of the regional weather prediction system of the German weather service for the probability of more than 1 mm of rain per hour. On the x-axis is the probability of the model, on the y-axis the observed frequency. The thick black line is the raw model ensemble. Thus when all ensemble members (100% probability) showed more than 1mm/hr, it was only rain that hard half the time. The light lines show results two methods to reduce the overconfidence of the model ensemble. Figure 7a from Ben Bouallègue et al. (2013).
To generate this "raw" regional model ensemble, four different global models were used for the state of the weather at the borders of this regional weather prediction model, the initial conditions of the regional atmosphere were varied and different model configurations were used.

The raw ensemble is still overconfident because the initial conditions are given by the best estimate of the state of the atmosphere, which has less variability than the actual state. The atmospheric circulation varies on spatial scales from millimetres to the size of the planet. Weather prediction models cannot model this completely, because the computers are not big enough; instead they compute the circulation using a large number of grid boxes which are typically 1 to 25 km in size. The flows on smaller scales do influence the larger-scale flow, and this influence is computed with a strongly simplified model for turbulence: a so-called parameterization. These parameterizations are based on measurements or on more detailed models. Typically, they aim to predict the mean influence of the turbulence, but the small-scale flow is not always the same and would have varied had it been possible to compute it explicitly. This variability is missing.

The same goes for the parameterizations for clouds, their water content and cloud cover. The cloud cover is a function of the relative humidity. If you look at the data, this relationship is very noisy, but the parameterization only takes the best guess. The parameterization for solar radiation takes these clouds in the various model layers and makes assumptions about how they overlap from layer to layer. In the model this is always the same; in reality it varies. The same goes for precipitation, for the influence of the vegetation, for the roughness of the surface, and so on. Scientists have started working on developing parameterizations that also simulate the variations, but this field is still in its infancy.

Also the data for the boundary conditions (height and roughness of the vegetation), the brightness of the vegetation and soil, the ozone concentrations and the amount of dust particles in the air (aerosols) are normally taken to be constant.

For the raw-data fetishists out there: part of this improvement in weather predictions is due to the statistical post-processing of the raw model output. From simple to complicated: it may be seen in the observations that a model is on average 1 degree too cold; it may be known that this is 2 degrees for a certain region; this may be due to biases especially during sunny high-pressure conditions. The statistical processing of weather predictions to reduce such known biases is known as model output statistics (MOS). (This is methodologically very similar to the homogenization of daily climate data.)
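The simple end of this spectrum can be sketched as a linear regression of past observations on past forecasts, which is then applied to new forecasts. This is a generic sketch with synthetic data, not the operational MOS of any weather service.

```python
import numpy as np

rng = np.random.default_rng(11)

# Training period: past observations and the matching raw forecasts (synthetic data).
observed_temp = 15.0 + 8.0 * rng.standard_normal(1000)
raw_forecast = observed_temp - 1.5 + 0.5 * rng.standard_normal(1000)  # model ~1.5 degC too cold

# Fit observation = a * forecast + b on the training data.
a, b = np.polyfit(raw_forecast, observed_temp, deg=1)

new_forecast = np.array([10.0, 15.0, 20.0])
corrected = a * new_forecast + b
print("raw forecasts: ", new_forecast)
print("MOS-corrected: ", np.round(corrected, 1))  # roughly 1.5 degC warmer
```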

The same kind of statistical post-processing used for the average can also be used to correct the overconfidence of the model spread of weather prediction ensembles. Again from the simple to the complicated: when the above model ensemble is 100% sure it will rain, this can be corrected to 50%. The next step is to make this correction dependent on the rain rate; when all ensemble members show strong precipitation, the probability of precipitation is larger than when most only show drizzle.

Climate projection and prediction

There is no reason whatsoever to think that the model spread of an ensemble of climate projections is an accurate estimate of the uncertainty. My inexpert opinion would be that for temperature the spread is likely again too small; I would guess by up to a factor of two. The better informed authors of the last IPCC report seem to agree with me when they write:
The CMIP3 and CMIP5 projections are ensembles of opportunity, and it is explicitly recognized that there are sources of uncertainty not simulated by the models. Evidence of this can be seen by comparing the Rowlands et al. (2012) projections for the A1B scenario, which were obtained using a very large ensemble in which the physics parameterizations were perturbed in a single climate model, with the corresponding raw multi-model CMIP3 projections. The former exhibit a substantially larger likely range than the latter. A pragmatic approach to addressing this issue, which was used in the AR4 and is also used in Chapter 12, is to consider the 5 to 95% CMIP3/5 range as a ‘likely’ rather than ‘very likely’ range.
The confidence interval of the "very likely" range is normally about twice as large as the "likely" range.
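For a normal distribution this factor can be checked directly: the central 66% ("likely") range spans about ±0.95 standard deviations and the 5 to 95% ("very likely") range about ±1.64, so the very likely interval is indeed close to twice as wide. A quick check:

```python
from scipy.stats import norm

# Half-widths of central intervals of a standard normal distribution.
z_likely = norm.ppf(0.5 + 0.66 / 2)       # central 66% ("likely"): about 0.95 sigma
z_very_likely = norm.ppf(0.5 + 0.90 / 2)  # central 90% ("very likely", 5-95%): about 1.64 sigma

print(f"likely half-width:      {z_likely:.2f} sigma")
print(f"very likely half-width: {z_very_likely:.2f} sigma")
print(f"ratio:                  {z_very_likely / z_likely:.2f}")  # about 1.7, roughly a factor two
```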

The ensemble of climate projections is intended to estimate the long-term changes in the climate. It was never intended to be used on the short term. Scientists have just started doing that under the header of "decadal climate prediction", and that is hard. It is hard because then we need to model the influence of the internal variability of the climate system: variations in the oceans, ice cover, vegetation and hydrology. Many of these influences are local. Local and short-term variations that are not important for long-term projections of global means thus need to be accurate for decadal predictions. The variations in the global mean temperature that are to be predicted are small; that we can do this at all is probably because regionally the variations are larger. Peru and Australia see a clear influence of El Nino, which makes it easier to study. While El Nino is the biggest climate mode, globally its effect is just a few tenths of a degree Celsius.

Another interesting climate mode is the Quasi-Biennial Oscillation (QBO), an oscillation in the wind direction in the stratosphere. If you do not know it, no problem, that is one for the climate mode connoisseur. To model it with a global climate model, you need a model with a very high top (about 100 km) and many model layers in the stratosphere. That takes a lot of computational resources and there is no indication that the QBO is important for long-term warming. Thus naturally most, if not all, global climate model projections ignore it.

Ed Hawkins has a post showing the internal variability of a large number of climate models. I love the name of the post: Variable Variability. It shows the figure below. How much the variability differs between models shows how much effort modellers put into modelling internal variability. For that reason alone, I see no reason to simply equate the model ensemble spread with the uncertainty.



Natural variability

Next to the internal variability there is also natural variability due to volcanoes and solar variations. Natural variability has always been an important part of climate research. The CLIVAR (climate variability and predictability) programme is a component of the World Climate Research Programme and its predecessor started in 1985. Even if in 2015 and 2016 the journal Nature will probably publish fewer "hiatus" papers, natural variability will certainly stay an important topic for climate journals.

The studies that sought to explain the "hiatus" are still useful to understand why the temperatures were lower some years than they otherwise would have been. At least the studies that hold; I am not fully convinced yet that the data is good enough to study such minute details. In the Karl et al. (2015) study we have seen that small updates and reasonable data processing differences can produce small changes in the short-term temperature trends that are, however, large relative to something as minute as this "hiatus" thingy.

One reason the study of natural variability will continue is that we need this for decadal climate prediction. This new field aims to predict how the climate will change in the coming years, which is important for impact studies and prioritizing adaptation measures. It is hoped that by starting climate models with the current state of the ocean, ice cover, vegetation, chemistry and hydrology, we will be able to make regional predictions of natural variability for the coming years. The confidence intervals will be large, but given the large costs of the impacts and adaptation measures, any skill has large economic benefits. In some regions such predictions work reasonably well. For Europe they seem to be very challenging.

This is not only challenging from a modelling perspective, but also puts much higher demands on the quality and regional detail of the climate data. Researchers in our German decadal climate prediction project, MiKlip, showed that the differences between the different model systems could only be assessed well using a well homogenized radiosonde dataset over Germany.

Hopefully, the research on decadal climate prediction will give scientists a better idea of the relationship between model spread and uncertainty. The figure below shows a prediction from the last IPCC report, the hatched red shape. While this is not visually obvious, this uncertainty is much larger than the model spread. The likelihood of staying within the shape is 66%, while the model spread shown covers 95% of the model runs. Had the red shape also shown the 95% level, it would have been about twice as high. How much larger the uncertainty is than the model spread is currently to a large extent expert judgement. If we can formally compute this, we will have understood the climate system a little bit better again.






Related reading

In a blind test, economists reject the notion of a global warming pause

Are climate models running hot or observations running cold?

References

Ben Bouallègue, Zied, Susanne E. Theis and Christoph Gebhardt, 2013: Enhancing COSMO-DE ensemble forecasts by inexpensive techniques. Meteorologische Zeitschrift, 22, pp. 49–59, doi: 10.1127/0941-2948/2013/0374.

Rowlands, Daniel J., David J. Frame, Duncan Ackerley, Tolu Aina, Ben B. B. Booth, Carl Christensen, Matthew Collins, Nicholas Faull, Chris E. Forest, Benjamin S. Grandey, Edward Gryspeerdt, Eleanor J. Highwood, William J. Ingram, Sylvia Knight, Ana Lopez, Neil Massey, Frances McNamara, Nicolai Meinshausen, Claudio Piani, Suzanne M. Rosier, Benjamin M. Sanderson, Leonard A. Smith, Dáithí A. Stone, Milo Thurston, Kuniko Yamazaki, Y. Hiro Yamazaki & Myles R. Allen, 2012: Broad range of 2050 warming from an observationally constrained large climate model ensemble. Nature Geoscience, 5, pp. 256–260, doi: 10.1038/ngeo1430.