Wednesday, 10 July 2013

Statistical problems: The multiple breakpoint problem in homogenization and remaining uncertainties

This is part two of a series on statistically interesting problems in the homogenization of climate data. The first part was about the inhomogeneous reference problem in relative homogenization. This part covers two problems: the multiple breakpoint problem and computing the remaining uncertainties in homogenized data.

I hope that this series can convince statisticians to become (more) active in homogenization of climate data, which provides many interesting problems.

The five main statistical problems are:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad-hoc solutions based on single breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant

Problem 2. The multiple breakpoint problem

For temperature time series, about one break per 15 to 20 years is typical. Thus most interesting stations will contain more than one break. Unfortunately, most statistical detection methods have been developed for a single break. To use them on series with multiple breaks, one ad-hoc solution is to first split the series at the largest break (as done, for example, with the standard normal homogeneity test, SNHT) and then investigate the two subseries. Such a greedy algorithm does not always find the optimal solution.
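To make the greedy idea concrete, here is a minimal sketch in Python (an illustration, not the SNHT itself: the split criterion here is simply the reduction in the sum of squared errors, and the stopping threshold is arbitrary):

```python
import numpy as np

def best_single_break(x):
    """Return (index, sse_reduction) of the best two-segment mean split."""
    n = len(x)
    total_sse = np.sum((x - x.mean()) ** 2)
    best_k, best_red = None, 0.0
    for k in range(2, n - 1):  # require at least 2 points per segment
        left, right = x[:k], x[k:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        red = total_sse - sse
        if red > best_red:
            best_k, best_red = k, red
    return best_k, best_red

def greedy_breaks(x, min_reduction, offset=0):
    """Recursively split at the largest break (hierarchical splitting).
    This greedy scheme need not find the globally optimal combination."""
    k, red = best_single_break(x)
    if k is None or red < min_reduction:
        return []
    return (greedy_breaks(x[:k], min_reduction, offset)
            + [offset + k]
            + greedy_breaks(x[k:], min_reduction, offset + k))
```

The recursion stops when the best remaining split no longer reduces the squared error enough; a real detection method would use a proper significance test instead of this ad-hoc threshold.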

Another solution is to detect breaks on short windows. The window should be short enough to contain only one break, but such short windows reduce the power of detection considerably.

Multiple breakpoint methods can find an optimal solution and are nowadays numerically feasible, especially using the optimization method known as dynamic programming. For a given number of breaks, these methods find the break combination that minimizes the internal variance, that is, the variance of the homogeneous subperiods (equivalently, the combination that maximizes the variance explained by the breaks). To find the optimal number of breaks, a penalty is added that increases with the number of breaks. Examples of such methods are PRODIGE (Caussinus & Mestre, 2004) and ACMANT (based on PRODIGE; Domonkos, 2011). In a similar line of research, Lu et al. (2010) solved the multiple breakpoint problem using a minimum description length (MDL) based information criterion as the penalty function.
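The dynamic programming idea can be sketched as follows (illustrative only: it minimizes the within-segment sum of squares and uses a simple linear penalty per break, whereas PRODIGE uses the more sophisticated Caussinus-Lyazrhi criterion):

```python
import numpy as np

def optimal_segmentation(x, max_breaks, penalty):
    """Find, for each number of breaks, the break combination minimizing
    the internal variance (sum of squared deviations within segments),
    then choose the break count using a simple linear penalty per break."""
    n = len(x)
    csum = np.concatenate([[0.0], np.cumsum(x)])
    csum2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def sse(i, j):  # squared deviations of x[i:j] around its mean
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    # D[k, j]: minimal internal variance of x[:j] with k breaks
    # B[k, j]: position of the last break in that optimal solution
    D = np.full((max_breaks + 1, n + 1), np.inf)
    B = np.zeros((max_breaks + 1, n + 1), dtype=int)
    D[0] = [0.0 if j == 0 else sse(0, j) for j in range(n + 1)]
    for k in range(1, max_breaks + 1):
        for j in range(k + 1, n + 1):
            costs = [D[k - 1, i] + sse(i, j) for i in range(k, j)]
            best = int(np.argmin(costs))
            D[k, j], B[k, j] = costs[best], best + k
    # penalize extra breaks when choosing how many there are
    k_best = int(np.argmin([D[k, n] + penalty * k
                            for k in range(max_breaks + 1)]))
    breaks, j = [], n
    for k in range(k_best, 0, -1):
        j = B[k, j]
        breaks.append(j)
    return sorted(breaks)
```

Unlike the greedy splitting above, this search considers all break combinations jointly, which is what makes the solution optimal for the chosen criterion.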


This figure shows a screenshot of PRODIGE used to homogenize Salzburg with its neighbors. The neighbors are sorted by their cross-correlation with Salzburg. The top panel is the difference time series of Salzburg with Kremsmünster, which has a standard deviation of 0.14°C. The middle panel is the difference between Salzburg and München (0.18°C). The lower panel is the difference between Salzburg and Innsbruck (0.29°C). Not having any experience with PRODIGE, I would read this graph as suggesting that Salzburg probably has breaks in 1902, 1938 and 1995. This fits the station history: in 1903 the station was moved to another school, in 1939 it was relocated to the airport, and in 1996 it was moved on the terrain of the airport. The other breaks are not consistently seen in multiple pairs and may thus well be in another station.

Recently, this penalty function was found to be suboptimal (Lindau & Venema, 2013a): the penalty should be a function of the number of breaks, rather than fixed per break, and the relation with the length of the series should be reversed. A better penalty function is thus needed. See this post and this article for more information on the multiple breakpoint problem.

Multiple breakpoint methods are much more accurate than single breakpoint methods combined with ad-hoc fixes. This higher accuracy was expected theoretically (Hawkins, 1972). In addition, a recent benchmarking study (a numerical validation study using realistic datasets) of the European project HOME found that modern homogenization methods, which take the multiple breakpoint and the inhomogeneous reference problems into account, are about a factor of two more accurate than traditional methods (Venema et al., 2012).

Problem 3. Computing uncertainties

Even after homogenization, uncertainties remain in the data due to various problems:
  1. Not all breaks in the candidate station can be detected
  2. Uncertainty in the estimation of correction parameters due to insufficient data
  3. Uncertainties in the corrections due to remaining inhomogeneities in the references
  4. The date of the break may be imprecise (see Lindau & Venema, 2013b)
From validation and benchmarking studies we have a reasonable idea of the remaining uncertainties one can expect in homogenized data, at least with respect to the mean. For daily data, individual developers have validated their methods, but systematic validation and comparison studies are still missing.

Furthermore, such studies only provide a general uncertainty level, whereas more detailed information for every single station and period would be valuable. The uncertainties depend strongly on the inhomogeneity of the raw data and on the quality and cross-correlations of the reference stations, both of which vary strongly per station, region and period.

Communicating such a complicated error structure, which is mainly temporal but also partially spatial, is a problem in itself. Generating an ensemble of possible realizations, similar to Brohan et al. (2006), could provide a workable route. Furthermore, not only the uncertainty in the means should be considered; especially for daily data, uncertainties in the complete probability density function need to be estimated and communicated.
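To illustrate the ensemble idea: one could perturb the estimated break positions and correction sizes within their assumed uncertainties and apply each perturbed correction to the raw data. The sketch below uses made-up uncertainty values and is only meant to show the principle:

```python
import numpy as np

def adjustment_ensemble(raw, break_index, shift, n_members,
                        date_sd=2.0, shift_sd=0.1, seed=0):
    """Generate an ensemble of homogenized realizations by perturbing
    the break date (in time steps) and the adjustment size within
    assumed uncertainties; the spread of the members then communicates
    the homogenization uncertainty per period."""
    rng = np.random.default_rng(seed)
    n = len(raw)
    members = np.empty((n_members, n))
    for m in range(n_members):
        # perturbed break position and adjustment size for this member
        b = int(np.clip(np.rint(break_index + rng.normal(0.0, date_sd)), 1, n - 1))
        s = shift + rng.normal(0.0, shift_sd)
        adjusted = raw.copy()
        adjusted[:b] += s  # adjust the early segment to the recent one
        members[m] = adjusted
    return members
```

The spread across members is large around the break date and small far away from it, which is exactly the temporal error structure that a single adjusted series cannot convey.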

Related posts

All posts in this series:
Problem 1. The inhomogeneous reference problem
Neighboring stations are typically used as reference. Homogenization methods should take into account that this reference is also inhomogeneous
Problem 2. The multiple breakpoint problem
A longer climate series will typically contain more than one break. Methods designed to take this into account are more accurate than ad-hoc solutions based on single breakpoint methods
Problem 3. Computing uncertainties
We do know about the remaining uncertainties of homogenized data in general, but need methods to estimate the uncertainties for a specific dataset or station
Problem 4. Correction as model selection problem
We need objective selection methods for the best correction model to be used
Problem 5. Deterministic or stochastic corrections?
Current correction methods are deterministic. A stochastic approach would be more elegant
Previously, I wrote a longer explanation of the multiple breakpoint problem.

In previous posts I have discussed future research in homogenization from a climatological perspective.

Future research in homogenisation of climate data – EMS 2012 in Poland

HUME: Homogenisation, Uncertainty Measures and Extreme weather

A database with daily climate data for more reliable studies of changes in extreme weather

References

Brohan, P., J. Kennedy, I. Harris, S.F.B. Tett and P.D. Jones. Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. Journal of Geophysical Research, 111, no. D12106, 2006.

Caussinus, H. and O. Mestre. Detection and correction of artificial shifts in climate series. Applied Statistics, 53, pp. 405–425, doi: 10.1111/j.1467-9876.2004.05155.x, 2004.

Domonkos, P. Adapted Caussinus-Mestre Algorithm for Networks of Temperature series (ACMANT). International Journal of Geosciences, 2, 293-309, doi: 10.4236/ijg.2011.23032, 2011.

Hawkins, D.M. On the choice of segments in piecewise approximation. Journal of the Institute of Mathematics and its Applications, 9, pp. 250–256, 1972.

Lindau, R. and V.K.C. Venema. On the multiple breakpoint problem and the number of significant breaks in homogenisation of climate records. Idojaras, Quarterly journal of the Hungarian Meteorological Service, 117, no. 1, pp. 1-34, 2013a.

Lindau, R. and V.K.C. Venema. Break position errors in climate records. 12th International Meeting on Statistical Climatology, IMSC2013, Jeju, South Korea, 24-28 June, 2013b.

Lu, Q., R.B. Lund, and T.C.M. Lee. An MDL approach to the climate segmentation problem. Annals of Applied Statistics, 4, no. 1, pp. 299-319, doi: 10.1214/09-AOAS289, 2010. (Ungated manuscript)

Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M.J. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma. Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, pp. 89-115, doi: 10.5194/cp-8-89-2012, 2012.

21 comments:

Dominic Hills said...

I am genuinely searching for an answer to a problem that I think may exist, and your blog is the closest I've come to understanding it, but I have not found an answer to the particular issue. Sorry if I have missed it; please point it out and I will read up.

Man-made climate change may be a statistical artefact. It may be a mistake caused by combining good science in a bad way. Here is my argument.
Premise 1: pre-1970s temperature is measured by studying a variety of indirect sources, or direct but inaccurate sources. It is the correct scientific process to homogenise this data, that is, to average it.
Premise 2: during the 70s the US set up and funded a global network of weather stations on buoys in the oceans (this was a way of monitoring Soviet nuclear tests). From this point the indirect data was replaced by direct and highly accurate temperature measurements. This data will not be averaged.
Conclusion: switching from indirect and therefore averaged temperature measurement to direct temperature that is not averaged will necessarily result in a false understanding of the earth's natural temperature oscillations. One of the signifiers of man-made climate change, rather than natural climate change, is the extremity of the changes. A rapid change in temperature is a signifier of man-made climate change. The current understanding of earth's climate oscillation is that it will change naturally, but it is only extreme change that can be seen as man-made. The current understanding is based on reading temperature through the ages. But as premise 1 states, this is homogenised data, and so necessarily less extreme than the direct data.
If this statistical transition is not addressed it must result in a statistical artefact.
There are two ways to address this:
1. Add inaccuracy to the post-70s direct data. That is, apply the same homogenising process to the accurate data as to the indirect data. Climatologists would have to ignore their accurate readings. Is this happening?
2. Choose a single strand of data from the previous temperature charts, for example tree ring data, and use that data homogenised. Are scientists doing this?
I believe the answer is no to both (1) and (2), because both are examples of bad practice.
Prediction from conclusion:
Tree ring data, coral growth and butterfly emergence data will diverge from the official temperature chart from the mid 70s onwards. This is what has happened.
All ideas presented here are my own, and derived from true openly sourced premises. Dominic Hills

Victor Venema said...

Before 1970 we also had direct instrumental observations. Before 1880 it starts to get hard to have enough data from all over the world for computing a decent global mean temperature.

You are right if you mean that after 1970 the homogenisation of station data (and of sea surface temperature data) does not make large changes to the global mean temperature (regionally it does matter a lot, also in this period).

The number of buoys has only become significant in the last decades. See this figure, from this page of the UK Met Office. The buoys actually measure temperatures that are a bit lower than those of the ship observations. If you only look at data from one type of measurement you also see the warming, and if you did not take this difference into account you would get an artificially too low warming in your data. It thus goes in the other direction than you suspect, and it is taken into account.

(1) Yes, scientists work on removing artefacts from changes in the way temperature was observed. That is what I work on.

(2) [[The divergence problem]] is only for some tree ring observations in some regions. There are many other ways to indirectly estimate the temperature, which also show warming. The tree rings are the outlier.

You also do not need thermometers to know it is warming. Glaciers are melting, from the tropical [[Kilimanjaro]] glaciers, to the ones in the Alps and Greenland. Arctic sea ice is shrinking. The growing season in the mid-latitudes has become weeks longer. Trees bud and blossom earlier. Wine can be harvested earlier. Animals migrate earlier. The habitat of plants, animals and insects is shifting poleward and up the mountains. Lakes and rivers freeze later and break-up the ice earlier. The oceans are rising.

Thermometer data is wonderful for its spatial information, high accuracy and daily resolution, but even without looking at any thermometer data, even if we would not have invented the thermometer, it would have been very clear the Earth is warming over the last century.

Dominic Hills said...

Thankyou Victor for this, its a pleasure to have an answer from someone directly involved in the specific area I am interested in.

I agree the earth is warming, I think this is an observable fact. I do not challenge that it is, or that it is increasing in temperature at a great rate. I am suggesting that this extreme rise is a natural phenomenon and that extreme changes could have been observed pre-1970s if we had had the technology we have today. As we increase the amount of reliable evidence, then scientists like you reduce the amount of homogenisation or averaging needed. So as we gain more reliable evidence with more weather buoys, the extremity of the temperature estimated should rise (if it is rising), or fall more extremely if it is falling. The averaging will cause a mollification of extremes. And it is the extremes that signify AGW; not that the earth's temperature changes, but how extremely it does this.

Homogenisation itself could be creating a statistical artefact, especially after a sudden shift in technology such as happened in the 70s, when cold war money was spent on this then obscure subject. How do you negate this?

Victor Venema said...

How we remove the effect of changes in measurement methods and changes in the local environment is explained in the links of my last comment. You will find the more useful ones also here.

Dominic Hills said...

Thanks Victor,
Sorry, I didn't mean to ask you to repeat yourself. It must be frustrating talking to non-scientists like myself. It was only that I had read the links you provided and not found exactly what I am looking for, although I did learn some interesting things about your field.
I think I have my answer from the content of your reply. I did not previously believe that scientists would modify accurate temperature readings. I thought that best science would be to use the highest quality data and reduce the amount of homogenisation in line with the level of accuracy.

If I have understood correctly, you do not do this, but instead add inaccuracy to the data and continue similar levels of homogenisation as for the older (less accurate) data.
So the newer, directly recorded temperature readings go through the same modification process as the more unreliable, multi-stranded and sometimes indirect data.

I appreciate you explaining this to me.

Victor Venema said...

I would not formulate it the way you say it, but yes, we do not look at how accurate the absolute temperature is, but make sure that the data is comparable between the decades so that the warming can be interpreted.

The absolute temperature of the Earth is very hard to determine. It varies a lot spatially and we only know it within about 0.5°C. It is easier to look at the warming, this varies much more gradually in space.

Some climate sceptics make a big deal out of the sea surface temperatures of NOAA where the buoys were adjusted to the ship measurements. For the warming this does not matter. It is mostly an arbitrary choice. We have more ship data going back in time than buoys, thus changing the buoys is the smaller adjustment, which I expect was the reason NOAA made their choice, but even that does not really matter in times where we have computers to do the calculations.

Dominic Hills said...

So I began with the premise that in the 70s cold war money accelerated the accuracy of data collected for this obscure and underfunded area of science.

You say this is false because the ship bucket measurements and other instrumentation were already being used and so the difference in technology was not so great.

I suggested that a sudden influx of new accurate data would create a statistical artefact demonstrated in the hockey stick graph.

You say this is false because of your answer to premise 1, and that sea surface temperature is not the main factor defining the graph.

I hope that is correct.

Victor Venema said...

Sea surface temperature is naturally important. The oceans are 70% of the Earth's surface, but the adjustments do not change the global mean temperature much since the 1970s. The warming is already in the raw data and your initial idea that the warming is due to the adjustments does not fit to what is actually done.

If you look longer term, at the warming since 1880, the adjustments to the sea surface temperature are actually the most important ones. The raw data show much more warming and the adjustments reduce our warming estimate for the global mean temperature.

Dominic Hills said...

I think I may be failing to communicate my questions well enough, I'm sorry about that.

I understand that the earth is warming.
And whatever is happening to the temperature, I am asking if the level of homogenisation is inversely correlated with the level of accuracy.
So
More accuracy in data collection, less homogenisation.

This is significant because homogenisation has the effect of reducing extremes. Any averaging process will do this: peaks will not be so high and troughs will be shallower.
So
the more homogenisation, the less extreme the result.

As AGW is partly determined by changes beyond natural norms, more extreme than nature alone would produce, then if the level of homogenisation is adjusted in line with the accuracy and reliability of the data sources,
then
the level of extremity will be adjusted.

So could the increase in technology and accurate data collection reduce levels of homogenisation needed and thus create a graph of greater extremes, that is then interpreted as man made influence?

I am very sorry if you feel you have answered this already; I am trying my best and reading all the links you provide. I can see some contradiction to my premise in that the sea temperature adjustments lower the mean global temperature, but that is not precisely my point. Also I have no other reference for this idea, I have not read it anywhere, and I did not know some sceptics have addressed it.

Victor Venema said...

Maybe we have trouble understanding each other because you see homogenisation as averaging and I do not. If homogenisation were just averaging (spatial smoothing) it could not change the global mean temperature and it does.

Imagine that over a decade a weather service replaces its manual observations in a traditional Stevenson screen with automatic weather stations (AWS), and that these AWS observe temperatures that are 0.2°C cooler. Averaging would not remove this bias: the entire network would see 0.2°C too little warming if you made a simple average over all stations, or if you smoothed all stations and then averaged them.

What we do is compare a candidate station that has such a change with its neighbours that do not, and thus estimate the size of the change and adjust the homogenised data accordingly.
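A toy numerical version of this, with invented numbers: with a strong common regional signal, the step is hard to see in the candidate alone, but in the difference series with a neighbour the regional signal drops out and the step is easy to estimate:

```python
import numpy as np

# Toy example: shared regional climate plus independent local noise,
# with an artificial 0.2 °C drop in the candidate halfway through.
rng = np.random.default_rng(42)
climate = rng.normal(0.0, 0.5, size=200)          # common regional signal
candidate = climate + rng.normal(0.0, 0.05, size=200)
neighbour = climate + rng.normal(0.0, 0.05, size=200)
candidate[100:] -= 0.2                            # e.g. transition to an AWS

# In the difference series the noisy regional signal cancels out,
# so the step stands out and its size can be estimated directly:
diff = candidate - neighbour
estimated_step = diff[100:].mean() - diff[:100].mean()
```

The estimated step comes out close to the -0.2 °C that was put in, even though the regional variability is much larger than the step itself.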

(This next part is probably not what you are thinking of and a detail, but just in case this was your question:
Estimating these adjustments one naturally makes errors: some changes are not detected, sometimes you think there is a change while there is none, and sometimes you get the date of the change wrong. As a consequence the time series of the estimated adjustments will on average have smaller adjustments (explained variance) than the time series of the (unknown) actually needed adjustments. Given that we adjust land data to increase the warming estimates, we should probably adjust them more to show more warming.

I am not an expert on warming over the ocean, but I think the way they work they would not have this problem. Their adjustments naturally have an uncertainty, but I do not think that they would under-adjust their data.)

When estimating warming the problem is not accuracy, but stability. A thermometer that is always 1°C off is no problem for studying warming. The problems start when people try to make the instrument more accurate or try to make the observation cheaper and thus change the observation.

Changes are also not always related to accuracy. When a station has to move, the observations at both locations have the same accuracy, but the move can still produce a change that needs to be removed. When a city grows around a station, the warming from urbanization is accurately observed, but it is not the large-scale warming we are interested in when studying global warming and thus needs to be removed as well. Similarly for vegetation growing tall around a station or increases in irrigation.

Dominic Hills said...

I suppose what I have done is looked at graphs of climate change and seen that the extent of possible variability from the estimated mean value closed dramatically at the mid-70s point. Each graph displays all the variables; it would be possible to follow all these red lines and treat one particular line as the true temperature. Doing that would describe a climate in a state of extreme flux; the oscillation would look very drastic. But instead these outliers are mollified as they are brought into the general scheme.
Then in the 70s all this narrows: the distance between the extreme possibilities is reduced and they run closer and closer to the actual mean temperature, and this narrowing trend continues right up to the current day. This sudden collapse of the extent of possibilities seems to me to be the result of a sudden replacement by new accurate technology.

Victor Venema said...

Are you thinking of a plot like Figure 8 in this SkS post?

Dominic Hills said...

I have read through this and cannot see anything that addresses my point. But I have adapted one of the graphs (a land temperature one, to demonstrate I'm not restricting this point to the weather buoys).

I cannot think how to show the image other than making it my profile picture.

I'm trying to show that the level of homogenisation collapses in the 70s. Variability collapses. This must be a result of cold war money flooding the science.

This is the core of my argument

Victor Venema said...

The figure does not have enough resolution when you make it your profile picture. Can you make a link to it? (Put it on your blog?)

Dominic Hills said...

centraldevonmomentum.wordpress.com/2017/09/16/variability-reduction-in-climate-change/

Victor Venema said...

Thanks. Now we are talking.

The phenomenon you are talking about does not have so much to do with homogenisation, but with baselining graphs. Like I wrote above what we know accurately is the warming, not the absolute temperatures. That is why the warming graphs normally show anomalies relative to some arbitrary baseline period. The average temperature (anomaly) over the baseline period is subtracted so that the depicted data over this period is exactly zero. The baseline period of the graph you showed is 1981 to 2010.

Before we had climate change, climatologists computed the average climate over the last 30 years as a service to their users: farmers, engineers, etc. Realising that the climate always changes a little, they repeated this every 30 years. That is where the 30 years comes from.

After setting the anomaly to zero over the baseline period there are naturally fewer differences between the graphs over this period than before and after. That is an important part of why this period looks to have less "variability".
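In code, baselining is nothing more than subtracting the mean over the baseline period (a minimal sketch):

```python
import numpy as np

def to_anomalies(series, years, base_start=1981, base_end=2010):
    """Convert a temperature series to anomalies relative to its mean
    over the baseline period (here 1981-2010, as in the figure)."""
    base = (years >= base_start) & (years <= base_end)
    return series - series[base].mean()
```

Two datasets that differ only by a constant offset coincide exactly after baselining, which is part of why the curves pinch together over the baseline period.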

Another reason for the differences between the datasets is the adjustments. Every group has its own methods to compute them, and before the 1950s there are clear adjustments, and thus also the way they are made matters. The uncertainty in the adjustments shows up as differences between the datasets.

However, the main reason why the datasets agree less with each other before WWII is that we do not have observations everywhere. Thus what is shown are not real global averages for many datasets, but only part of the surface. Especially in the Arctic, the Antarctic and Africa there are large regions without measurements in earlier times.

The warming in the Arctic is stronger than average, in Africa it is less than average. Thus how you compute the "global" average matters. The groups differ in their willingness to fill those gaps. Doing so improves the estimates of the warming globally, but you can also argue that regionally the uncertainties are too large and the data is best not used there. When comparing with models, the model output can then also simply exclude these regions to compare like with like. Furthermore, the larger the gaps you are filling, the more advanced the methods you will need to get good results.

If you compute the global average for all datasets in the same way, the differences between the datasets become quite small; see this graph of the land temperature computed by the IPCC. Over the ocean the differences would likely be larger, also when comparing like with like, as there are larger differences in the homogenisation adjustments, but I do not know of such a graph on the net.

(The real uncertainty in how much warming we had is larger than the difference between the datasets. They all adjust for the known problems, but their estimates of how big these problems are, and their unknown problems, will be similar.)

Dominic Hills said...

I think I understand a bit better.

I think you are saying that the variability is increased the further it is away from the baseline. And that the baseline is arbitrary, in the 70s the temperature was at the baseline so had zero variation.

But if this is the correct understanding of the variability process you are describing, that would mean the variability would increase as we move further away from the baseline. The new temperature records would demonstrate greater variability because they exceed the baseline by such a large amount, and this does not happen. The level of variability added to the estimated current temperature is very small. There is little adjustment.

My previous understanding of the process was that variability was added as the data source was either of lesser quality, more indirect or there was less of it. And that then as the variability increased then so did the amount of homogenisation.

In all, though, you seem to be agreeing that for various reasons homogenisation levels have dramatically fallen, and this brings me back to my primary idea. Although it is not spatial smoothing, as you explain, homogenisation will itself create a less extreme temperature graph; it will necessarily reduce extremes, because when adjustments are added together and then averaged this will have to be the result. (You mention homogenisation actually resulting in a cooler sea surface temperature, so less extreme in the other direction.)
And as this homogenisation is reduced, for reasons of synchronising data sourcing or for obtaining greater and more accurate global records, then the estimated global temperature will present as sharper changeability.

At previous periods of recent history (the 1800s, or 1950s) the temperature could have been as high as today, if we could measure it with today's methods. Tree ring data or some other biological source may support this, but this data must be mollified as it is indirect or circumstantial. This biological data continues to undergo the same process, whereas the temperature data that previously was also mollified now has far less variation added to it, so this could be why the biological data diverges from the 70s to the present day.

Again, please don't be too upset if I have misunderstood you. I can see you are being very patient, but I may misunderstand. I am taking my time to read carefully what you say and make sure I've understood as well as I can before replying.

Victor Venema said...

"you mention homogenisation actually resulting in a cooler temperature for the sea surface temp.. so less extreme in the other direction"

To be clear: The sea surface temperature warmed less than the raw observations would suggest. But it is warming, not cooling and I would say that you thus see less "extreme" warming.

Although in climatology we use the word "extreme" differently than you seem to do, and typically make a distinction between long-term changes in the mean, which is what you see in the sea surface temperature, and changes in the extremes (for example, more or bigger heat waves and fewer cold waves).


In the case of temperature reconstructions based on tree rings and other biological and non-biological indirect observations you indeed get a reduction in the variability and thus the extremes. Except for tree rings, the exact year of a temperature estimate is often not known, which smears out the temperature peaks and makes them smaller. That is why these datasets normally do not have a yearly resolution, but represent averages over a longer period.

The stick of the hockey stick curve computed from such indirect observations has a large uncertainty and there can be peaks and decadal variability within this uncertainty range. A warming like we have seen since 1880 would, however, not be something that could be hidden in the uncertainty band. It is too large. Locally that may be possible, but not for the global average temperature, which is a smoother curve.

Especially when considering that the current warming is not just a peak: it will not go down to zero next year, but will at least stay with us (actually become larger) for centuries and millennia. Had a similar change occurred in the past, it would be clearly visible in such datasets.

But for those types of data you'd better find another sparring partner. I just do station data. Science is highly specialised.

Dominic Hills said...

OK, thank you for your time and efforts to explain things to me, I do genuinely appreciate it.
I am not a scientist (obviously) but a professional artist. Artists' brains are wired all over; we are prone to synesthesia and find metaphor and analogy easy. I have worked for four years with academics at Exeter University and find, as you say, that scientists are very specialised.
I think this difference is important. I trust completely in the integrity of your process, and can see nothing but good scientists applying good science to their speciality.
The artist John Latham pioneered a technique of embedding artists in scientific projects, and had some successful results.
But I think this may be a frustrating process for you, so I just want to thank you and assure you that some of what you have explained has got through; you are not wasting your time completely talking to the likes of me.

Victor Venema said...

Always happy to chat about the topic I love most with people who are interested. (The problem due to the US culture war is finding out whether people are really interested.) The intrinsic motivation is something we have in common with artists.

Also, science is quite a creative profession. I guess that is the part hidden from the public, and quite often also hidden from the scientists themselves. I once gave a talk in our seminar on creative thinking. Afterwards a PhD student complained how I could waste people's time that way. I guess I had not done a good job explaining how important it is. I only noticed it myself when I invented a new way to generate cloud fields in a computer. I thought it was completely straightforward, but others thought it was strange, which was how I noticed I had made a creative jump.

Reality is a constraint, but I would argue that the constraints make the creative process interesting. Completely unconstrained thinking is no challenge. For science it is important to be able to switch modes from free-floating thought to analytical thinking about whether those thoughts make sense.

Science can only try to see what is. You can make people care.

Dominic Hills said...

I agree, the best science is very creative; coming up with ideas to test hypotheses needs a huge amount of creativity. Often, unfortunately, art does not display this level of creativity. There are many crossovers between the two worlds. I think relativity is one of the most creative ideas I have ever learnt about, and Picasso and Einstein were influenced by the same philosopher; Cubism is Picasso's version of the idea.
Obviously Feynman liked hanging out with us lot also; I think it's a healthy exchange.

I understand what you mean about the cultural war on this subject; it muddies things and makes it difficult to know if there is genuine interest or if I am just trying to make some political point. Thanks for all your answers. I am not one of them; I just interrogate the world until I understand. But some things are beyond me.