February 2020

Monday, 24 February 2020

Estimating the statistical properties of inhomogeneities without homogenization

One way to study inhomogeneities is to homogenize a dataset and study the corrections made. However, that way you only study the inhomogeneities that have been detected. Furthermore, it is always nice to have independent lines of evidence in an observational science. So in this recently published study Ralf Lindau and I (2019) set out to study the statistical properties of inhomogeneities directly from the raw data.

Break frequency and break size

The description of inhomogeneities can be quite complicated.

Observational data contains both break inhomogeneities (jumps due to, for example, a change of instrument or location) and gradual inhomogeneities (for example, due to degradation of the sensor or the instrument screen, growing vegetation or urbanization). The first simplification we make is that we only consider break inhomogeneities. Gradual inhomogeneities are typically homogenized with multiple breaks and they are often quite hard to distinguish from actual multiple breaks in case of noisy data.

When it comes to the year and month of the break we assume every date has the same probability of containing a break. It could be that when there is a break, it is more likely that there is another break, or less likely that there is another break.* It could be that some periods have a higher probability of having a break, that the beginning of a series has a different probability, or that a break in station X makes a break in station Y more likely. However, while some of these possibilities make intuitive sense, we do not know of studies on them, so we assume the simplest case of independent breaks. The frequency of these breaks is a parameter our method will estimate.

* When you study the statistical properties of breaks detected by homogenization methods, you can see that around a break it is less likely for there to be another break. One reason for this is that some homogenization methods explicitly exclude the possibility of two nearby breaks. The methods that do allow for nearby breaks will still often prefer the simpler solution of one big break over two smaller ones.

When it comes to the sizes of the breaks we are reasonably confident that they follow a normal distribution. Our colleagues Menne and Williams (2005) computed the break sizes for all dates where the station history suggested something happened to the measurement that could affect its homogeneity.** They found the break size distribution plotted below. The graph compares the histogram to a normal distribution with an average of zero. Apart from the actual distribution not having a mean of zero (leading to trend biases) it seems to be a decent match and our method will assume that break sizes have a normal distribution.

Figure 1. Histogram of break sizes for breaks known from station histories (metadata).

** When you study the statistical properties of breaks detected by homogenization methods the distribution looks different; the graph plotted below is a typical example. You will not see many small breaks; the middle of the normal distribution is missing. This is because these small breaks are not statistically significant in a noisy time series. Furthermore, you often see some really large breaks. These are likely multiple breaks being detected as one big one. Using breaks known from the metadata, as Menne and Williams (2005) did, avoids or reduces these problems and is thus a better estimate of the distribution of actual breaks in climate data. Although, you can always worry that the breaks not known in the metadata are different. Science never ends.

Figure 2. Histogram of detected break sizes for the lower USA.

Temporal behavior

The break frequency and size are still not a complete description of the break signal; there is also the temporal dependence of the inhomogeneities. In the HOME benchmark I had assumed that every period between two breaks had a shift up or down determined by a random number, what we call “Random Deviation from a baseline” in the new article. To be honest, “assumed” means I had not really thought about it when generating the data. In the same year, NOAA published a benchmark study where they assumed that the jumps up and down (and not the levels) were given by a random number, that is, they assumed the break signal is a random walk. So we have to distinguish between levels and jumps.

This makes quite a difference for the trend errors. In case of Random Deviations, if the first jump goes up it is more likely that the next jump goes down, especially if the first jump goes up a lot. In case of a random walk or Brownian Motion, when the first jump goes up, this does not influence the next jump and it has a 50% probability of also going up. Brownian Motion hence has a tendency to run away: when you insert more breaks, the variance of the break signal keeps going up on average, while Random Deviations are bounded.
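The difference between the two models can be illustrated with a small simulation. This is a minimal sketch, not the code used for HOME or the NOAA benchmark; the function names, the unit standard deviation and the number of simulated series are assumptions made for illustration.

```python
import random
import statistics


def segment_levels(n_breaks, sd, model, rng):
    """Return the level of each homogeneous segment (n_breaks + 1 segments)."""
    if model == "RD":
        # Random Deviations: every level is drawn independently
        # around the same baseline (zero).
        return [rng.gauss(0.0, sd) for _ in range(n_breaks + 1)]
    if model == "BM":
        # Brownian Motion: every jump is drawn independently and
        # added to the previous level.
        levels = [0.0]
        for _ in range(n_breaks):
            levels.append(levels[-1] + rng.gauss(0.0, sd))
        return levels
    raise ValueError("model must be 'RD' or 'BM'")


def final_level_variance(n_breaks, model, n_series=5000, seed=1):
    """Variance over many simulated series of the level after the last break."""
    rng = random.Random(seed)
    finals = [segment_levels(n_breaks, 1.0, model, rng)[-1] for _ in range(n_series)]
    return statistics.pvariance(finals)


for n in (1, 5, 20):
    rd = final_level_variance(n, "RD")
    bm = final_level_variance(n, "BM")
    print(f"{n:2d} breaks: RD variance {rd:.2f}, BM variance {bm:.2f}")
```

With these settings the RD variance should stay close to 1 however many breaks are inserted, while the BM variance should grow roughly in proportion to the number of breaks.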

The figure from another new paper (Lindau and Venema, 2020) shown below quantifies the big difference this makes for the trend error of a typical 100 years long time series. On the x-axis you see the frequency of the breaks (in breaks per century) and on the y-axis the variance of the trends (in Kelvin² or Celsius² per century²) these breaks produce.

The plus-symbols are for the case of Random Deviations from a baseline. If you have exactly two breaks per time series this gives the largest trend error. However, because the number of breaks varies, an average break frequency of about three breaks per series gives the largest trend error. This makes sense as no breaks would give no trend error, while in case of more and more breaks you average over more and more independent numbers and the trend error becomes smaller and smaller.

The circle-symbols are for Brownian Motion. Here the variance of the trends increases linearly with the number of breaks. For a typical number of breaks of more than five, Brownian Motion produces a much larger trend error than Random Deviations.

Figure 3. Figure from Lindau and Venema (2020) quantifying the trend errors due to break inhomogeneities. The variance of the jump sizes is the same in both cases: 1 °C².
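The qualitative behaviour in the figure can be explored with a rough Monte Carlo experiment. This is only a sketch under assumptions of my own choosing, not the calculation from Lindau and Venema (2020): annual values, a fixed number of breaks per series placed in random years, an ordinary least-squares trend, and an RD level variance of 0.5 so that the jump variance is 1 in both models.

```python
import math
import random
import statistics

YEARS = 100  # length of each simulated series


def break_signal(n_breaks, model, rng):
    """Annual break signal with n_breaks breaks in randomly chosen years."""
    if model not in ("RD", "BM"):
        raise ValueError("model must be 'RD' or 'BM'")
    positions = sorted(rng.sample(range(1, YEARS), n_breaks))
    edges = [0] + positions + [YEARS]
    signal = []
    level = 0.0
    for i in range(len(edges) - 1):
        if model == "RD":
            # Independent level; variance 0.5 gives a jump variance of 1.
            level = rng.gauss(0.0, math.sqrt(0.5))
        elif i > 0:
            # Brownian Motion: add an independent jump with variance 1.
            level += rng.gauss(0.0, 1.0)
        signal.extend([level] * (edges[i + 1] - edges[i]))
    return signal


def trend_per_century(signal):
    """Ordinary least-squares slope, scaled to units per 100 time steps."""
    n = len(signal)
    t_mean = (n - 1) / 2
    y_mean = sum(signal) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(signal))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return 100.0 * sxy / sxx


def trend_variance(n_breaks, model, n_series=2000, seed=7):
    """Variance of the trend over many simulated break signals."""
    rng = random.Random(seed)
    trends = [trend_per_century(break_signal(n_breaks, model, rng))
              for _ in range(n_series)]
    return statistics.pvariance(trends)


for n in (1, 2, 5, 10, 20):
    print(n, round(trend_variance(n, "RD"), 2), round(trend_variance(n, "BM"), 2))
```

Because the number of breaks is fixed per series here, the RD column corresponds to the "exactly n breaks" case rather than to an average break frequency.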

One of our colleagues, Peter Domonkos, also sometimes uses Brownian Motion, but puts a limit on how far it can run away. Furthermore, he is known for the concept of platform-like inhomogeneity pairs, where if the first break goes up, the next one is more likely to go down (or the other way around) thus building a platform.

All of these statistical models can make physical sense. When a measurement error causes the observation to go up (or down), once this problem is discovered it will go down (or up) again, thus creating a platform inhomogeneity pair. When the first break goes up (or down) because of a relocation, this perturbation remains when the sensor is changed and both remain when the screen is changed, thus creating a random walk. Relocations are a frequent reason for inhomogeneities. When the station Bonn is relocated, the operator will want to keep it in the region, thus searching in a random direction around Bonn, rather than around the previous location. That would create Random Deviations.

In the benchmarking study HOME we looked at the sign of consecutive detected breaks (Venema et al., 2012). In case of Random Deviations, like HOME used for its simulated breaks, you would expect to get platform break pairs (first break up and the second down, or reversed) in 4 of 6 cases (67%). We detected them in 63% of the cases, a bit less, probably showing that platform pairs are a bit harder to detect than two breaks going in the same direction. In case of Brownian Motion you would expect 50% platform break pairs. For the real data in the HOME benchmark the percentage of platforms was 59%. So this does not fit to Brownian Motion, but is lower than you would expect from Random Deviations. Reality seems to be somewhere in the middle.
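The "4 of 6" for Random Deviations follows from counting rank orderings: two consecutive breaks involve three independent levels, and the jumps have opposite signs whenever the middle level is the highest or the lowest of the three. A short sketch (the helper name `is_platform` and the simulation size are illustrative choices) counts the orderings and checks the 50% for Brownian Motion by simulation.

```python
from itertools import permutations
import random


def is_platform(levels):
    """True if the two jumps between three consecutive levels have opposite signs."""
    first_jump = levels[1] - levels[0]
    second_jump = levels[2] - levels[1]
    return first_jump * second_jump < 0


# Random Deviations: the three levels are independent and identically
# distributed, so all six rank orderings are equally likely.
orderings = list(permutations([1, 2, 3]))
rd_platforms = sum(is_platform(o) for o in orderings)
print(rd_platforms, "of", len(orderings))  # prints "4 of 6"

# Brownian Motion: the two jumps themselves are independent, so their
# signs differ in about half of the cases.
rng = random.Random(0)
n = 100_000
bm_platforms = sum(rng.gauss(0, 1) * rng.gauss(0, 1) < 0 for _ in range(n))
bm_fraction = bm_platforms / n
print(f"BM platform fraction: {bm_fraction:.3f}")
```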

So for our new study estimating the statistical properties of inhomogeneities we opted for a statistical model where the breaks are described by a Random Deviations (RD) signal added to a Brownian Motion (BM) signal and estimate their parameters to see how large these two components are.

The observations

To estimate the properties of the inhomogeneities we have monthly temperature data from a large number of stations. This data has a regional climate signal, observational and weather noise and inhomogeneities. To separate the noise and the inhomogeneities we can use the fact that they are very different with respect to their temporal correlations. The noise will be mostly independent in time or weakly correlated in as far as measurement errors depend on the weather. The inhomogeneities, on the other hand, have correlations over many years.

However, the regional climate signal also has correlations over many years and is comparable in size to the break signal. So we have opted to work with a difference time series, that is, subtracting the time series of a neighboring station from that of a candidate station. This mostly removes the complicated climate signal and what remains is two times the inhomogeneities and two times the noise. The map below shows the 1459 station pairs we used for the USA.

Figure 4. Map of the lower USA with all the pairs of stations we used in this study.

For estimating the inhomogeneities, the climate signal is noise. By removing it we reduce the noise level and avoid having to make assumptions about the regional climate signal. There are also disadvantages to working with the difference series: inhomogeneities that are in both the candidate and the reference series will be (partially) removed. For example, when there is a jump because of the way the temperature is computed this leads to a change in the entire network***. Such a jump would be mostly invisible in a difference series, although not fully invisible because the jump size will be different in every station.

*** In the past the temperature was read multiple times a day or a minimum and maximum temperature thermometer was used. With labor-saving automatic weather stations we can now sample the temperature many times a day and changing from one definition to another will give a jump.

Spatiotemporal differences

As test statistic we have chosen the variance of the spatiotemporal differences. The “spatio” part of the differences I already explained: we use the difference between two stations. Temporal differences mean we subtract two numbers separated by a time lag. For all pairs of stations and all possible pairs of values with a certain lag, we compute the variance of all these difference values and do this for lags of zero to 80 years.
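As a minimal sketch of this test statistic, assuming each station pair is given as two equally long lists of values (with `None` for missing data), the computation could look as follows. The function names are hypothetical and this is not the code used in the paper; in particular, whether the paper subtracts the mean of the differences before squaring is not stated here, so the use of `statistics.pvariance` is an assumption.

```python
import statistics


def difference_series(candidate, neighbour):
    """Candidate minus neighbour, keeping None where either value is missing."""
    return [None if a is None or b is None else a - b
            for a, b in zip(candidate, neighbour)]


def lag_difference_variance(diff_series_list, lag):
    """Variance of d[t + lag] - d[t], pooled over all pairs and all start times."""
    diffs = []
    for series in diff_series_list:
        for t in range(len(series) - lag):
            a, b = series[t], series[t + lag]
            if a is not None and b is not None:
                diffs.append(b - a)
    return statistics.pvariance(diffs) if len(diffs) > 1 else float("nan")


# Toy example: two difference series with one break each, in opposite directions.
toy = [
    difference_series([0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0]),
    difference_series([0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1]),
]
for lag in (1, 2, 3):
    print(lag, lag_difference_variance(toy, lag))
```

In the toy example the variance grows with the lag because longer lags are more likely to span the break.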

In the paper we do all the math to show how the three components (noise, Random Deviation and Brownian Motion) depend on the lag. The noise does not depend on the lag. It is constant. Brownian Motion produces a linear increase of the variance as a function of lag, while the Random Deviations produce a saturating exponential function. How fast the function saturates is a function of the number of breaks per century.
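The three behaviours described above can be summarized in one expression. The symbols below are chosen here for illustration and may differ from the notation and exact expressions in the paper. The second part is a sketch under an added assumption: a single break process whose breaks occur independently at a constant rate (a Poisson process), in which case the probability of no break within lag L is exp(-λL).

```latex
% Variance of the differences as a function of the lag L:
% constant noise + linear Brownian Motion + saturating Random Deviations
V(L) = N + B\,L + R\,\bigl(1 - e^{-kL}\bigr)

% Under the Poisson assumption, for one break process:
%   RD: levels are independent with variance \sigma_{RD}^2 and the
%       difference is non-zero only if a break falls within the lag
%   BM: jumps with variance \sigma_{BM}^2 accumulate at rate \lambda_{BM}
R = 2\,\sigma_{RD}^2, \qquad k = \lambda_{RD}, \qquad B = \lambda_{BM}\,\sigma_{BM}^2
```

In this sketch only the product of break rate and jump variance enters the Brownian Motion slope B, while the Random Deviations term depends on the break rate through the saturation rate k.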

The variance of the spatiotemporal differences for America is shown below. The O-symbols are the variances computed from the data. The other lines are the fits for the various parts of the statistical model. The variance of the noise is about 0.62 Kelvin² or Celsius² and shown as a horizontal line as it does not depend on the lag. The component of the Brownian Motion is the line indicated by BM, while the Random Deviation (RD) component is the curve starting at the origin and growing to about 0.47 K². From how fast this curve grows we estimate that the American data has one RD break every 5.8 years.

The curve for Brownian Motion being a line already suggests that it is not possible to estimate how many BM breaks the time series contains. We only know the total variance, but not whether it is contained in many small breaks or one big one.

Figure 5. The variance of the spatiotemporal differences as a function of the time lag for the lower USA.

The situation for Germany is a bit different; see figure below. Here we do not see the continual linear increase in the variance we had above for America. Apparently the break signal in Germany does not have a significant Brownian Motion component and only contains Random Deviation breaks. The number of breaks is also much smaller: the German data only has one break every 24 years. The German weather service seems to give undisturbed climate observations a high priority.

For both countries the size of the RD breaks is about the same and quite small. Expressed as typical jump size it would be about 0.5°C.

Figure 6. The variance of the spatiotemporal differences as a function of the time lag L for Germany.

The number of detected breaks

The number of breaks we found for America is a lot larger than the number of breaks detected by statistical homogenization. Typical numbers for detected breaks are one per 15 years for America and one per 20 years for Europe, although it also depends considerably on the homogenization method applied.

I was surprised by the large difference between actual breaks and detected breaks; I thought we would maybe miss 20 to 25% of the breaks. If you look at the histograms of the detected breaks, such as Figure 2 reprinted below, where the middle is missing, it looks as if about 20% is missing in a country with a dense observational network.

But these histograms are not a good way to determine what is missing. Next to the influence of chance, small breaks may be detected because they have a good reference station and other breaks are far away, while relatively big breaks may go undetected because of other nearby breaks. So there is not a clear cut-off and you would have to go far from the middle to find reliably detected breaks, which is where you get into the region where there are too many large breaks because detection algorithms combined two or more breaks into one. In other words, it is hard to estimate how many breaks are missing by fitting a normal distribution to the histogram of the detected breaks.

If you do the math, as we do in Section 6 of the article, it is perfectly possible not to detect half of the breaks even for a dense observational network.

Figure 2. Histogram of detected break sizes for the lower USA.

Final thoughts

This is a new methodology. Let’s see how it holds up when others look at it, with new methods, other assumptions about the nature of inhomogeneities and other datasets. Separating Random Deviations and Brownian Motion requires long series. We do not have that many long series and you can already see in the figures above that the variance of the spatiotemporal differences for Germany is quite noisy. The method thus requires too much data to apply it to networks all over the world.

In Lindau and Venema (2018) we introduced a method to estimate the break variance and the number of breaks for a single pair of stations (but not BM vs RD). This needed some human inspection to ensure the fits were right, but it does suggest that there may be a middle ground, a new method which can estimate these parameters for smaller amounts of data, which can be applied worldwide.

The next blog post will be about the trend errors due to these inhomogeneities. If you have any questions about our work, do leave a comment below.

References

Lindau, R. and Venema, V., 2020: Random trend errors in climate station data due to inhomogeneities. International Journal of Climatology, 40, pp. 2393-2402. Open Access. https://doi.org/10.1002/joc.6340

Lindau, R. and Venema, V., 2019: A new method to study inhomogeneities in climate records: Brownian motion or random deviations? International Journal of Climatology, 39, pp. 4769-4783. Manuscript. https://eartharxiv.org/vjnbd/ https://doi.org/10.1002/joc.6105

Lindau, R. and Venema, V.K.C., 2018: The joint influence of break and noise variance on the break detection capability in time series homogenization. Advances in Statistical Climatology, Meteorology and Oceanography, 4, p. 1–18. https://doi.org/10.5194/ascmo-4-1-2018

Menne, M.J. and C.N. Williams, 2005: Detection of Undocumented Changepoints Using Multiple Test Statistics and Composite Reference Series. Journal of Climate, 18, 4271–4286. https://doi.org/10.1175/JCLI3524.1

Menne, M.J., C.N. Williams, and R.S. Vose, 2009: The U.S. Historical Climatology Network Monthly Temperature Data, Version 2. Bulletin of the American Meteorological Society, 90, 993–1008. https://doi.org/10.1175/2008BAMS2613.1

Venema, V., O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C.N. Williams, M.J. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban, Th. Brandsma, 2012: Benchmarking homogenization algorithms for monthly data. Climate of the Past, 8, pp. 89-115. https://doi.org/10.5194/cp-8-89-2012

Tuesday, 11 February 2020

Bernie Sanders is more electable than Joe Biden and will win

Bernie Sanders will become the 46th US president.

After Iowa and so many good New Hampshire polls for Sanders, it is about time to present my prediction for the 2020 presidential election before it is no longer an interesting take. I try to only write about such matters when I think the mainstream opinion is wrong and the published opinion is wrong about Sanders' electability.

Full disclosure: I hope Sanders or Warren wins. The biggest problem America faces is crony capitalism. Systemic corruption is the foundation of nearly all US problems, which spill into the world, including insufficient climate action. Given this bias I will try to quantify as much as possible and give my sources.

Unfortunately it is not guaranteed Sanders will win and it is hard to quantify, but to go on the record with a clear prediction, let me state he has a chance of 54% of winning. This is based on a chance to win the primary of 60% and then a chance of 90% to win the general. This makes it a probabilistic prediction, just like "there is a 70% probability it will rain tomorrow", which needs multiple predictions to validate. For validation, you could combine it with my previous political predictions going against the mainstream:

1. I already have my warning for clear and present danger before the 2016 election: "there is now a real possibility Trump could become president". In the post you will find the reasons why the terrible pundits in the US media were wrong.

2. Another prediction was that the UK election in 2017 would be a lot closer than poll whisperer Nate Silver predicted because he ignored comrade trend. (He is an incompetent establishment pundit, but really good with numbers, so this was an interesting prediction.)

I am confident that Sanders will win an election against Trump (90%), but even if it looks good now winning the primary (60%) is harder because TV news keeps on repeating that Sanders cannot win the general election, as far as I have seen mostly without arguments and sometimes with very cherry picked or hacky evidence.

The power is shifting from corporate media to social media, independent media and membership supported media. The media and candidates can no longer be sure to get away with misinformation without risking their reputation. Although sometimes they slip into old patterns and claim that they said X in 1976 and I am shouting at my monitor that everyone has seen the video of you saying Y.

The power is also shifting from big donors to crowd funding. Even in the face of rising inequality, technology has made small donations so easy as to be competitive. Fortunately to spread the truth you also need less money than to spread lies and presidential candidates get a lot of free media.

As this is a moving target it is hard to say how much difference this power shift makes in 2020. We can be sure the donor class and the media will throw the kitchen sink at Sanders. They hurt themselves doing so, but they despise him from their corporate core to their high-dollar hosts and guests. So I am not as confident about my primary prediction, not knowing how this will play out.

Sanders Beats Trump

The media is sure Sanders cannot win because Republicans would call him a socialist. One often has the impression that they and the Democratic leadership think you are not allowed to reply when Republicans say something. At every primary debate Sanders thus gets his socialism question, gives a strong answer, which the journalists apparently have forgotten again in the next debate. Maybe they are trying to train us into also thinking that ~~resistance~~ replying is futile.

Democratic leadership would like Sanders to cower, just like them, to be weak, to defend themselves against the unfair accusation of being a socialist with some soft spoken words. But if you are defending you are losing. It is much stronger to accept the label and fill it with content.

Is there a better campaign than replying and telling the American people about the high quality of living in social democratic countries, about the higher salaries for workers, about their vibrant market economies, about their high ranking in global indices for entrepreneurship and freedom, about their well-trained competitive work forces, about being treated with respect, about a government that works for all and not just for the donors? Even Danish politicians have started helping:

So what is the quantitative evidence whether Sanders or Biden is the stronger candidate?

Policies

1. The policies of Sanders are the most popular ones. This is already clear by most presidential candidates adopting or claiming to adopt the most popular Sanders policies. To be fair the difference with Biden, on average over all policies, is just one percent, but does not go in the direction the pundits would like you to think:

"Senator Bernie Sanders of Vermont edges out his Democratic opponents on health care, immigration, the environment and the economy, according to a Reuters/Ipsos poll. ... For health care, arguably Sanders' staple issue, the Senator claims 27.1 percent support, eclipsing Biden and Warren by 9 percent and 14.6 percent, respectively. On the environment, Sanders again edges out Biden by 9.7 percent and Warren by 8.2 percent. He also comes out ahead on the economy and jobs."

This week's Quinnipiac University poll asked Democrat and Democrat-leaning voters: "Regardless of how you intend to vote in the Democratic primary for president, which candidate do you think - has the best policy ideas?" Sanders was the choice of 27%, Warren of 16% and Biden of 14%. The voting intentions from the same poll are 25%, 14% and 17%, respectively. Compared to the policy support, these are higher for Biden and lower for Sanders. This suggests that many people unfortunately plan on voting for a candidate they agree with less because they believe the media on electability.

Money and enthusiasm

2. Biden is losing the Money Primary. In the fourth quarter of 2019 he was 3rd with respect to donations. (In the 3rd quarter he was only 4th.)

In the fourth quarter Sanders had 1.8 million individual donors, while Biden had only half a million donors. This is a sign of enthusiasm. Just as the 10 million calls to voters made by Sanders volunteers.

The number of donors. Sanders is leading in 46 states. Graphic: The New York Times.

Electability according to the markets

3. The betting market PredictIt finds it most likely that Sanders will win the primary. The graph below shows the price of shares for Sanders winning, which are equal to the predicted probability he will win in percent. Sanders has a probability of 45% of winning the primary and another betting market gives him a 29% probability of winning the presidency.

The betting market PredictIt for the Democratic primary over the last 90 days. The price of shares in cents is the percentage chance a candidate will win the primary.

Following Bayesian statistics, the probability of winning the presidency, P(presidency), is the probability of winning the primary, P(primary), times the probability of winning the presidency after having won the primary, P(presidency|primary). As an equation this reads:

P(presidency) = P(primary) x P(presidency|primary)

From this it follows that:

P(presidency|primary) = P(presidency) / P(primary)

The numbers for Sanders are:

P(presidency) / P(primary) = 29% / 45% = 64%

The probability of winning the presidency if he is the nominee is thus 64% for Sanders. The same numbers for Biden are:

P(presidency|primary) = P(presidency) / P(primary) = 5% / 12% = 42%

So people willing to put money on their political assessment do not agree with the pundit class and see Sanders as 50% more electable than Biden.
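The arithmetic above can be reproduced in a few lines. This sketch only uses the percentages quoted in this post; the function name is an illustrative choice.

```python
def chance_if_nominee(p_presidency, p_primary):
    """P(presidency | primary) = P(presidency) / P(primary)."""
    return p_presidency / p_primary


sanders = chance_if_nominee(0.29, 0.45)
biden = chance_if_nominee(0.05, 0.12)

print(f"Sanders if nominee: {sanders:.0%}")    # 64%
print(f"Biden if nominee:   {biden:.0%}")      # 42%
print(f"Ratio Sanders/Biden: {sanders / biden:.2f}")

# My own prediction from the introduction, using the same product rule:
own_prediction = 0.60 * 0.90
print(f"Own prediction: {own_prediction:.0%}")  # 54%
```

The ratio of the two conditional probabilities is about 1.5, which is where the "50% more electable" comes from.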

To be fair, like the pundits, I also disagree with the betting market. They have a 54% chance of Trump winning. That is preposterous for a historically unpopular president, but betting against Big Money is a losing strategy in the short term. One would have to hold the bet until election day to win and the chance of Trump winning is unfortunately not zero.

National head to head polling 2020

4. There is the simple polling of head to head races. According to a recent Survey USA poll, this evidence favors Sanders.

The poll found that 52 percent of voters would choose Sanders and 43 percent Trump, giving the veteran senator a nine-point lead. Next was former vice president Joe Biden at 50 percent to Trump's 43 percent, a seven-point lead.

Looking back at older similar polls, the situation can also be reversed. On average I see no difference between the two candidates.

I personally do not like these head to head polls at this stage. Some candidates do quite poorly in head to head polls against Trump. If you look in detail, you will find that Trump gets about the same percentage against all candidates. What varies is how many people prefer the Democratic candidate or are undecided. My impression is that this is mostly measuring name recognition.

National head to head polling 2016

5. It is hard to imagine being in the situation of having to choose between X and Trump: the election is almost a year out and part of the supporters of candidate Y will say during the primary that they do not know or would vote Trump, but in the end vote for their party.

However, for 2016 we have similar polling closer to the date of the election. Biden is naturally not Clinton, but in May 2016 PolitiFact found that Sanders beat Trump more easily, by 3 to 12 percent points more than Clinton.

Just a few days before the election a Gravis poll showed that Clinton would beat Trump by 2%, while Sanders would beat Trump by 10%. Caveat: the questions seem neutral, but the poll was commissioned by a politician who endorsed Sanders.

While both Trump and Clinton had net negative favorability values, Sanders' net favorability grew during the campaign as people got more familiar with his ideas and ended on plus 17% favorability.

Michael Bloomberg acknowledged these facts right after the 2016 election: “Bernie Sanders would have beaten Donald Trump. Polls show he would have walked away with it. But Hillary Clinton got the nomination.”

Head to head polling swing states

6. Swing states show another picture than the national polls. What the swing states are will depend on the candidate, but to avoid cherry picking, let's take the ones from the Cook Report. Their toss ups for the Electoral College are: Arizona, Florida, North Carolina, Pennsylvania and Wisconsin.

Unfortunately all the head to head state polling we have are the averages of Real Clear Politics, which does not take the quality of the polls into account like 538 usually does. This makes manipulating the public opinion with bad polls easier.

Real Clear Politics head to head polling averages (in percent)
State	Trump (vs Biden)	Biden	Net Biden	Trump (vs Sanders)	Sanders	Net Sanders
Arizona	47.0	47.3	+0.3	48.5	43.5	-5.0
Florida	45.3	48.0	+2.7	47.0	47.0	Tie
North Carolina	44.8	48.2	+3.4	46.0	47.0	+1.0
Pennsylvania	43.3	50.3	+7.0	44.3	48.0	+3.7
Wisconsin	43.3	47.0	+3.7	44.7	46.7	+2.0
Average	44.7	48.2	+3.5	46.1	46.4	+0.3

Here Biden has a small advantage. Also Sanders would win most swing states, but with less of a margin according to these polls.
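The averages in the table can be checked with a short script. The numbers are copied from the table above; the variable names are illustrative.

```python
# State: (Trump vs Biden, Biden, Trump vs Sanders, Sanders), in percent
polls = {
    "Arizona": (47.0, 47.3, 48.5, 43.5),
    "Florida": (45.3, 48.0, 47.0, 47.0),
    "North Carolina": (44.8, 48.2, 46.0, 47.0),
    "Pennsylvania": (43.3, 50.3, 44.3, 48.0),
    "Wisconsin": (43.3, 47.0, 44.7, 46.7),
}

n_states = len(polls)
# Average each of the four columns over the five states.
averages = [sum(row[i] for row in polls.values()) / n_states for i in range(4)]
biden_net = averages[1] - averages[0]
sanders_net = averages[3] - averages[2]

# Count the states where the Democrat is strictly ahead of Trump.
biden_leads = sum(row[1] > row[0] for row in polls.values())
sanders_leads = sum(row[3] > row[2] for row in polls.values())

print("Column averages:", [round(a, 2) for a in averages])
print(f"Average net: Biden {biden_net:+.2f}, Sanders {sanders_net:+.2f}")
print(f"States led: Biden {biden_leads} of {n_states}, Sanders {sanders_leads} of {n_states}")
```

Computed from the unrounded column averages, the Biden net is +3.42; the +3.5 in the table is the difference of the two rounded averages (48.2 and 44.7).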

Personality

7. Sanders is personally very popular with Democrats and Americans. For example asking "which candidate do you think - cares the most about people like you?" 24% reply Sanders and 19% Biden, in a Quinnipiac University poll.

Asking which candidate is more honest in the same poll, 25% reply Sanders and 14% Biden. Thus Americans do not agree with political insiders and TV pundits who clearly dislike Sanders. Their dislike has a good reason, he would upend their corrupt self-dealing system.

Summary of the evidence

These are the more or less objective pieces of data we have, the rest is more political judgement. So let's summarize the evidence.

When it comes to policies Sanders is more popular. The money primary shows the money and enthusiasm are with Sanders. Looking at what betting markets expect to happen Sanders is more electable. And Americans see Sanders as someone who cares about them and is honest. Biden also has good numbers, but not as good.

The mixed evidence comes from head to head polling. In swing states the Real Clear Politics polls give Biden an advantage, nationally the polling suggests that Sanders would beat Trump in 2020 and would have obliterated Trump in 2016.

Hope and change

On to the more subjective political assessment.

All the polling indicates that Americans are not happy and want change. Obama successfully campaigned on hope and change. That was not how he governed, but it was how he won elections as a skilful campaigner.

Biden runs on the promise that nothing will fundamentally change, like Clinton ran on "America is already great". In 2016 Clinton won with the people who thought their candidate "Cares about people like me", "Has the right experience" or "Has good judgment", but Trump won the "Can bring needed change" with 83%, according to exit polling.

NYT and Trump endorsements

Intriguingly Biden did not even get the NYT endorsement, in fact he was not even in their top four, although they are his people. The NYT endorsement went to Warren and Klobuchar.

In public Trump may ignore Sanders, so much that I have the impression he deeply fears Sanders. But in private, in a secret recording by his Ukrainian friend in crime, Lev Parnas, Trump admits that he fears Sanders the most. He may be an incompetent lazy fool, but he does know marketing.

Socialism

In the introduction I already argued that the Republicans making the same-old empty attacks by calling Sanders a socialist is welcome. There is also polling on this question. Data For Progress polled people whether they preferred Trump or Sanders with three different formulations:

No information: “If the 2020 U.S. Presidential election was held today, who would you vote for if the candidates were Bernie Sanders and Donald Trump?”

Partisan cues: “If the 2020 U.S. Presidential election was held today, who would you vote for if the candidates were Democrat Bernie Sanders and Republican Donald Trump?”

Socialists and billionaires: “If the 2020 U.S. Presidential election was held today, who would you vote for if the candidates were Democrat Bernie Sanders, who wants to tax the billionaire class to help the working class and Republican Donald Trump, who says Sanders is a socialist who supports a government takeover of healthcare and open borders?”

Calling Sanders a socialist did not hurt him. The only thing that ironically hurts a little is being called a Democrat.

Political record and campaign

A debate between Biden and Trump would look like the fight between Konstantin Chernenko and Ronald Reagan in Two Tribes Go To War. Biden runs on his record. He is thus vulnerable to what Trump does best and enjoys the most in life: denigrating other people in the media.

Frankie Goes To Hollywood - Two Tribes

Sanders runs on a policy platform and is thus less vulnerable to personal attacks. His platform includes many policies Trump ran on in 2016 but did not execute, because he campaigned as a populist but governs as an establishment Republican plus hatred.

In times where people identify as Republican because they hate Democrats and identify as Democrat because they hate Republicans it is difficult to win elections by advocating for the policies of the other side. There are naturally policies that appeal to large majorities, that may thus also convince people from the other side.

That such policies are not implemented yet is because of the corrupting influence of money in politics and media. A politician who is free from such influences can make a highly attractive policy platform. A politician who floated up due to their support for the donor class and corporations is restricted. Corporations are not charities, they expect a return on investment. The donor class has different interests and world views than the rest of us. A policy package designed for them will be less attractive for voters.

The upside is the money, which clearly helps the campaign, as we can see in billionaire Bloomberg buying a preposterous vote share. In the past voters may have naively expected that the money did not have much influence and it also took time for the political class to become corrupted by it. But the distance between Washington DC and America has grown together with the length of the list of popular policies that have no chance of passing Congress.

Even if Biden promised the same policies in the primary as Sanders, people by now expect a general election pivot and a cabinet full of people from the short lists of the donors. Consequently, there is now a much larger bonus for a reputation of honesty and consistency. Thus a people-power campaign needs less money in 2020.

Imagine Trump legalized marijuana and removed American troops from Iraq. That would sink a Biden general election campaign. Biden not only voted for the Iraq war; already in 1998, five years before the war, he was making the case for a ground war.

There is a lot in Biden's record that can be used by the Trump campaign to suppress the Democratic turnout using targeted social media ads. Workers will get ads about Biden's position on the Permanent Normal Trade Relations with China and NAFTA. Poor and old people will get ads about Biden trying to reduce Social Security, Medicare and Medicaid.

Joe Biden lied about participating in the Civil Rights movement, admitted as much in 1987, but in this campaign he again started lying about it. Trump will not care about the hypocrisy of him pointing to such problems given his own abysmal record. His authoritarian followers will not see the targeted ads and would also not really care.

Sanders can hammer Trump on the promises Trump made and broke. Trump's budgets reduced Social Security, Medicaid and Medicare, which he had promised to protect. Trump's trade deals are almost the same and were negotiated with corporations at the table. Trump promised that his healthcare plan would cover everyone and would be cheaper, while millions lost their health insurance.

Project fear of the Democratic establishment likes to name-drop candidates like George McGovern, but somehow does not mention Hillary Clinton, John Kerry or Al Gore. It especially does not mention Franklin D. Roosevelt, who won the presidency four times and whose New Deal has much in common with Sanders' platform. Also on the Republican side it seems to be hard to make the case that Republicans won who agreed with Democrats, while those who fought Democrats lost. Quite the opposite.

Winning the primary election

Polling aggregator 538 converts the polling information into a probability of winning the nomination by winning the majority of the delegates. The methodology seems to be sound and is likely the best estimate we have.

Nate Silver of 538 seemed a bit dismayed at how much the prediction changed after Iowa. The model gives a bonus for winning Iowa, which traditionally helps candidates in future races. Silver wondered whether the bonus was too large given that Sanders and Buttigieg are tied for one winning metric (the delegate equivalents). The bonus is to take the positive media coverage into account, but the media put much emphasis on the tie and less on Sanders winning the popular vote (in the first and final round), while Silver's model gave all three metrics equal weight.

My impression is that the jump was mostly so large because Biden lost so enormously and is on track to also lose the next two primaries. At the same time the competitors of Biden do not have much chance of winning. Buttigieg may do well today in New Hampshire, but hardly has any staff in subsequent states and nearly no support among non-white voters. Amy Klobuchar is rising, but still polling badly nationally.

In the betting market billionaire Bloomberg is the runner-up after Sanders. He has spent $200 million on ads and bought himself 12% in national polling. This is still rising and it thus makes sense that a market would give him a bonus over polling. But as soon as he becomes a serious candidate people will bring up his atrocious record; today #BloombergIsARacist is trending as an appetiser. The media will be nice to him: Bloomberg is expected to spend a billion on ads and every media outlet wants to get some of that. But I expect that social media will keep him small.

Thus I do not see Buttigieg, Klobuchar or Bloomberg winning, but they all have a chance of succeeding Biden and will likely stay in the race a long time, splitting and wasting establishment votes.

Biden's campaign runs on money, which he only gets as long as he is likely to win; the donors want a return on investment. So he may be forced out of the race, although I see him as the only serious competitor to Sanders. Warren might see it coming that she will stay below the 15% threshold for most primaries and choose to combine her campaign with Sanders'. However, she could also wait for Biden to drop out and may then have a chance.

[ Update after the New Hampshire primary. 538 now has "no one" as the most likely winner of the primary, but with Sanders as a close second.

Sanders is the clear frontrunner in the current crowded field, but tends to get only a quarter of the vote. So it remains interesting what would happen when the field winnows. Some pundits simply add up all the other "moderate" candidates; that is not how it works.

A recent YouGov/Yahoo head-to-head poll of the main primary candidates suggests that Sanders would also win in that situation. Sanders would beat Klobuchar by 21 points, Bloomberg by 15 points, but also Biden by 4 points and Warren by 2 points. Also Warren would beat all the other candidates. Life-long Republican Bloomberg would lose against all other candidates. Biden took some hits, but of the "moderates" he is still the strongest competition against Sanders. ]

Nate Silver gives Sanders a chance of 46% of winning. Silver's model has an additional chance of 27% that no one will directly win a majority. If Sanders does have a clear plurality, it would be handing the presidency to Trump to nominate someone else. So also in case of a contested convention Sanders has a good chance of winning. Furthermore, polling for Sanders tends to go up in the weeks before primaries. That is the moment people start paying attention and talking to each other. So I feel my prediction of 60% chance of Sanders winning the primary is reasonable.
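As a rough check on how the 60% relates to the 538 numbers, the sketch below computes which share of the contested conventions Sanders would need to win to get from 46% to 60%. This is a simplification of the argument above: it puts the whole difference on the contested-convention scenario and ignores the expected rise in polling before primaries. The function name and its parameters are made up for this illustration.

```python
def implied_contested_share(target, p_outright, p_contested):
    """Share of contested conventions a candidate must win to reach `target`.

    Assumes: target = p_outright + share * p_contested
    """
    return (target - p_outright) / p_contested


# Numbers from the text: 46% outright majority, 27% no majority, 60% my estimate.
share = implied_contested_share(target=0.60, p_outright=0.46, p_contested=0.27)
print(f"Sanders would need to win {share:.0%} of the contested conventions")
```

So under these assumptions the 60% corresponds to Sanders winning a bit more than half of the contested conventions.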

If we combine that with a 90% chance of winning the general election, the chance of stopping the class warfare against us is 54%. Let's hope for the best.
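The 54% is simply the product of the two estimates. A minimal sketch of the arithmetic, where the variable names are made up for this illustration and the 90% is read as the chance of winning the general election given that Sanders wins the primary:

```python
# Both numbers are my own estimates from the text, not model output.
p_primary = 0.60                # chance Sanders wins the primary
p_general_given_primary = 0.90  # chance he then wins the general election

# Chance of both happening: P(primary) * P(general | primary)
p_both = p_primary * p_general_given_primary
print(f"Chance of winning both: {p_both:.0%}")
```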
