I have to admit that I find Oreskes’ piece less than compelling and in many ways seriously wrong. Her main point is (or should be) twofold, namely that
1. if the cost of neglecting a possible risk is higher than that of protecting against it, then it may make sense to increase our chances of falsely believing that the risk is real when it is not, if that is necessary in order to reach an acceptable chance of identifying it when it is real, and
2. if we already have good reasons based on well-established theories to expect something is true, then we don’t need to demand the same level of direct evidence as we would if that evidence was our only reason for expecting the effect.
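One simplified way to see the first point is as an expected-cost comparison. The sketch below is only an illustration and not anything taken from Oreskes' piece: the function names are hypothetical, the costs are in arbitrary units, and it assumes that protection removes the loss entirely.

```python
def action_threshold(cost_protect: float, cost_neglect: float) -> float:
    """Probability of the risk being real above which protecting is cheaper.

    Assumption (for illustration only): protecting always costs cost_protect
    and fully avoids the loss, while not protecting costs cost_neglect only
    if the risk turns out to be real.
    """
    if cost_protect < 0 or cost_neglect <= 0:
        raise ValueError("cost_protect must be >= 0 and cost_neglect > 0")
    # Expected cost of not acting is p * cost_neglect; acting costs cost_protect.
    return min(1.0, cost_protect / cost_neglect)


def should_protect(p_real: float, cost_protect: float, cost_neglect: float) -> bool:
    """True when the expected loss from neglect exceeds the cost of protection."""
    return p_real * cost_neglect > cost_protect
```

If protection costs a tenth of the neglected loss, acting is already worthwhile once the risk is judged more than 10% likely, which is why tolerating more false alarms can be rational.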
But Oreskes attempts to dress these (obvious?) claims in technical language about statistics and scientific practice – which she often garbles into either meaninglessness or outright error. For example, she says “Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20.” So far as it is even parseable, this is wrong in at least six ways. Scientists don’t ever “apply” a confidence “limit”; rather, when assessing the implications of evidence regarding the value of a parameter, they may use that evidence to construct the limits of a confidence interval by imposing, requiring, or perhaps “applying” a confidence level. This process of estimating a parameter has nothing to do with whether or not they will “accept” a causal claim, and the practice of determining the significance level of evidence with regard to a purported relationship is quite independent of the question of whether or not that relationship is “causal”. And the 95% significance level, or whatever other level is applied in that horrible terminology, is complementary not to the “odds” of the relationship, but to the predicted probability of the observed (or more “extreme”) data in a stochastic model in which the relationship is not included.
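To make that last definition concrete, here is a minimal sketch using a toy case of my own choosing: the “relationship” is a bias in a coin, and the stochastic model without it is a fair coin. The function name `tail_p_value` is hypothetical. What it computes is the probability of the observed (or more extreme) data under the no-relationship model, which is not the odds that the bias itself is real.

```python
from math import comb


def tail_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p_null).

    This is a one-sided p-value: the chance, under the null model alone,
    of seeing data at least as extreme as what was observed.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 15 or more heads in 20 flips of a fair coin: about 0.021
p15 = tail_p_value(15, 20)
# 14 or more heads in 20 flips of a fair coin: about 0.058
p14 = tail_p_value(14, 20)
```

Under the conventional 5% threshold, 15 heads would count as “significant” and 14 would not, even though neither number says anything directly about how probable a biased coin is.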
In fact the question of confidence intervals, i.e. the issue of whether or not it is appropriate to use the available data to place narrower bounds on our estimates of parameters (such as the expected change in annual rate of change of temperature per doubling of atmospheric CO2 from current levels), is largely irrelevant to the decisions we need to make with regard to ignoring or attempting to mitigate that effect, since whatever the extremes of what we consider likely, the middle of that range is what will govern our decisions. And the question of significance levels is irrelevant for two reasons. Firstly, we already have plenty of data to rule out the hypothesis that temperature is fluctuating randomly without any long-term trends – and to do so with a p-value of much less than 5%. Secondly (and much more importantly), the real null hypothesis is not random fluctuations around a constant. For more than a century we have known, with as much certainty as we can predict the orbit of a comet, that the Fourier-Arrhenius effect is pumping energy into the Earth’s surface at a rate which, absent unknown effects, would raise the surface temperature by between 2 and 4 degrees per doubling of atmospheric CO2. So the real question that we should be testing by data is whether or not there is evidence for an unknown higher-order effect or some outside factor that mitigates the predicted warming. And since the warming could have very serious consequences, we should, if anything, require a higher significance level (i.e. a lower p-value) before rejecting that null hypothesis.
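The difference the choice of null hypothesis makes can be sketched with a simple two-sided test under a normal approximation. Everything specific here is an assumption for illustration: the function name `two_sided_p` is hypothetical, and the three numbers are invented and unitless, not measurements or model outputs of any kind.

```python
from math import erfc, sqrt


def two_sided_p(estimate: float, std_error: float, null_value: float) -> float:
    """Two-sided p-value for an estimate against a null value.

    Assumes the estimate is approximately normally distributed around
    the null value with the given standard error.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (estimate - null_value) / std_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Invented, unitless numbers purely for illustration (not real data).
ESTIMATE = 1.8    # hypothetical trend estimated from data
STD_ERROR = 0.3   # hypothetical standard error of that estimate
PREDICTED = 2.0   # hypothetical trend predicted by established theory

# Null 1: no trend at all (fluctuations around a constant).
p_no_trend = two_sided_p(ESTIMATE, STD_ERROR, 0.0)
# Null 2: the trend is what the theory predicts.
p_predicted = two_sided_p(ESTIMATE, STD_ERROR, PREDICTED)
```

With these made-up values the no-trend null is rejected overwhelmingly, while the theory-predicted null is not rejected at all. The same data can therefore look like “significant evidence” or “no evidence” depending on which hypothesis is treated as the default.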
But for all that, Schachtman’s criticism has its own weaknesses. He moves quickly to an ad hominem attempt to get the reader to dismiss Oreskes’ decision analysis by challenging her history. But in fact he is the one who gets it wrong! What Oreskes actually said is “The 95 percent confidence level is generally credited to the British statistician R. A. Fisher” and this is undoubtedly true, for even though Fisher was not the originator of confidence *intervals*, long before they were invented he did so much to popularize p=5% as an appropriate indicator of significance that our friend Briggs exemplifies the masses by saying “That rotten 95-percent ‘confidence’ came from Fisher…” After that, Schachtman devotes a lot of attention to Oreskes’ reference to EPA’s use of a weakened standard (10% p-value or 90% confidence intervals) for early (1990s) analysis of the effects of second-hand cigarette smoke. This appears to be a sore point for him, perhaps because of his 30 years of legal practice “focused on the defense of products liability suits, with an emphasis on the scientific and medico-legal issues that often dominate such cases”. But it has little to do with the climate issue.
What all these people seem to share is a very limited view of what “science” is. Although statistical analysis of “noisy” data often plays a role, it is not true that our normal modus operandi is to assume that nothing ever happens unless we see direct evidence for it. Rather, we have a whole interlocking body of mutually consistent experiments in a wide range of applied contexts which all support the same basic theoretical structure. Sometimes a situation is so complicated that we cannot predict its behaviour from the basic theory without making simplifying assumptions. In such cases we expect that the factors we have neglected will affect the behaviour in ways we cannot predict, and we therefore “model” them by adding small terms whose “random” values are drawn from a probability distribution. The questions of “significance”, “p-values”, and “confidence intervals” apply only to the question of whether our stochastic terms are adequate to effectively summarise all of the effects that have been left out of our analysis.