One of my favourite bêtes noires claims to have put everything wrong with p-values under one roof.
My response started with “There’s nothing wrong with p-values any more than with Popeye. They is what they is and that’s that. To blame them for their own abuse is just a pale version of blaming any other victim.”
Briggs replied saying “This odd because there are several proofs showing there just are many things wrong with them. Particularly that their use is always fallacious.” This is odd in itself, as it seems to be just a reworking of exactly what I said, namely that what is “wrong” with them is just the (allegedly) fallacious uses that are made of them.
My comment continued with the following example:
Now the joke here is really based on Briggs’s mis-statement of what a p-value is. Not that there would be anything wrong with the thing he defined, but it just wouldn’t properly be called a p-value. And in order to criticize something (or even just the use of that thing) you need to know what it actually is. So, for the enlightenment of Mr Briggs, let me explore what a p-value actually is.
What Briggs defined as a p-value is as follows: “Given the model used and the test statistic dependent on that model and given the data seen and assuming the null hypothesis (tied to a parameter) is true, the p-value is the probability of seeing a test statistic larger (in absolute value) than the one actually seen if the experiment which generated the data were run an indefinite number of future times and where the milieu of the experiment is precisely the same except where it is “randomly” different.” This has a number of oddities (excessive and redundant uses of the word “given” and the inclusion of an inappropriate repetition condition being among them) but the most significant thing wrong with it is that it only applies to certain kinds of test statistic – as demonstrated by my silly example above.
A better definition might be: Given a stochastic model (which we call the null hypothesis) and a test statistic defined in terms of that model, the p-value of an observed value of that statistic is the probability in the model of having a value of the statistic which is further from the predicted mean than the observed value.
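To make that definition concrete, here is a minimal sketch of the calculation it describes, assuming (my choice, not anything in the original) a standard normal null model and the distance of the statistic from the model’s predicted mean:

```python
import math

def two_sided_p_value(z: float) -> float:
    """P-value under a standard normal null model N(0, 1).

    Returns the probability, in the model, of a statistic falling
    further from the predicted mean (0) than the observed value z:
    P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))

# An observation about two standard deviations out gives the
# familiar threshold value:
print(two_sided_p_value(1.96))  # ≈ 0.05
```

The point is that the p-value is a statement about probabilities *inside* the model: nothing in the calculation refers to repeated runs of the experiment, only to the distribution the null hypothesis assigns to the statistic.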
With this definition, it becomes clear that if the null hypothesis is true (ie if the model does accurately predict probabilities) then the occurrence of a low p-value implies the occurrence of an improbable event. So the logical disjunction that Briggs quotes from R A Fisher, namely “Either the null hypothesis is false, or the p-value has attained by chance an exceptionally low value”, is indeed correct.
Briggs’s claim that this is “not a logical disjunction” is of course nonsense (any statement of the form “Either A or B” is a logical disjunction), and this one has the added virtue of being true. Of course, if the observed statistic has a low p-value then the disjunction is essentially tautological, but then so, really, is anything else that we can be convinced of by logic.
But Briggs is right to wonder if it has any significance – or at least, if it does, what the reason for that is.
Why do some people consider the occurrence of a low p-value to be significant (in the common-language sense rather than just by definition)? In other words, why and how should it reduce our faith in the null hypothesis?
The first thing to note is that the bare disjunction “Either the null hypothesis is false, or something very improbable has happened” should NOT, by itself, do anything to reduce our faith in the null hypothesis. It certainly matters what kind of improbable thing we have seen happen. For example, a meteor strike destroying New York should not cause us to doubt the hypothesis that sex and gender are not correlated – so clearly the improbable observed thing must be something that is predicted to be improbable by the null hypothesis model. But in fact, in any model with continuously distributed variables the occurrence of ANY particular exact observed value is an event of zero probability. One might hope to talk in such cases of the probability density instead, but the probability density can be changed just by re-scaling the variable, so that won’t do either.
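The re-scaling objection is easy to check numerically. In this sketch (the units chosen are mine, purely for illustration) the same observation of a normally distributed quantity is expressed first in one unit and then in another a thousand times smaller; the probability of the exact value is zero either way, while the density at that value changes by the Jacobian factor of 1000:

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability density of N(mu, sigma^2) at x."""
    return math.exp(-(((x - mu) / sigma) ** 2) / 2) / (sigma * math.sqrt(2 * math.pi))

# Observed value 0 of a quantity X ~ N(0, 1), measured in metres:
d_metres = normal_pdf(0.0, sigma=1.0)

# The same quantity measured in millimetres is 1000*X ~ N(0, 1000^2),
# and the density at the same physical point is 1000 times smaller:
d_millimetres = normal_pdf(0.0, sigma=1000.0)

print(d_metres / d_millimetres)  # ratio is exactly the scale factor, 1000
```

So a density value carries no unit-free information about how “surprising” an observation is, which is why neither exact probabilities nor raw densities can play the role that the p-value plays.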
So what is it about the special case of a low p-value, ie an improbably large deviation from the expected value of a variable, that reduces our faith in the null hypothesis?
…to be continued