## Archive for October, 2013

### Does “Falsifiable” Really Mean “Provably False”?

Wednesday, October 30th, 2013

In Why Falsifiability, Though Flawed, Is Alluring: Part I, William M. Briggs argues that “most theories scientists hold are not falsifiable” because “if the predictions derived from a theory are probabilistic then the theory can never be falsified. This is so even if the predictions have very, very small probabilities. If the prediction (given the theory) is that X will only happen with probability ε (for those less mathematically inclined, ε is as small as you like but always > 0), and X happens, then the theory is not falsified. Period. Practically false is (as I like to say) logically equivalent to practically a virgin.”

I think he’s right – at least if “falsifiable” means “provable to be false”. But I don’t think most scientists really demand that scientific theories be falsifiable in that sense. And many no longer even use that word; they are more inclined to use a less binding word like “testable”.

A theory might then be considered adequately testable if it can be used to predict that some outcomes of a repeatable experiment have very low probability. If, after identifying such an outcome, we see it actually occur, then we say that the theory fails the test (though it could in principle still be true) and we reject it (ie strongly doubt it) – at least until the relative frequency of failure events in repeated experiments falls to somewhere near their predicted probability.

### Big Tents – Inclusive or Confining?

Tuesday, October 29th, 2013

There has been a lot of heated reaction recently to a couple of incidents in which people have apparently denied the sincerity of others’ statements of belief.

Oprah Winfrey responded to atheist swimmer Diana Nyad’s expression of wonder and awe at the universe with “Well, I don’t call you an atheist then. I think if you believe in the awe and the wonder and the mystery, then that is what God is. That is what God is. It’s not a bearded guy in the sky.” And Richard Dawkins, in conversation with Bill Maher, declared that Barack Obama and Pope Francis must really be atheists.

Most of the commentary has been outrage expressed with less restraint than by Paul Brandeis Raushenbush in How Not to Talk About the Beliefs of Others, where he at least acknowledges a positive aspect in both cases. “Oprah and Dawkins/Maher are being simultaneously arrogant and complimentary. Arrogant, in that they assume that anyone who has a similar world view as they do is secretly ‘one of them’; and complimentary, in that they are saying I admire you enough to claim you for my own belief system.”  And “What we can learn from these two vivid examples is that we all have the right to decide how to identify ourselves in terms of religion or lack thereof. It is not for others to affix their identity upon us, or strip ours from us.”

But rather than interpret these two events as someone claiming to know the content of another’s mind better than they do themselves, it may be more charitable to interpret both as explaining that their own professed label is actually more inclusive than perceived by the other.

With this interpretation it is not denial or stripping of identity but just a clarification that the speaker’s own identity label is intended to be more encompassing than may have been thought.

The downside, which is there of course, is that defining one’s own view as a ‘Big Tent’ is often used as a strategy for discouraging self-identification with the alternative label. Oprah discouraging self-declared atheism, and Dawkins discouraging self-declared religious feeling, may not be denying the actual beliefs of the other, but they are threatening them with censure for their choice of label. You may not be a bad kid, but if you dare to wear the wrong colours then you belong with the gang from the other side of the street.

### What Is Bjørn Lomborg Trying to Achieve?

Monday, October 28th, 2013

What Is Bjørn Lomborg Trying to Achieve? – asks Keith Kloor in his blog at DiscoverMagazine.com, with reference to his recent profile of Lomborg in Cosmos.

Kloor says “it’s worth asking at this stage in his career if Lomborg is a voice of reason, a professional pot stirrer, or a trollish ankle-biter. The answer probably depends on where you sit in these debates.” I suspect from his tone in Cosmos that Kloor sees it as maybe an 80/20/0 mix, but for my part I would put it as more like 30/50/20.

Maybe I should revisit the 2002 controversy in more detail to see if a deeper look would change my view.

### Newcomb’s Paradox

Thursday, October 10th, 2013

Newcomb’s paradox is the name usually given to the following problem. You are playing a game against another player, often called Omega, who claims to be omniscient; in particular, Omega claims to be able to predict how you will play in the game. Assume that Omega has convinced you in some way that it is, if not omniscient, at least remarkably accurate: for example, perhaps it has accurately predicted your behavior many times in the past.

Omega places before you two opaque boxes. Box A, it informs you, contains $1,000. Box B, it informs you, contains either $1,000,000 or nothing. You must decide whether to take only Box B or to take both Box A and Box B, with the following caveat: Omega filled Box B with $1,000,000 if and only if it predicted that you would take only Box B.

What do you do?

(If you haven’t heard this problem before, give it some thought first, and maybe read the post linked to above, before going on to my own thoughts on the matter.)
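For those who like to see the arithmetic, here is a minimal expected-value sketch. It assumes (my assumption, not part of the problem statement) that Omega’s prediction is correct with some fixed probability q, independently of how you choose:

```python
# Expected payoffs in Newcomb's problem, assuming Omega's prediction
# is correct with probability q (q is an assumption of this sketch,
# standing in for "remarkably accurate").

def one_box(q):
    # Box B contains $1,000,000 iff Omega predicted one-boxing,
    # which happens with probability q when you take only Box B.
    return q * 1_000_000

def two_box(q):
    # You always get Box A's $1,000; Box B is full only when Omega
    # wrongly predicted one-boxing (probability 1 - q).
    return 1_000 + (1 - q) * 1_000_000

for q in (0.5, 0.6, 0.9, 0.99):
    print(q, one_box(q), two_box(q))
```

On this naive expected-value reading, one-boxing wins as soon as q exceeds 0.5005 – which is one way of seeing why a “remarkably accurate” Omega makes the problem so uncomfortable for committed two-boxers.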

### So What *Is* Significant About P-Values?

Wednesday, October 9th, 2013

To follow up on my previous post: since the occurrence of an event having low probability according to some model is not always reason to doubt the model, it becomes natural to ask what would be reason to doubt the model. And for this, some of the negative examples from last time may help to bring things into focus.

In the case of a continuously distributed variable (with absolutely continuous probability density) the probability of any particular value is zero, so whatever we observe is a case of something that had low probability in the model. (And the same applies in the case of a discrete variable with a large number of equally likely values.)

When we throw an icosahedral die, whatever face we see on top has a probability of only 1/20 of being there, but we don’t take that as evidence that the die is not a fair one. However, if someone specific had correctly predicted the result then we might be more suspicious – and that is the key to how p-values work. (By “someone specific” here I mean specified in advance – not the same as having 20 people place bets and being surprised to see one of them get it right.)
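The difference between a pre-specified prediction and the “20 bettors” situation is easy to quantify. A quick sketch (standard library only):

```python
# A fair icosahedral (20-sided) die: one pre-specified prediction of
# the top face is correct with probability 1/20, but if 20 independent
# bettors each name a face at random, *someone* winning is more
# likely than not -- so a winner among them is no surprise at all.
p_single = 1 / 20
p_someone = 1 - (1 - p_single) ** 20

print(p_single)             # 0.05
print(round(p_someone, 3))  # roughly 0.64
```

Only the first of those two numbers is small enough to arouse suspicion about the die (or the predictor).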

Similarly, in my silly example from last time, although a value very close to the predicted mean should not cause us to doubt that that predicted mean is correct, it may well cause us to doubt that the variance is as large as proposed in the model. (And in fact there are several historical examples where data clustered too close to the predicted mean has been taken as convincing evidence of experimental malfeasance.)

So in order to be made suspicious by a low p-value it seems to be important that we know in advance what statistic we will be interested in and what kind of results we will consider significant.

This does not answer the question of exactly what that significance means or how we quantify it, but I think it does suggest that there is a valid intuition behind the idea that seeing something actually occur right after asking whether it will happen makes us doubt the claim that it actually had very low probability.

Now when I buy a lottery ticket I do wonder if I will actually win. So if I do win the jackpot I will be faced with something that to me would be a significant indication that there was more than chance involved. Of course in that case I will probably be wrong, but the probability of my being forced into that error is so small as to be of little worry to me.

Similarly, if I reject a null hypothesis model on seeing a pre-described outcome to which the model assigns a probability of p (whether it’s an extreme value of some statistic or just a value very close to some specific target) then if the hypothesis is actually true I have a probability p of being wrong.

That’s what the p-value really is. It’s the probability that the model predicts for whatever outcome we choose to specify in advance of checking the data. Period. If we decide to reject the model on seeing that outcome then we can expect to be wrong in the fraction p of cases where the model is true.
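This long-run reading is easy to check by simulation. A minimal sketch (standard library only): take the null model to be z ~ N(0,1), pre-specify the event |z| > 1.96 (to which the null assigns probability of about 0.05), and count how often rejecting on that event turns out to be wrong when the null really is true:

```python
import random

random.seed(0)  # reproducible runs

# Null hypothesis: z ~ N(0,1).  Pre-specified rejection event:
# |z| > 1.96, which the null model assigns probability ~0.05.
trials = 100_000
rejections = sum(abs(random.gauss(0, 1)) > 1.96 for _ in range(trials))

# The null is true in every simulated trial, so every rejection is a
# wrong rejection; the observed fraction should be close to 0.05.
print(rejections / trials)
```

The observed rejection fraction hovers around 0.05 – exactly the “fraction p of cases where the model is true” described above.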

Of course if we just choose a low probability event at random we probably won’t see it and so will have nothing to conclude,  so it is important to pick as our test event something that we suspect will happen more frequently than the model predicts. (This doesn’t require that we necessarily have any specific alternative model in mind, but if we do then there may be more powerful methods of analysis which allow us to determine the relative likelihoods of the various models.)

Note: None of this tells us anything about the “probability that the model is true” or the “probability that our rejection is wrong” after the fact. (No one but an extremely deluded appropriator of the label “Bayesian” would attempt to assign a probability to something that has already happened or which actually is either true or false.)

To repeat: the p-value is the frequency with which you can expect to be wrong (if you reject at level p) in cases where the null hypothesis is true. This is higher than the frequency with which you will actually be wrong among all the times you apply that rule, because the null hypothesis may not actually be true, and none of those cases will count against you (since failure to reject something is not the same as actually accepting it, and it is never factually wrong to withhold judgement – though I suppose it may often be morally wrong!).

P.S. Significance should not be confused with importance! Anyone who speaks English correctly should understand that significance refers to strength of signification – ie to the relative certainty of a conclusion – not to the importance of that conclusion. So it is possible to have a highly significant indication of a very small or unimportant effect, and estimating the “size” of whatever effect is confounding the null hypothesis is something that cannot be done with the p-value alone.

P.P.S. There is of course a ~~significant~~ quite large non-zero probability that a randomly chosen pronouncement from my repertoire may be flawed. So if you find something to object to here you could get lucky.

UPDATE (7:45pm, Oct 22): See the end of my long comment below re the common practice of computing a “p-value” after the fact from an actual observation rather than from a target event specified in advance.

### What’s Wrong With P-Values?

Tuesday, October 8th, 2013

One of my favourite bêtes noires claims to have put everything wrong with P-Values under one roof.

My response started with “There’s nothing wrong with p-values any more than with Popeye. They is what they is and that’s that. To blame them for their own abuse is just a pale version of blaming any other victim.”

Briggs replied saying “This odd because there are several proofs showing there just are many things wrong with them. Particularly that their use is always fallacious.” – which is odd in itself, as it seems to be just a reworking of exactly what I said, namely that what is “wrong” with them is just the (allegedly) fallacious uses that are made of them.

My comment continued with the following example:

“But if you are the kind of pervert who really enjoys abuse, here goes:
Let H0 be the claim that z ~ N(0,1) and let r = 1/z.
Then P(|r| > 20) = P(|z| < 0.05) ≈ 0.04 < 0.05.
So if z is within 0.05 of 0 then the p-value for r is less than 0.05, and so at the 95% confidence level we must reject the hypothesis that mean(z) = 0.”
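For what it’s worth, the arithmetic in that example checks out. A quick verification using only the standard library (math.erf gives the normal CDF without needing anything beyond the stdlib):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF, expressed via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

# For z ~ N(0,1) and r = 1/z: |r| > 20 iff |z| < 1/20, so
# P(|r| > 20) = P(|z| < 0.05).
p = phi(0.05) - phi(-0.05)
print(round(p, 4))  # ~0.04, comfortably below 0.05
```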

Now the joke here is really based on Briggs’s mis-statement of what a p-value is. Not that there would be anything wrong with the thing he defined, but it just wouldn’t be properly called a p-value. And in order to criticize something (or even just the use of that thing) you need to know what it actually is. So, for the enlightenment of Mr Briggs, let me explore what a p-value actually is.

What Briggs defined as a p-value is as follows: “Given the model used and the test statistic dependent on that model and given the data seen and assuming the null hypothesis (tied to a parameter) is true, the p-value is the probability of seeing a test statistic larger (in absolute value) than the one actually seen if the experiment which generated the data were run an indefinite number of future times and where the milieu of the experiment is precisely the same except where it is “randomly” different.” This has a number of oddities (excessive and redundant uses of the word “given” and the inclusion of an inappropriate repetition condition being among them) but the most significant thing wrong with it is that it only applies to certain kinds of test statistic – as demonstrated by my silly example above.

A better definition might be: Given a stochastic model (which we call the null hypothesis) and a test statistic defined in terms of that model, the p-value of an observed value of that statistic is the probability in the model of having a value of the statistic which is further from the predicted mean than the observed value.
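To make that definition concrete, here is a minimal worked instance (the observed value 2.1 is made up purely for illustration): take the null model to be z ~ N(0,1) with predicted mean 0, and compute the probability of landing further from the mean than the value observed:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

# Null model: the test statistic z ~ N(0,1), predicted mean 0.
z_obs = 2.1  # hypothetical observed value of the statistic
# The p-value of z_obs: the probability, under the null model, of a
# value further from the predicted mean than the one observed.
p_value = 2 * (1 - phi(abs(z_obs)))
print(p_value)  # roughly 0.036
```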

With this definition, it becomes clear that if the null hypothesis is true (ie if the model does accurately predict probabilities) then the occurrence of a low P-value implies the occurrence of an improbable event and so the logical disjunction that Briggs quotes from R A Fisher, namely “Either the null hypothesis is false, or the p-value has attained by chance an exceptionally low value” is indeed correct.

Briggs’s claim that this is “not a logical disjunction” is of course nonsense (any statement of the form “Either A or B” is a logical disjunction), and this one has the added virtue of being true. Of course if the observed statistic has a low p-value then the disjunction is essentially tautological, but then really so is anything else that we can be convinced of by logic.

But Briggs is right to wonder if it has any significance – or at least, if it does, what the reason for that is.

Why do ~~we~~ some people consider the occurrence of a low p-value to be significant (in the common-language sense rather than just by definition)? In other words, why and how should it reduce our faith in the null hypothesis?

The first thing to note is that the disjunction “Either the null hypothesis is false, or something very improbable has happened” should NOT, by itself, do anything to reduce our faith in the null hypothesis. It certainly matters what kind of improbable thing we have seen happen. For example, a meteor strike destroying New York should not cause us to doubt the hypothesis that sex and gender are not correlated – so clearly the improbable observed thing must be something that is predicted to be improbable by the null hypothesis model. But in fact, in any model with continuously distributed variables the occurrence of ANY particular exact observed value is an event of zero probability. One might hope to talk in such cases of the probability density instead, but the probability density can be changed just by re-scaling the variable, so that won’t do either.

What is it about the special case of a low p-value, ie an improbably large deviation from the expected value of a variable, that reduces our faith in the null hypothesis?

…to be continued

### Clean Slate? Asking Bjorn Lomborg To Help Figure Out ‘The Most Pressing Issue Facing’ America Is Like… | ThinkProgress

Monday, October 7th, 2013

“one of the most pressing issues facing America is the media’s over-reliance on widely-debunked, non-credible sources, which poisons the atmosphere for a genuine discussion of our biggest problems and their best solutions”

### Scorpion Stings Itself!

Saturday, October 5th, 2013

In Who’s Afraid of Peer Review?, ‘Science’ magazine publishes a purported bit of “research” into the failings of open-access journals which starts by selecting those of less repute and fails to do any comparable study of the population (of traditional subscription-based journals) against which the open-access journals are being compared. This has generated a lot of negative reaction, including a tongue-in-cheek suggestion that the writer who submitted to ‘Science’ his story of a “sting” on the open-access journals was actually engaged in a sting on ‘Science’ magazine itself.