In this post at squareCircleZ, Professor Bruce Armstrong from the Sydney Cancer Centre at the University of Sydney is quoted as saying “The probability that the increase is due simply to chance is about one in a million so we are looking at something that is almost certainly a real increase in risk”. But this is almost certainly a misstatement. The probability that something was due simply to chance is not computable, and probably not even meaningful. By contrast, the probability of its happening in a randomly chosen situation from a well-defined population of cases is meaningful and often computable.

Either concept could be expressed by the ambiguous title of this post, but they are definitely NOT the same, as the following example shows. If I win the lottery without cheating, then the probability that my win happened by chance (in the sense that only chance factors were involved) is in fact 1. However, the probability of it happening by chance (that is, of it happening given that only chance factors were involved) was less than one in a million. Of course, if we don’t assume that I played fairly, the probability that my win was due only to chance may be less than 1, but in any event it is not the same as my chance of winning a fair game.

For a more practically relevant example, consider an experiment that identifies an effect of some sort “at the 95% confidence level”. This means that the probability of the observation occurring if only random effects were present is no more than about 1 in 20. But then, in a large set of trials where no real effect exists, we should expect about 5% of them to appear to show the effect anyway. Users of statistics need to be aware of this distinction. An experiment that collects more than six variables (as many in the social sciences do) has at least 21 pairs to consider (7 × 6 / 2 = 21). So, on average, at least one such pair (21 × 1/20 ≈ 1.05) will seem to have a significant relationship even when no such relationship actually exists.
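The arithmetic behind the 21 pairs can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, each test is assumed to use a 1-in-20 false-alarm rate, the "at least one" figure assumes the tests are independent (pairwise tests on shared variables generally are not), and the simulation assumes a well-calibrated test whose p-value is uniform on [0, 1] when only chance is at work.

```python
from math import comb
import random

ALPHA = 0.05  # "95% confidence level": a 1-in-20 false-alarm rate per test


def pair_count(num_variables: int) -> int:
    """Number of distinct pairs of variables that could be tested."""
    return comb(num_variables, 2)


def expected_false_alarms(num_tests: int, alpha: float = ALPHA) -> float:
    """Expected number of tests that look 'significant' when no real effect exists."""
    return num_tests * alpha


def prob_at_least_one(num_tests: int, alpha: float = ALPHA) -> float:
    """P(at least one spurious 'significant' result), ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests


def simulate_null_trials(num_trials: int, alpha: float = ALPHA, seed: int = 1) -> float:
    """Fraction of pure-chance trials that pass the test.

    ASSUMPTION: under pure chance, a calibrated test's p-value is uniform on [0, 1],
    so each trial 'shows the effect' with probability alpha.
    """
    rng = random.Random(seed)
    hits = sum(rng.random() < alpha for _ in range(num_trials))
    return hits / num_trials


if __name__ == "__main__":
    n_pairs = pair_count(7)
    print(n_pairs)                                    # 21
    print(round(expected_false_alarms(n_pairs), 2))   # 1.05
    print(round(prob_at_least_one(n_pairs), 3))       # 0.659
    print(simulate_null_trials(100_000))              # close to 0.05
```

Under the independence assumption, roughly two out of three seven-variable experiments would report at least one spurious relationship.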

All this is actually relevant to the story about cancer clusters. In a world with several million observed groups of a hundred or so people, if the chance of a cluster happening given only random factors is one in a million, then we should expect to see several such clusters occurring just by chance, since the expected number is simply the number of groups multiplied by one in a million.
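Here is a rough sketch of that last calculation. The figure of 3 million groups is a hypothetical stand-in for "several million," the function names are illustrative, and the groups are assumed to be independent. The Poisson form is the usual approximation for many groups each with a tiny probability.

```python
import math


def expected_clusters(num_groups: int, p_cluster: float) -> float:
    """Expected number of chance clusters: groups x per-group probability."""
    return num_groups * p_cluster


def prob_at_least_one_cluster(num_groups: int, p_cluster: float) -> float:
    """P(at least one chance cluster), ASSUMING the groups are independent."""
    return 1 - (1 - p_cluster) ** num_groups


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson approximation to P(exactly k chance clusters) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


if __name__ == "__main__":
    groups = 3_000_000   # hypothetical "several million" groups
    p = 1e-6             # one-in-a-million chance of a cluster in any one group
    lam = expected_clusters(groups, p)
    print(f"{lam:.2f}")                                    # 3.00
    print(f"{prob_at_least_one_cluster(groups, p):.2f}")   # 0.95
    for k in range(6):
        print(k, f"{poisson_pmf(k, lam):.3f}")
```

Under these assumptions, seeing no chance clusters at all would be the surprising outcome (about a 5% chance).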