Archive for the ‘statistics’ Category

Mythical Myths #3 – The Concept of Race

Tuesday, February 15th, 2011

Oh damn! I had no particular wish to address this until browsing led me by chance to RACE – The Power of an Illusion at PBS, where a bunch of well-intentioned people are discrediting anti-racism by associating it with a poorly argued denial that a meaningful concept of race even exists.

It is indeed popular these days, among those who don’t like the way the concept of race has been used, to assert that it does not correspond to anything scientifically definable and so is a “myth”. But this is really just wishful thinking: the idea that race is a myth is itself a myth, which makes race another example of what I identify as “Mythical Myths” (i.e. attempts to identify as myths things which really are real).

It is true that the concept of race may have little utility in human affairs, and whatever utility it does have may be more negative than positive, but it is silly to deny that it has any meaning at all. Whether desirable or not, it is a fact that most people can quickly and correctly identify the ancestral continent (and maybe even a much more specific territory) of a significant fraction of those they meet. This is because isolated populations over many generations do develop observable differences in appearance (and perhaps other factors as well). The fact that the classification of people into races is not complete or 100% reliable does not make it meaningless or undefinable. For example (just to make the point and without expectation that it will be useful for any other purpose) the following might be a reasonable “scientific” definition:

A race of strength s is a human population which has been sufficiently isolated for sufficiently long that (through either just random genetic drift or perhaps sexual selection or evolution in response to local environment) its members differ in their mean value of some computable combination of measurable characteristics from the global mean of non-members by more than 2s standard deviations.

(So if we use the criterion of guessing that a person is of a particular race of strength s if that person’s measurement of the relevant parameter is within s standard deviations of the racial mean, then for a race of strength 2, assuming normal distribution of the parameter with the same spread in both groups, a randomly chosen non-member has only about a 2.3% chance of being misidentified as a member of the race, and similarly for strength 1 the chance of misidentification of a non-member is about 16%).
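The arithmetic behind those misidentification rates can be checked with a short sketch. It rests on assumptions that are mine rather than part of the definition: the non-members' parameter is taken to be standard normal, the racial mean sits exactly 2s standard deviations away (the definition says "more than", so this is the worst case), and the function names are purely illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def misidentification_rate(s):
    """Chance that a random non-member lands within s SDs of the racial mean.

    Assumption: non-members are standard normal and the racial mean is
    exactly 2*s standard deviations above the non-member mean.
    """
    gap = 2.0 * s
    # The acceptance window runs from (gap - s) to (gap + s) in non-member units.
    return normal_cdf(gap + s) - normal_cdf(gap - s)

for s in (1, 2):
    print(f"strength {s}: {misidentification_rate(s):.3f}")
```

Under these assumptions the rate comes out at about 16% for strength 1 and a little over 2% for strength 2, and a larger separation between the means can only reduce it.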

Of course not everyone will have an identifiable race, and with reduced isolation it can be expected that the “strengths” of all races will decline over time, but I am sure that it will take at least several more generations before it is impossible to say with confidence of at least half of the people we meet that they have at least one ancestor within the past twenty generations who lived in Africa, Asia, or Europe. And it will be a very long time before we cannot identify for at least some individuals much more specific ancestral histories just on the basis of a quick visual inspection. In the meantime it may be socially harmful to pay much attention to these possibilities but it is foolish to deny that something is possible just because we don’t want people to do it.

[1] The above-linked PBS site attempts to justify the claim that “Race has no genetic basis” with the explanation “Not one characteristic, trait, or gene distinguishes all members of one so-called race from all members of another so-called race.” That this second statement is probably true does imply that no race is defined by the presence or absence of a single gene, but that is not the only possible genetic basis for a classification scheme. It may well be that our identification of a person’s race (when possible) is by reference to a combination of several characteristics, each of which may result from the activation of a multitude of genes. Indeed the suggestion that a characteristic not linked definitively to a specific gene “has no genetic basis” is so simplistically wrong as to completely discredit its proponent.

[2] A “quiz” associated with the site includes the question “Which of the following is likely to be your ancestor?: (A)Nefertiti, (B)Julius Caesar, (C)Qin Shi Huang – first emperor of China, (D)All of the above, (E)None of the above.” with the answer given as (D) on the basis of a silly argument about numbers of ancestors which neglects the effect of isolation of populations.

Nassim Nicholas Taleb

Thursday, June 5th, 2008

I was led to this Sunday Times profile of Taleb via Arts&Letters Daily.

Taleb’s view that market collapses are more sudden and extreme (though less frequent) than rises seems believable to me, but is presumably easily checked from the record (and could presumably be built into the modelling of risk if true).

On a completely different tack I was taken with his statement that “Scientists don’t know what they are talking about when they talk about religion. Religion has nothing to do with belief, and I don’t believe it has any negative impact on people’s lives outside of intolerance.” Leaving aside the rather large scope of “intolerance” as a source of negative impact, I tend to agree that for most people the adoption of even a creed-based religion has little to do with actual belief and that this is why arguments about the validity of religion often fall flat without appearing to be processed.

Probability of Occurring by Chance

Tuesday, August 14th, 2007

In this post at squareCircleZ, Professor Bruce Armstrong from the Sydney Cancer Centre at the University of Sydney is quoted as saying “The probability that the increase is due simply to chance is about one in a million so we are looking at something that is almost certainly a real increase in risk”. But this is almost certainly a misstatement, since the probability that something is due simply to chance is not computable and probably not even meaningful, whereas the probability of its happening in a randomly chosen situation from a well-defined population of cases is meaningful and often computable. Either concept could be expressed by the ambiguous title of this post, but they are definitely NOT the same, as can be seen from the following example.

If I win the lottery without cheating then the probability of it having happened by chance (in the sense of having only chance factors involved) is in fact 1, but the probability of it happening by chance (i.e. of it happening given that only chance factors were involved) was less than one in a million. Of course, if we don’t assume that I didn’t cheat, the probability that my win was due only to chance may be less than 1, but in any event it is not the same as my chances of winning a fair game.

For a more practically relevant example consider the case of an experiment which identifies an effect of some sort “at the 95% confidence level”. What this means is that the probability of the observation occurring if only random effects were present is no more than about 1 in 20. But then in a set of many trials in which no real effect exists, about 5% of them can still be expected to appear to show the effect. Users of statistics need to be aware of this distinction, since in an experiment which collects more than six variables (as many in the social sciences do) there are at least 21 pairs to consider, and so in an average such experiment about one such pair will seem to have a significant relationship even when no such relationship actually exists.
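The pair-counting argument can be made concrete with a rough sketch. Two simplifications here are mine: the pairwise tests are treated as independent, which real data rarely satisfies exactly, and each test is given a 5% false-positive rate. The function name is illustrative only.

```python
from math import comb

def pairwise_false_positives(num_variables, alpha=0.05):
    """Return (pairs, expected false positives, P(at least one false positive)).

    Assumption: no real relationships exist and the pairwise tests are
    independent, each with false-positive rate alpha.
    """
    pairs = comb(num_variables, 2)                 # number of distinct pairs
    expected = pairs * alpha                       # mean count of spurious "effects"
    p_at_least_one = 1.0 - (1.0 - alpha) ** pairs  # chance of one or more
    return pairs, expected, p_at_least_one

pairs, expected, p_any = pairwise_false_positives(7)
print(pairs, round(expected, 2), round(p_any, 3))
```

With seven variables there are 21 pairs, so the expected number of spurious relationships is just over one, and under the independence assumption the chance of seeing at least one is roughly two in three.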

All this is actually relevant to the story about cancer clusters since, in a world with several million observed groups of a hundred or so people, if the chance of a cluster happening in any one group given only random factors is one in a million, then we may expect to see several such clusters occurring just by chance.
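As a hedged illustration of that last point, the sketch below takes "several million" to be three million groups. That figure is a placeholder of mine, not one from the story, and the groups are assumed to be independent.

```python
import math

def chance_clusters(num_groups, p_cluster):
    """Return (expected number of chance clusters, P(at least one)).

    Assumption: groups are independent and each has the same small
    probability p_cluster of showing a cluster from random factors alone.
    """
    expected = num_groups * p_cluster
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    p_at_least_one = -math.expm1(num_groups * math.log1p(-p_cluster))
    return expected, p_at_least_one

expected, p_any = chance_clusters(3_000_000, 1e-6)
print(round(expected, 2), round(p_any, 3))
```

On these assumptions about three purely accidental clusters are expected, and it would be more surprising to see none at all than to see one.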