Law of small numbers
I remember that when I was a child at school, almost everyone had a village to go to during the holidays. Of course, those were different times: most children’s parents had recently moved to the city, so almost everyone had “their village”. Now things are different. Most schoolchildren were born where they live, so being a country bumpkin is almost frowned upon.
However, small towns have many interesting things. For example, they are usually more peaceful places to live, with a healthier lifestyle. But what few people know is that small towns are haunted by chance: they are easy prey for something called the law of small numbers. Do you know what I’m talking about? Let’s try to explain it with an example.
When I was a resident, there was a village (whose name I won’t mention so as not to offend anyone) from which almost all our patients with rare diseases came. In our ignorance, we even speculated that the abundant slate around the town might be radioactive and to blame for the apparently high incidence of strange diseases among its inhabitants. However, the explanation was much simpler and didn’t require any conspiracy theory. The culprit was small numbers.
Let’s assume that the risk of having fildulastrosis is one in a thousand (prevalence Pv = 0.001). As we all know, this genetic disease is caused by a mutation that occurs completely at random, so whether or not each person has the disease can be treated as a Bernoulli trial, and the number of cases in a village follows a binomial probability distribution.
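As a minimal sketch of this model, assuming every inhabitant independently has the disease with probability Pv, the probability of finding exactly k cases among n inhabitants is C(n, k)·Pv^k·(1 − Pv)^(n − k), and the expected number of cases is n·Pv. The function name `binom_pmf` is ours, chosen for illustration only:

```python
from math import comb

PV = 0.001  # assumed prevalence of the (fictional) fildulastrosis


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k cases among n independent inhabitants."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for n in (1_000, 5_000, 10_000):
    expected = round(n * PV)  # expected number of cases, n * Pv
    print(f"{n:>6} inhabitants: expected cases = {expected}, "
          f"P(exactly {expected}) = {binom_pmf(expected, n, PV):.3f}")
```

Even the expected count itself turns up in well under half of the villages. Most of the time we find some other number.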
What the law of small numbers says
Let’s see what the law of small numbers says.
With the prevalence we have chosen, if we toured villages we would expect to find one case of fildulastrosis per 1,000 inhabitants. If we arrived at a town of 5,000 inhabitants and it had only one case instead of five, what would we say? Surely we would think we were seeing one more benefit of country life: healthier, less stressful, and in contact with nature.
And what if we arrived at an even smaller one, say 1,000 inhabitants, and found four sick people? Following reasoning as foolish as the previous one, we would think we were seeing one of the downsides of country life, with fewer health resources and contact with farm animals and other filthy creatures.
But we would be wrong in both cases. Living in the countryside is not what makes more or fewer people get sick. Let’s see what happens in these villages.
In a village of 1,000 people we expect to find one case of fildulastrosis (Pv = 0.001). In fact, if we use a binomial probability calculator, the probability of at least one patient is 63%. But if we play around with the calculator, we see that the probability of two or more is 26%, of three or more is 8%, and of four or more is 2%. In other words, by chance alone the prevalence at least doubles in roughly one in four 1,000-inhabitant villages, and quadruples in about one in fifty.
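The “binomial probability calculator” can be replaced by a few lines of Python. This is a sketch, not any particular calculator’s method: the helper `binom_cdf` is hypothetical and builds each probability from the previous one, P(i + 1) = P(i)·(n − i)/(i + 1)·p/(1 − p), so it never has to handle huge binomial coefficients. “At least k cases” is then 1 − P(X ≤ k − 1):

```python
PV = 0.001  # assumed prevalence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    q = 1.0 - p
    term = q**n  # P(X = 0)
    total = term
    for i in range(min(k, n)):
        term *= (n - i) / (i + 1) * p / q  # P(X = i + 1) from P(X = i)
        total += term
    return min(total, 1.0)


n = 1_000
for cases in range(1, 5):
    at_least = 1.0 - binom_cdf(cases - 1, n, PV)
    print(f"P(at least {cases} cases among {n} inhabitants) = {at_least:.3f}")
```

The printed values round to the 63%, 26%, 8% and 2% quoted above.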
Now consider a town of 10,000 inhabitants. The expected number of cases is 10, and the probability of finding at least 10 is 54%. However, the probability of at least 20 cases (double the expected prevalence) falls to 0.3%, and that of 30 or more is close to zero. This means that chance is much more whimsical with small villages. Large samples are always more precise, and it is harder for extreme values to appear in them just by chance.
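To put both villages side by side, here is a sketch that asks, for each size, how likely it is to see the expected number of cases, twice that number, or three times it (at least). It reuses the same hypothetical summing helper so that it runs on its own:

```python
PV = 0.001  # assumed prevalence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    q = 1.0 - p
    term = q**n  # P(X = 0)
    total = term
    for i in range(min(k, n)):
        term *= (n - i) / (i + 1) * p / q  # P(X = i + 1) from P(X = i)
        total += term
    return min(total, 1.0)


for n in (1_000, 10_000):
    expected = round(n * PV)
    for factor in (1, 2, 3):  # expected, doubled and tripled prevalence
        k = factor * expected
        p_at_least = 1.0 - binom_cdf(k - 1, n, PV)
        print(f"{n:>6} inhabitants: P(at least {k:>2} cases) = {p_at_least:.2g}")
```

Doubling the prevalence by chance happens in about 26% of 1,000-inhabitant villages but in only about 0.3% of 10,000-inhabitant towns.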
What about the other example? It is the same: the small sample is less precise and more likely to drift towards extreme values just by chance. Since the first village has 5,000 inhabitants, we expect to find five cases of fildulastrosis, and the probability of finding five or fewer is about 62%. If we use the calculator, we see that the probability of four or fewer is 44%, of three or fewer is 26%, and of two or fewer is 12%. This means that in about one in eight 5,000-inhabitant villages the prevalence drops to 0.0004 or less just by chance.
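The lower tail is read directly from the cumulative distribution. A sketch for the 5,000-inhabitant village, again with the hypothetical `binom_cdf` helper repeated so that the block is self-contained:

```python
PV = 0.001  # assumed prevalence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    q = 1.0 - p
    term = q**n  # P(X = 0)
    total = term
    for i in range(min(k, n)):
        term *= (n - i) / (i + 1) * p / q  # P(X = i + 1) from P(X = i)
        total += term
    return min(total, 1.0)


n = 5_000
for cases in (5, 4, 3, 2):
    print(f"P(at most {cases} cases among {n} inhabitants) = "
          f"{binom_cdf(cases, n, PV):.3f}")
```

Two cases among 5,000 people is a prevalence of 2/5,000 = 0.0004. That is where the “one in eight” comes from.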
What would happen in a larger town, say 10,000 inhabitants? We expect 10 cases or fewer with a probability of 58%, but the probability of the prevalence dropping to 0.0004 (four cases or fewer) falls to 3%. And if you do the calculation for a city of 100,000 inhabitants, you will see that the probability of the prevalence falling by half is practically zero.
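The same threshold can be applied to towns of growing size. In this sketch, “prevalence at most 0.0004” is computed with integer arithmetic (n·4 // 10,000 cases) to avoid floating-point rounding in the cut-off. The last line checks the halving mentioned for the 100,000-inhabitant city, which means 50 cases or fewer out of 100 expected:

```python
PV = 0.001  # assumed prevalence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    q = 1.0 - p
    term = q**n  # P(X = 0)
    total = term
    for i in range(min(k, n)):
        term *= (n - i) / (i + 1) * p / q  # P(X = i + 1) from P(X = i)
        total += term
    return min(total, 1.0)


for n in (5_000, 10_000, 100_000):
    max_cases = n * 4 // 10_000  # prevalence of at most 0.0004
    print(f"{n:>7} inhabitants: P(at most {max_cases} cases) = "
          f"{binom_cdf(max_cases, n, PV):.2g}")

halved = 100_000 // 2_000  # half of the 100 expected cases
print(f"100,000 inhabitants: P(at most {halved} cases) = "
      f"{binom_cdf(halved, 100_000, PV):.2g}")
```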
The law of small numbers works in both directions. We will no longer need an absurd explanation when we find a small town with an abnormally high or low prevalence of a known disease. We’ll know it is due to the whims of chance and its law of small numbers.
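A small simulation shows that it works both ways at once. This is a sketch under loudly stated assumptions: 500 hypothetical villages of each size, a fixed random seed, and a hand-written sampler (`sample_cases`, our own name) that draws each village’s case count by inverting the binomial distribution instead of relying on any specific library routine. The smallest villages supply both the lowest and the highest prevalences:

```python
import random

PV = 0.001  # assumed prevalence


def sample_cases(n: int, p: float, rng: random.Random) -> int:
    """Draw one Binomial(n, p) count by inverting its cumulative distribution."""
    u = rng.random()
    q = 1.0 - p
    k = 0
    term = q**n  # P(X = 0)
    cumulative = term
    while cumulative < u and k < n:
        term *= (n - k) / (k + 1) * p / q  # P(X = k + 1) from P(X = k)
        k += 1
        cumulative += term
    return k


def prevalence_range(n: int, villages: int, rng: random.Random) -> tuple[float, float]:
    """Lowest and highest observed prevalence among many towns of size n."""
    rates = [sample_cases(n, PV, rng) / n for _ in range(villages)]
    return min(rates), max(rates)


rng = random.Random(42)  # fixed seed so the run is reproducible
small = prevalence_range(1_000, 500, rng)
large = prevalence_range(100_000, 500, rng)
print("1,000 inhabitants:   lowest/highest prevalence =", small)
print("100,000 inhabitants: lowest/highest prevalence =", large)
```

Among 500 small villages, some will almost certainly have no cases and others several times the true prevalence. The large cities all stay close to 0.001.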
And here we end for today. I hope no one has gone to Google to find out what kind of disease fildulastrosis is, but if anyone has found it, please explain it to me. The example we have used is very simple, to make it easier to show how imprecise small samples are. In real life, certain diseases may well carry an increased risk for the relatives of those affected, which would further exaggerate the effect we have shown and make extreme values even more likely. But that’s another story…