Probability with null numerator
I have a brother-in-law who is increasingly afraid of getting on a plane. He would rather drive for several days in a row than leave the ground. But it turns out that the poor guy has to make a transcontinental trip, and this time he has no choice but to fly.
But at the same time, my brother-in-law, in addition to being fearful, is a man of bright ideas. He has been counting the number of flights of the different airlines and the number of accidents each one has had, in order to calculate the probability of a mishap with each of them and fly with the safest. The matter seems very simple if we remember that probability equals favorable cases divided by possible cases.
And it turns out that he is happy, because there is an airline that has made 1500 flights without a single accident, so the probability of having an accident flying on its planes will be, according to my brother-in-law, 0/1500 = 0. He is now so calm that he has almost lost his fear of flying. Mathematically, he reasons, it is almost certain that nothing will happen to him. What do you think about my brother-in-law?
Many of you will already be thinking that this is what you get for using brothers-in-law in these examples. We all know how brothers-in-law are… But don’t be unfair to them. As the famous humorist Joaquín Reyes says, “we are all brothers-in-law”, so just remember it. What is beyond doubt is that we will all agree that my brother-in-law is wrong: the fact that there has not been any mishap in the 1500 flights does not guarantee that the next one will not crash. In other words, even if the numerator of the proportion is zero, it would be incorrect to take zero as our estimate of the real risk.
This situation occurs with some frequency in biomedical research studies. To leave airlines and people afraid of flying alone, imagine that we have a new drug with which we want to prevent that terrible disease, fildulastrosis. We take 150 healthy people and give them antifildulin for 1 year and, after this follow-up period, we do not detect any new cases of the disease. Can we then conclude that the treatment prevents the development of the disease with absolute certainty? Obviously not. Let’s think about it a little.
Making inferences about probabilities when the numerator of the proportion is zero can be somewhat tricky, since we tend to think that the non-occurrence of events is something qualitatively different from the occurrence of one, a few or many events, and this is not really so. A numerator equal to zero does not mean that the risk is zero, nor does it prevent us from making inferences about the size of the risk, since we can apply the same statistical principles as we do to non-zero numerators.
Returning to our example, suppose that the incidence of fildulastrosis in the general population is 3 cases per 2000 people per year (1.5 per thousand, 0.15% or 0.0015). Can our experiment tell us whether taking antifildulin increases, decreases or does not modify the risk of suffering fildulastrosis? To borrow the famous slogan: yes, we can.
As usual, we will take as our null hypothesis that the treatment has no effect, so that the risk of disease is not modified by the new drug. Thus, the risk of each of the 150 participants becoming ill during the study will be 0.0015. In other words, the probability of not getting sick will be $1 - 0.0015 = 0.9985$. What is the probability that nobody gets sick during the year of the study? Since these are 150 independent events, the probability that none of the 150 subjects gets sick will be $0.9985^{150} \approx 0.8$.
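The calculation in the previous paragraph can be checked with a couple of lines of Python. This is a minimal sketch; the function name `prob_no_cases` is our own choice, not taken from any library.

```python
def prob_no_cases(risk: float, n: int) -> float:
    """Probability that none of n independent subjects has the event,
    when each one has the same probability `risk` of having it."""
    return (1 - risk) ** n

# Baseline incidence in the general population: 3 cases per 2000 people per year
baseline_risk = 3 / 2000
p_zero = prob_no_cases(baseline_risk, 150)
print(f"P(no cases among 150) = {p_zero:.4f}")  # about 0.80
```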
We see, therefore, that even if the risk is the same as in the general population, with this number of patients we have an 80% chance of not detecting any event (fildulastrosis) during the study, so finding a patient would actually be more surprising than finding none. And this is the crux of the matter: observing no cases in our sample (0/150) does not mean that the probability of the disease is 0, as my brother-in-law would think. Even if the drug did nothing at all, we would see no cases 80% of the time!
And the worst part is that, given this result, pessimism takes over: it is even possible that the risk of disease with the new drug is greater and we are not detecting it. Let’s assume that the risk with the medication is 1% (compared to 0.15% in the general population). The probability of nobody getting sick would be $(1-0.01)^{150} \approx 0.22$. Even with a 2% risk, the probability of not seeing a single case is $(1-0.02)^{150} \approx 0.048$. Remember that 5% is the value we usually adopt as the acceptable probability of making a type I error when rejecting the null hypothesis.
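If the formula feels too abstract, a quick simulation can make it tangible: generate many hypothetical studies of 150 subjects with a given risk and count how often none of them gets sick. This is an illustrative sketch, not part of the original argument; the function name, the 20,000 trials and the seed are arbitrary choices of ours. The simulated fractions should land close to the exact values, about 0.22 and 0.048.

```python
import random

def simulate_zero_case_fraction(risk: float, n: int, trials: int, seed: int = 1) -> float:
    """Fraction of simulated studies of n subjects in which nobody gets sick.

    Each subject independently falls ill with probability `risk`.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    zero_case_studies = 0
    for _ in range(trials):
        # any() stops at the first simulated case, so studies with a case end early
        if not any(rng.random() < risk for _ in range(n)):
            zero_case_studies += 1
    return zero_case_studies / trials

simulated_by_risk = {}
for risk in (0.01, 0.02):
    simulated = simulate_zero_case_fraction(risk, n=150, trials=20_000)
    simulated_by_risk[risk] = simulated
    exact = (1 - risk) ** 150
    print(f"risk={risk:.2f}  simulated={simulated:.3f}  exact={exact:.3f}")
```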
At this point, we may ask ourselves whether we have simply been unlucky and failed to detect cases of illness although the risk is high or whether, on the contrary, we have not been unlucky at all and the risk really is low. To settle the question, we can return to our usual 5% limit and look for the risk of getting sick with the treatment at which the probability of detecting no patients at all drops to 5% (in other words, at which we would have at least a 95% chance of detecting one):
- Risk of 1/1000: $(1-0.001)^{150} \approx 0.86$.
- Risk of 1.5/1000: $(1-0.0015)^{150} \approx 0.8$.
- Risk of 1/200: $(1-0.005)^{150} \approx 0.47$.
- Risk of 1/100: $(1-0.01)^{150} \approx 0.22$.
- Risk of 1/50: $(1-0.02)^{150} \approx 0.048$.
- Risk of 1/25: $(1-0.04)^{150} \approx 0.002$.
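The whole series can be generated in one go. A minimal sketch, with the six risks from the list hard-coded and the study size fixed at 150:

```python
n = 150  # subjects in the hypothetical antifildulin study

# (label, per-subject risk of fildulastrosis) taken from the list above
risks = [
    ("1/1000", 1 / 1000),
    ("1.5/1000", 1.5 / 1000),
    ("1/200", 1 / 200),
    ("1/100", 1 / 100),
    ("1/50", 1 / 50),
    ("1/25", 1 / 25),
]

p_zero_by_label = {}
for label, risk in risks:
    p_zero = (1 - risk) ** n  # probability that none of the n subjects gets sick
    p_zero_by_label[label] = p_zero
    note = "  <- below 5%" if p_zero < 0.05 else ""
    print(f"{label:>8}: P(no cases) = {p_zero:.3f}{note}")
```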
As the previous series shows, our 5% “safety” threshold is reached when the risk is about 1/50 (2% or 0.02). This means that, with a 5% probability of being wrong, the risk of fildulastrosis while taking antifildulin is equal to or less than 2%. In other words, the 95% confidence interval of our estimate would range from 0 to 0.02, and not be simply 0, as we would get by calculating the probability in the simplistic way.
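The list above is a coarse trial-and-error search. One way to pin down exactly where the probability of seeing no cases falls to 5% is a bisection over the risk, which works because $(1-\text{risk})^{n}$ decreases as the risk grows. A sketch under that assumption; `max_risk_bisection` is a hypothetical helper name of ours. It lands just under 2%, at about 0.0198.

```python
def prob_no_cases(risk: float, n: int) -> float:
    # Probability that none of n independent subjects has the event
    return (1 - risk) ** n

def max_risk_bisection(n: int, alpha: float = 0.05, tol: float = 1e-10) -> float:
    """Largest per-subject risk for which observing zero cases in n subjects
    still has probability >= alpha (found by bisection on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_no_cases(mid, n) >= alpha:
            lo = mid  # zero cases still plausible: the limit is higher
        else:
            hi = mid  # zero cases too unlikely: the limit is lower
    return lo

upper = max_risk_bisection(150)
print(f"Upper limit for the risk: {upper:.4f}")
```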
Before our overheated neurons melt down, let’s look at a simpler way to automate this process. For this we use what is known as the rule of 3: if we run a study with n patients and none of them presents the event, we can affirm, with 95% confidence, that the probability of the event is not zero but less than or equal to 3/n. In our example, 3/150 = 0.02, the same value we obtained with the laborious method above. We can arrive at this rule by solving the equation we used in the previous method:
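$(1 - \text{maximum risk})^{n} = 0.05$
First, we take the n-th root of both sides:
$1 - \text{maximum risk} = 0.05^{1/n}$
If n is greater than 30, $0.05^{1/n}$ is approximately $(n-3)/n$, which is the same as $1 - 3/n$. This is because $0.05^{1/n} = e^{\ln(0.05)/n}$, $\ln(0.05) \approx -3$, and for large n, $e^{-3/n} \approx 1 - 3/n$. In this way, we can rewrite the equation as:
$1 - \text{maximum risk} = 1 - 3/n$
Solving for the maximum risk, we get the final rule:
$\text{maximum risk} = 3/n$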
You will have noticed that we assumed n to be greater than 30. This is because, below 30, the approximation worsens and the rule tends to overestimate the risk, something we must take into account if we use it with small samples.
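To see how good the approximation is, we can compare the exact solution of the equation, $1 - 0.05^{1/n}$, with 3/n for several sample sizes. A minimal sketch; the function names are ours.

```python
def exact_upper(n: int, alpha: float = 0.05) -> float:
    # Solve (1 - risk)**n = alpha for the risk
    return 1 - alpha ** (1 / n)

def rule_of_three(n: int) -> float:
    # Approximate upper limit after observing zero events in n subjects
    return 3 / n

for n in (5, 10, 20, 30, 50, 100, 150):
    exact = exact_upper(n)
    approx = rule_of_three(n)
    print(f"n={n:>3}  exact={exact:.4f}  3/n={approx:.4f}  3/n over exact={approx / exact:.3f}")
```

In this comparison 3/n is always somewhat above the exact limit, and the gap widens as n shrinks: for n = 150 the rule overshoots by only about 1% (0.02 against roughly 0.0198), while for n = 10 it gives 0.30 against an exact value of about 0.26.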
And with this we will end this post, with a couple of final considerations. First, and as is easy to imagine, statistical programs calculate confidence intervals for a risk without much effort, even if the numerator is zero. It can also be done by hand, and much more elegantly, using the Poisson probability distribution, although the result is similar to that obtained with the rule of 3.
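The post does not show the Poisson calculation, so here is one possible sketch, assuming the same one-sided 5% limit used throughout and treating the number of cases as Poisson with mean n × risk (an approximation to the binomial that works when the risk is small). The function name is ours.

```python
import math

def poisson_upper_risk(n: int, alpha: float = 0.05) -> float:
    """Upper limit for the per-subject risk after 0 events in n subjects,
    using a Poisson model for the number of events."""
    # Under a Poisson model with mean lam, P(X = 0) = exp(-lam).
    # The largest lam that still gives P(X = 0) >= alpha is -ln(alpha).
    lam_upper = -math.log(alpha)
    # Spread that expected number of events over the n subjects
    return lam_upper / n

print(f"upper expected count = {-math.log(0.05):.3f}")  # about 3
print(f"upper risk (n=150)  = {poisson_upper_risk(150):.4f}")
```

Since -ln 0.05 ≈ 2.996, the Poisson route lands almost exactly on 3/n, which is one way to see where the “3” of the rule comes from.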
Second, what happens if the numerator is not 0 but a small number? Can a similar rule be applied? The answer, again, is yes. Although there is no general rule, extensions of the rule have been developed for up to 4 events. But that’s another story…