# Probability calculation when denominator equals zero

I have a brother-in-law who is increasingly afraid of getting on a plane. He is willing to drive for several days in a row just to stay on the ground. But it turns out that the poor guy now has to make a transcontinental trip, and this time he has no choice but to fly.

But my brother-in-law, in addition to being fearful, is a resourceful fellow. He has been counting the number of flights of the different airlines and the number of accidents each one has had, in order to calculate the probability of a mishap with each of them and fly with the safest. The matter seems very simple if we remember the old definition of probability: favorable cases divided by possible cases.

And it turns out that he is happy, because there is an airline that has made 1500 flights and has never had an accident, so the probability of having an accident on one of its planes will be, according to my brother-in-law, 0/1500 = 0. He is now so calm that he has almost lost his fear of flying. Mathematically, it is almost certain that nothing will happen to him. What do you think of my brother-in-law's reasoning?

Many of you will already be thinking that using brothers-in-law for these examples has its problems. We all know how brothers-in-law are… But don't be unfair to them. As the famous humorist Joaquín Reyes says, "we are all brothers-in-law", so just remember that. What is beyond doubt is that we will all agree that my brother-in-law is wrong: the fact that there has been no mishap in 1500 flights does not guarantee that the next plane will not crash. In other words, even if the numerator of the proportion is zero, keeping zero as our estimate of the real risk would be incorrect.

This situation occurs with some frequency in biomedical research. To leave airlines and aerophobes alone, suppose we have a new drug with which we want to prevent that terrible disease, fildulastrosis. We take 150 healthy people, give them antifildulin for one year and, after this follow-up period, we do not detect a single new case of the disease. Can we conclude, then, that the treatment prevents the development of the disease with absolute certainty? Obviously not. Let's think about it a little.

## Probability calculation when denominator equals zero

Making inferences about probabilities when the numerator of the proportion is zero can be somewhat tricky, since we tend to think that the non-occurrence of events is something qualitatively different from the occurrence of one, few or many events, and this is not really so. A numerator equal to zero does not mean that the risk is zero, nor does it prevent us from making inferences about the size of the risk, since we can apply the same statistical principles as to non-zero numerators.

Returning to our example, suppose that the incidence of fildulastrosis in the general population is 3 cases per 2000 people per year (1.5 per thousand, 0.15% or 0.0015). Can we infer from our experiment whether taking antifildulin increases, decreases or does not modify the risk of suffering fildulastrosis? Following the familiar adage: yes, we can.

We will continue our habit of taking the null hypothesis to be that of no effect, so that the risk of disease is not modified by the new treatment. Thus, the risk of each of the 150 participants becoming ill during the study will be 0.0015 and, therefore, the risk of not getting sick will be 1 − 0.0015 = 0.9985. What is the probability that no one gets sick during the year of the study? Since these are 150 independent events, the probability that none of the 150 subjects gets sick will be 0.9985^150 ≈ 0.8. We see, therefore, that even if the risk is the same as that of the general population, with this number of patients we have an 80% chance of not detecting any event (fildulastrosis) during the study, so it would actually be more surprising to find a sick patient than not to find any. But the most striking thing is that we have just calculated the probability of having no sick subjects in our sample: it is not 0 (0/150), as my brother-in-law thinks, but 80%!

And the worst part is that, given this result, pessimism invades us: it is even possible that the risk of disease with the new drug is higher and we are simply not detecting it. Suppose the risk with the medication is 1% (compared to 0.15% in the general population). The probability of no one getting sick would be (1 − 0.01)^150 ≈ 0.22. Even with a 2% risk, the probability of detecting no disease at all is (1 − 0.02)^150 ≈ 0.048. Remember that 5% is the value we usually adopt as a "safe" limit to reject the null hypothesis without committing a type 1 error.
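As a quick sanity check, the arithmetic above is easy to reproduce; here is a minimal Python sketch (the function name is mine):

```python
# Probability of observing zero events among n independent subjects,
# when each subject has an individual risk p: P(0 events) = (1 - p)^n
def prob_no_events(p, n):
    return (1 - p) ** n

n = 150
print(round(prob_no_events(0.0015, n), 2))  # baseline risk -> 0.8
print(round(prob_no_events(0.01, n), 2))    # 1% risk -> 0.22
print(round(prob_no_events(0.02, n), 3))    # 2% risk -> 0.048
```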

At this point, we can ask ourselves whether we have been very unlucky and simply failed to detect cases of illness even though the risk is high or whether, on the contrary, we are not so unfortunate and the risk really must be low. To clear this up, we can return to our usual 5% limit and see for which values of the risk of getting sick with the treatment the probability of detecting no patient at all remains above 5%:

– Risk of 1.5/1000: (1 − 0.0015)^150 = 0.8.

– Risk of 1/1000: (1 − 0.001)^150 = 0.86.

– Risk of 1/200: (1 − 0.005)^150 = 0.47.

– Risk of 1/100: (1 − 0.01)^150 = 0.22.

– Risk of 1/50: (1 − 0.02)^150 = 0.048.

– Risk of 1/25: (1 − 0.04)^150 = 0.002.
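The whole series above comes from the same formula, so a short loop reproduces it (a sketch, not part of the original text):

```python
# P(no cases in 150 subjects) for each candidate risk
risks = [1.5 / 1000, 1 / 1000, 1 / 200, 1 / 100, 1 / 50, 1 / 25]
for p in risks:
    print(f"Risk {p:.4f}: P(no cases) = {(1 - p) ** 150:.3f}")
```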

As we see in the series above, our 5% "safety" limit is reached when the risk is around 1/50 (2% or 0.02). This means that, with a 5% probability of being wrong, we can state that the risk of fildulastrosis while taking antifildulin is equal to or less than 2%. In other words, the 95% confidence interval of our estimate would range from 0 to 0.02 (and not be simply 0, as the naive calculation suggested).

To prevent our overheated neurons from eventually melting, let's look at a simpler way to automate this process, known as the rule of 3: if we study n patients and none of them presents the event, we can state that the probability of the event is not zero, but less than or equal to 3/n. In our example, 3/150 = 0.02, the same value we obtained with the laborious method above. We arrive at this rule by solving the equation we used with the previous method:

(1 − maximum risk)^n = 0.05

First, we rewrite it:

1 − maximum risk = 0.05^(1/n)

If n is greater than 30, 0.05^(1/n) is approximately (n − 3)/n, which is the same as 1 − (3/n). The reason is that 0.05^(1/n) = e^(ln(0.05)/n) ≈ e^(−3/n) ≈ 1 − 3/n, since ln(0.05) ≈ −3. In this way, we can rewrite the equation as:

1 − maximum risk = 1 − (3/n)

from which we can solve for the risk and obtain the final rule:

Maximum risk = 3/n

You will have noticed that we required n to be greater than 30. This is because, below 30, the rule tends to slightly overestimate the risk, something we must take into account if we use it with small samples.
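To see how good the approximation is, we can compare the exact solution of the equation with the rule of 3; a minimal sketch (function names are mine):

```python
# Exact: solve (1 - r)^n = alpha for the maximum risk r
def exact_max_risk(n, alpha=0.05):
    return 1 - alpha ** (1 / n)

# Approximation: the rule of 3
def rule_of_three(n):
    return 3 / n

n = 150
print(round(exact_max_risk(n), 4))  # 0.0198
print(rule_of_three(n))             # 0.02
```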

## We’re leaving…

And with this we will end this post, with a couple of considerations. First, as is easy to imagine, statistical programs calculate confidence intervals for risks without much effort, even when the numerator is zero. Similarly, it can also be done by hand, and much more elegantly, by resorting to the Poisson probability distribution, although the result is similar to that obtained with the rule of 3.

Second, what happens if the numerator is not 0 but a small number? Can a similar rule be applied? The answer, again, is yes. Although there is no general rule, extensions have been developed for up to 4 events. But that's another story…

## The one with the foreign name

Do you like to gamble? I'm talking about gambling and the people who go to casinos with the vain hope of winning a little (or not so little) money while having fun. But people who want to get rich in a quick and fun way forget two things. First, everything they can see around them (and much more that they can't see) has been paid for by the thousands who previously failed in the same attempt at the same place. Second, they forget to study thoroughly beforehand what their chances of winning are… and their odds.

You may wonder what an odds is. Well, to answer this question we have to warm up a few neurons.

We all understand the concept of probability. If someone asks what the probability is of getting a six when rolling a die in the casino, we quickly answer that it is one in six, or one sixth (0.17 or 16.7%). But the gambler may be interested in knowing how much more likely it is to get a six than not to get it. And the answer to that is not 1/6, but one fifth. Why? Because the probability of getting a six is 1/6 and that of not getting it is 5/6. To find out how much more likely it is to get a six, we divide 1/6 by 5/6, which gives us one fifth (0.2). This is the odds: the probability that an event occurs divided by the probability that it does not occur. For those who love formulas, odds = p / (1 − p).
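The formula is trivial to code; a one-liner sketch with the die example:

```python
# Odds: probability that an event occurs divided by
# the probability that it does not occur
def odds(p):
    return p / (1 - p)

print(round(odds(1 / 6), 2))  # getting a six: 0.2, i.e. 1 to 5
```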

Let's leave the casino for now. I have noticed that on nights when I check the news on the Internet before going to bed, I sleep worse. Suppose we run a survey, asking people we meet on the street whether they sleep well and whether they usually read the news before going to bed, and we come up with the results shown in the table:

|              | Sleepless | Sleep well | Total |
|--------------|-----------|------------|-------|
| News readers | 25        | 54         | 79    |
| Non-readers  | 105       | 250        | 355   |
| Total        | 130       | 304        | 434   |

We may ask: what is the probability that a news reader is sleepless? Easy to answer: 25/79 = 0.31 (sleepless readers divided by total readers). And what are the odds of a reader being sleepless? Also simple: sleepless readers divided by non-sleepless readers, that is, 25/54 = 0.46.

We can also calculate the probability of a non-reader being sleepless as the quotient 105/355 = 0.29 (sleepless non-readers divided by total non-readers). The odds, meanwhile, would be 105/250 = 0.42 (sleepless non-readers divided by non-sleepless non-readers).

If we now calculate the ratio of the two probabilities we get the relative risk, RR = 0.31/0.29 = 1.06. This means that the risk of having insomnia is more or less the same among those who read the news and those who don't. If we calculate the ratio of the two odds we get a value of 0.46/0.42 = 1.09. This is called the odds ratio (OR), an interesting parameter whose utility we'll soon see.
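Both measures can be computed directly from the table's counts; a sketch in Python (the counts are those implied by the quotients above; the unrounded results differ slightly from the figures obtained by dividing risks and odds already rounded to two decimals):

```python
# 2x2 table: rows = readers / non-readers, columns = sleepless / sleep well
a, b = 25, 54     # readers: sleepless, sleep well
c, d = 105, 250   # non-readers: sleepless, sleep well

rr = (a / (a + b)) / (c / (c + d))  # ratio of the two risks
or_ = (a / b) / (c / d)             # ratio of the two odds = (a*d)/(b*c)
print(round(rr, 2))   # 1.07
print(round(or_, 2))  # 1.1
```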

Let's now look at the data in the table again, but this time the other way around. What is the probability that an insomniac reads the news? 25/130 = 0.19. What are the odds of an insomniac being a reader rather than a non-reader? 25/105 = 0.23. What is the probability that someone who sleeps well is a reader? 54/304 = 0.17. And the odds? 54/250 = 0.21.

If we calculate the RR = 0.19/0.17 = 1.11, we'll say that insomniacs have about the same probability of having read the news before going to bed as those who sleep peacefully. What about the odds? The ratio is 0.23/0.21 = 1.09. What a surprise! The OR value is the same regardless of how we arrange the data, which cannot be a coincidence; it must hide some meaning.

And it does: the OR measures the strength of the association between the effect (insomnia) and the exposure (reading the news). Its value is always the same even if we swap the rows and columns of the table.

As with other parameters, the correct approach is to calculate confidence intervals to know the accuracy of the estimate. The association will be statistically significant if the interval does not include one, which is the null value for the OR. The greater the OR, the greater the strength of the association. An OR of less than one is harder to interpret, but we can follow reasoning similar to that used when the RR is less than one. But here the similarities between the two end. To use the RR we need to know the incidence of the effect in the two populations being compared, while the OR is calculated from the frequencies observed in the two groups, so they are not interchangeable parameters, although their interpretation is similar. They tend to be equivalent only when the effect is very infrequent. For this reason, the OR is the measure of association used in case-control studies and meta-analyses, whereas the RR is preferable for cohort studies and clinical trials.
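The post doesn't show how that confidence interval is obtained, but a common choice (my assumption here, offered as a sketch) is Woolf's logit method: take the logarithm of the OR, add and subtract z times its standard error, and exponentiate back.

```python
import math

def or_confidence_interval(a, b, c, d, z=1.96):
    # Woolf's logit method: ln(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d)
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_) - z * se)
    high = math.exp(math.log(or_) + z * se)
    return or_, low, high

# Counts from the news-and-insomnia survey
or_, low, high = or_confidence_interval(25, 54, 105, 250)
print(f"OR = {or_:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Since the interval includes 1, the association between reading the news and insomnia is not statistically significant, as the tiny OR already suggested.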

Just a couple of considerations before we finish with the OR. First, although it allows us to compare the association between two qualitative variables (categorized as yes or no), it does not serve to establish a cause-and-effect relationship between them. Second, it is useful because it allows us to evaluate the effect of other variables on the association, which plays a role in logistic regression studies. But that's another story…

## Give me a bar and I’ll move the earth

But I won't accept just any bar. It must be a very special bar. Or rather, a series of bars. And I'm not thinking of a bar chart, those so well-known and overused that PowerPoint makes them almost without being asked. No, those graphs are rather dull; they just show how many times each value of a qualitative variable is repeated, and tell us nothing more.

I'm thinking of a much more meaningful plot: a histogram. Wow, you'll say, isn't that just another kind of bar chart? Yes, but its bars are of a different, much more informative kind. To begin with, the histogram is used (or should be used) to represent frequencies of continuous quantitative variables. The histogram is not just a bar chart, but a frequency distribution. What does that mean? Well, deep down, the bars are somewhat artificial. Consider a continuous quantitative variable such as weight, and imagine that our distribution ranges from 38 to 118 kg. In theory we can have infinitely many weight values (as with any continuous variable), but to represent the distribution we divide the range into an arbitrary number of intervals and draw a bar for each interval, so that the height of the bar (and therefore its area) is proportional to the number of cases within the interval. This is a histogram: a frequency distribution.

Now suppose we make the intervals narrower and narrower. The profile formed by the bars looks more and more like a curve as the intervals narrow. In the end, what we come up with is a curve, called the probability density curve. The probability of any single exact value is zero (one might think it should be the height of the curve at that point, but no, it is zero), while the probability of the values in a given interval is the area under the curve over that interval. And what is the area under the entire curve? Very easy: the probability of finding any of the possible values, that is, one (100% if you prefer percentages).
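Both facts (probabilities as areas, and area one under the whole curve) can be checked numerically for the normal distribution through its cumulative distribution function; a sketch built on the error function:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Area under the normal density curve to the left of x
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_cdf(float("inf")))                  # whole curve: 1.0
print(round(normal_cdf(1) - normal_cdf(-1), 3))  # area between -1 and +1: 0.683
```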

As you can see, the histogram is much more than it seems at first sight. It tells us, for example, that the probability of finding a value lower than the mean is 0.5. And not only that: we can calculate the probability density of any value using a small formula that I prefer not to show you, lest you close your browsers and stop reading this post. Besides, there is a simpler way to find it out.

With variables following a normal distribution (the famous bell curve) the solution is simple. We know that a normal distribution is perfectly characterized by its mean and standard deviation. The problem is that each normal curve has its own probability density, specific to that distribution. What can we do? Use a standard normal distribution, with mean zero and standard deviation one, whose probability density has been studied in detail, so that we can know the probability of any given segment without needing new formulas or tables for each distribution.

Once this is done, we take any value of our distribution and transform it into its soul mate in the standard distribution. This process is called standardization and is as simple as subtracting the mean from the value and dividing the result by the standard deviation. We thus obtain one of the statistics that physicians in general, and statisticians in particular, venerate the most: the z-score.
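Standardization really is that simple; a minimal sketch (the weight figures are invented for illustration):

```python
def z_score(x, mean, sd):
    # How many standard deviations x lies from the mean
    return (x - mean) / sd

# e.g., a weight of 85 kg in a population with mean 70 kg and SD 10 kg
print(z_score(85, 70, 10))  # 1.5
```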

The probability density of the standard distribution is well known. A z-value of zero lies at the mean. The range z = 0 ± 1.64 comprises 90% of the distribution; the range z = 0 ± 1.96 includes 95%; and z = 0 ± 2.58, 99%. What we do in practice is to choose the acceptable standardized range for our variable, typically set at ±1 or ±2, according to the variable measured. Moreover, we can compare how the z-score changes in successive determinations.
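The coverage figures for ±1.64, ±1.96 and ±2.58 can be verified with the error function; a short sketch:

```python
import math

def central_coverage(z):
    # Fraction of the standard normal distribution lying within +/- z
    return math.erf(z / math.sqrt(2))

for z in (1.64, 1.96, 2.58):
    print(f"+/-{z}: {central_coverage(z):.3f}")  # 0.899, 0.950, 0.990
```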

The problem is that in medicine there are many variables whose distribution is skewed and does not fit a normal curve, such as blood cholesterol and many others. But do not despair: mathematicians have invented something called the central limit theorem, which says that, if the sample size is large enough, we can standardize any distribution and work with it as if it followed the standard normal. This theorem is a great thing because it allows us to standardize even non-continuous variables that follow other distributions, such as the binomial or the Poisson.
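A small simulation illustrates the theorem: draw many samples from a clearly skewed distribution (here an exponential, my choice for the example) and look at the distribution of their means, which clusters symmetrically around the true mean.

```python
import random
import statistics

random.seed(42)  # reproducible run

# 2000 samples of size 100 from an exponential distribution (true mean = 1)
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(100))
    for _ in range(2000)
]

# The sample means are approximately normal: centered on 1,
# with standard deviation close to 1/sqrt(100) = 0.1
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```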

But it does not end here. Standardization is the basis for calculating other features of the distribution, such as skewness and kurtosis, and it also underlies many hypothesis tests that rely on a known distribution to calculate statistical significance. But that's another story…