The imperfect screening

Nobody is perfect. That is a fact, and a relief too, because the problem is not being imperfect, which is inevitable. The real problem is believing oneself to be perfect, being ignorant of one's own limitations. And the same goes for many other things, such as the diagnostic tests used in medicine.

But with diagnostic tools this is a real crime because, beyond their imperfection, they can misclassify healthy and sick people. Don't you believe me? Let's reflect on it a little.

To begin with, take a look at the Venn diagram I have drawn. What childhood memories these diagrams bring back to me! The filled square symbolizes the population in question. Above the diagonal are the sick (SCK) and below it the healthy (HLT), so that each area represents the probability of being SCK or HLT. The area of the square, obviously, equals 1: we can be certain that anybody will be either healthy or sick, two mutually exclusive situations. The ellipse encompasses the subjects who undergo the diagnostic test and get a positive result (POS). In a perfect world, the entire ellipse would lie above the diagonal but, in the real, imperfect world, the ellipse is crossed by the diagonal, so the results can be true POS (TP) or false POS (FP), the latter being those obtained in healthy people. The area outside the ellipse corresponds to the negatives (NEG), which, as you can see, are also divided into true and false (TN, FN).

Now let's transfer this to the typical contingency table to define the probabilities of the different options, thinking first of the situation in which we have not yet carried out the test. In this case, the columns condition the probabilities of the events in the rows. For example, the upper left cell represents the probability of a POS in the SCK (once you are sick, how likely are you to get a positive result?), which we call the sensitivity (SEN). For its part, the lower right cell represents the probability of a NEG in a HLT, which we call the specificity (SPE). The total of the first column represents the probability of being sick, which is nothing more than the prevalence (PRV), and in the same way we can work out the meaning of the probability in each cell. This table provides two features of the test, SEN and SPE, which, as we know, are intrinsic characteristics of the test whenever it is performed under similar conditions, even if the populations are different.
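In terms of the cells of the table, and using the TP, FP, TN and FN labels from the Venn diagram, these quantities can be written out as follows (the standard formulation, added here just for reference):

SEN = \frac{TP}{TP+FN} \qquad SPE = \frac{TN}{TN+FP} \qquad PRV = \frac{TP+FN}{TP+FP+TN+FN}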

And what about the contingency table once we have carried out the test? A subtle, but very important, change has taken place: now the rows condition the probabilities of the events in the columns. The totals of the table do not change, but look now at the first cell, which represents the probability of being SCK given that the result has been POS (when positive, what is the probability of being sick?). And this is no longer the SEN, but the positive predictive value (PPV). The same applies to the lower right cell, which now represents the probability of being HLT given that the result has been NEG: the negative predictive value (NPV).

So we see that before performing the test we usually know its SEN and SPE, while once the test has been performed we can calculate its positive and negative predictive values, these four characteristics of the test remaining linked through the magic of Bayes' theorem. Of course, regarding PPV and NPV there's a fifth element to take into account: the prevalence. We know that predictive values vary depending on the PRV of the disease in the population, while SEN and SPE remain unchanged.
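For those who want to see this link written out, Bayes' theorem gives the predictive values from SEN, SPE and PRV (the standard textbook form, added here for reference):

PPV = \frac{SEN \times PRV}{SEN \times PRV + (1-SPE) \times (1-PRV)}

NPV = \frac{SPE \times (1-PRV)}{SPE \times (1-PRV) + (1-SEN) \times PRV}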

And all this has its practical expression. Let's invent an example to mess around with it a bit more. Suppose we have a population of one million inhabitants in which we conduct a screening for fildulastrosis. We know from previous studies that the test's SEN is 0.66 and its SPE is 0.96, and that the prevalence of fildulastrosis is 0.0001 (1 in 10,000); a rare disease that I would advise you not to bother looking up, in case anyone was thinking about it.

Knowing the PRV, it is easy to calculate that in our population there are 100 SCK. Of these, 66 will be POS (SEN = 0.66) and 34 will be NEG. Moreover, there will be 999,900 healthy people, of whom 96% (959,904) will be NEG (SPE = 0.96) and the rest (39,996) will be POS. In short, we'll get 40,062 POS, of which 39,996 will be FP. Nobody should be scared by the high number of false positives. This happens because we have chosen a very rare disease, so there are many FP even though the SPE is quite high. Consider that, in real life, we'd have to do the confirmatory test on all these subjects in order to end up confirming the diagnosis in only 66 people. Therefore, it's very important to think carefully about whether the screening is worth doing before starting to look for the disease in the population. For this and many other reasons.
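If you prefer to let a machine do the arithmetic, here is a minimal Python sketch of the calculation above (the variable names are my own, not part of the example):

```python
# Screening arithmetic for the fictitious fildulastrosis example
n = 1_000_000  # population size
prv = 0.0001   # prevalence
sen = 0.66     # sensitivity
spe = 0.96     # specificity

sick = n * prv       # 100 sick people
healthy = n - sick   # 999,900 healthy people
tp = sick * sen      # 66 true positives
fn = sick - tp       # 34 false negatives
tn = healthy * spe   # 959,904 true negatives
fp = healthy - tn    # 39,996 false positives

print(f"POS = {tp + fp:,.0f}, of which {fp:,.0f} are FP")
```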

We can now calculate the predictive values. The PPV is the ratio of true positives to the total number of POS: 66/40,062 = 0.0016. So there will be roughly one sick person for every 600 positives. Similarly, the NPV is the ratio of true negatives to the total number of NEG: 959,904/959,938 = 0.99996, practically 1. As expected, given the low prevalence and the fairly high SPE of the test, getting a negative result makes it highly improbable to be sick.

What do you think? Is it a useful test for mass screening, with such a number of false positives and a PPV of 0.0016? Well, although it may seem counterintuitive, if we think about it for a moment it's not so bad. The pretest probability of being SCK is 0.0001 (the PRV). The posttest probability is 0.0016 (the PPV). So their ratio is 0.0016/0.0001 = 16, which means we have multiplied by 16 our ability to detect the sick. Therefore the test doesn't seem so bad, although we must take many other factors into account before starting to screen.
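Continuing the previous sketch, the predictive values and the pretest-to-posttest ratio come out in a couple of lines:

```python
# Predictive values and the gain in detection ability
ppv = tp / (tp + fp)  # 66 / 40,062 ≈ 0.0016
npv = tn / (tn + fn)  # 959,904 / 959,938 ≈ 0.99996
print(ppv / prv)      # ≈ 16: posttest probability is 16 times the pretest one
```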

All this we have seen so far has an additional practical application. Suppose we only know the SEN and SPE of the test, but not the PRV of the disease in the population we have screened. Can we estimate it from the results of the screening? The answer is, of course, yes.

Imagine again our population of one million subjects. We do the test and get 40,062 positives. The problem here is that some of these (most of them, actually) are FP. Also, we don't know how many sick people have tested negative (FN). How can we then get the number of sick people? Let's think about it for a while.

We know that the number of sick people will be equal to the number of POS minus the number of FP, plus the number of FN:

No. of sick = Total POS − No. of FP + No. of FN

We have the number of POS: 40,062. The FP will be those healthy people (a fraction 1 − PRV of the population) who don't get a NEG result (a fraction 1 − SPE of the healthy). So the total number of FP will be:

FP = (1 − PRV) × (1 − SPE) × n (with n = 1 million, the population size)

Finally, the FN will be those sick people (a fraction PRV of the population) who don't get a positive result (a fraction 1 − SEN of the sick). So the total number of FN is:

FN = PRV × (1 − SEN) × n

If we substitute FP and FN in the first equation with the values we've just derived, divide everything by n and solve for PRV, we obtain the following formula:

PRV= \frac{\frac{POS}{n}-(1-SPE)}{SEN - (1-SPE)}

We can now calculate the prevalence in our population:

PRV= \frac{\frac{40,062}{1,000,000}-(1-0.96)}{0.66 - (1-0.96)}= \frac{0.040062 - 0.04}{0.66 - 0.04}= \frac{0.000062}{0.62}= 0.0001\ (1\ per\ 10,000)
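And, to check the formula, the same calculation in Python (again reusing the variables from the sketch above):

```python
# Estimating the prevalence back from the screening results
pos = 40_062
prv_est = (pos / n - (1 - spe)) / (sen - (1 - spe))
print(prv_est)  # 0.0001, i.e. 1 per 10,000
```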

Well, I think one of my brain lobes has just melted down, so we'll have to leave it there. Once again we've seen the magic and power of numbers, and how to make the imperfections of our tools work in our favor. We could even go a step further and calculate the precision of the estimate we've just made. But that's another story…

About chalk and cheese

Many times we come across things that people insist on mixing up and confusing even though they are clearly different. This is when we often resort to the saying that they "are like chalk and cheese", which means precisely that they are clearly distinct.

Well, in epidemiology we have a clear example of chalk and cheese in the most commonly used types of frequency measures. I'm talking about the mess we make with the terms ratio, proportion and rate.

Although the three of them are different things, there's a strong tendency to confuse them with one another, and not only among rookies: there are examples in epidemiology books of rates that are not actually rates, of ratios that are proportions, and anything else we can imagine.

Let's look at them one by one and we'll see that they really are like chalk and cheese.

To begin with, we will say that a ratio is the relative magnitude of two quantities of any two variables. It's calculated by dividing one of the magnitudes (the numerator) by the other (the denominator), thereby comparing the two. The key point of the ratio is that the numerator and the denominator don't need to be related. They don't even have to be the same kind of thing. We can compare eggs with eggs, or chestnuts with people who own an apartment in Albacete (forgive me if I can't think of an example in which this comparison would be useful).

Ratios can be used for descriptive or analytical purposes. For descriptive purposes, they can compare the men and women participating in a trial, the numbers of cases and controls, and so on. For analytical purposes, they can be used to study disease between cases and controls, mortality between two groups, etc. Typical examples of ratios are the relative risk and the odds ratio.

On the other hand, a proportion is a comparison of a part of something in relation to the whole, and it can be expressed as a fraction, a decimal number or a percentage. By definition, the numerator must be included in the denominator. For example, the number of obese people who swear they eat little divided by the total number of obese people gives us the proportion of obese people swearing they eat little (usually strikingly higher than expected). If we multiply it by one hundred, we get the percentage.

Proportions also represent the probability of an event occurring, so their values range from zero to one, or from zero to one hundred if we use percentages. An example of a proportion is the incidence, which represents the risk of getting the disease in a population over a given period of time.

A proportion can be converted into a ratio. You just have to subtract the numerator from the denominator and divide the numerator by the result. For example, in a study in which 35 men and 25 women participate, the proportion of male participants would be 35/60 = 0.58. But if you want to know the ratio of males to females, it would be 35/(60 − 35) = 1.4.
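The conversion is trivial to do by hand, but here it is as a tiny Python sketch in case it helps fix the idea:

```python
# From proportion to ratio in a hypothetical trial with 35 men and 25 women
men, total = 35, 60
proportion_men = men / total              # 35 / 60 ≈ 0.58
ratio_men_to_women = men / (total - men)  # 35 / 25 = 1.4
print(proportion_men, ratio_men_to_women)
```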

The third concept in discord is the rate. A rate is a measure of the frequency with which an event occurs in a specific population during a given period of time. Since the measure of frequency is based on the size of the population, rates are very useful for comparing the frequency of events at different times, in different locations, etc., as well as among populations of different sizes.

Here I want to call your attention to the often misnamed prevalence rate. Prevalence measures the number of individuals in a population who have the disease at a given time. But if you think about it, the sick (the numerator) are included in the denominator, so prevalence is actually a proportion and not a rate.

Examples of actual rates would be infant mortality, specific mortality, crude birth rate, etc.

And we're done for today. I will not add to the confusion by dealing with other epidemiological measures with similar names, because there are more, such as the incidence proportion, the incidence rate, etc. But that's another story…

The table

There are plenty of tables, and they play a great role throughout our lives. Perhaps the first one that strikes us in early childhood is the multiplication table. Who doesn't remember, at least among the older of us, how we used to repeat like parrots that two times one equals two, two times two… until we learned it by heart? But as soon as we managed to master the multiplication tables, we bumped into the periodic table of the elements. More memorizing, this time aided by idiotic and impossible mnemonics about some Indians who Gained Bore I-don't-know-what.

But it was over the years that we found the worst table of all: the food composition table, with its cells full of calories. This table pursues us even in our dreams. And that's because eating a lot has many drawbacks, most of which are found out with the aid of another table: the contingency table.

Contingency tables are used very frequently in epidemiology to analyze the relationship between two or more variables. They consist of rows and columns. Groups defined by the level of exposure to the study factor are usually represented in the rows, while the categories that have to do with the health problem under investigation are usually placed in the columns. Rows and columns intersect to form cells, in which the frequency of each particular combination of variables is represented.

The most common table represents two variables (our beloved 2×2 table), one dependent and one independent, but this is not always the case. There may be more than two variables and, sometimes, there may be no direction of dependence between the variables before doing the analysis.

The simplest 2×2 tables allow us to analyze the relationship between two dichotomous variables. According to the content and the design of the study they belong to, their cells may have slightly different meanings, and there will be different parameters that can be calculated from the data in the table.

The first tables we're going to talk about are those of cross-sectional studies. This type of study represents a sort of snapshot of our sample that allows us to study the relationship between the variables. They are, therefore, prevalence studies and, although data can be collected over a period of time, the result only represents the snapshot we have already mentioned. The dependent variable (disease status) is placed in the columns and the independent variable (exposure status) in the rows, so we can calculate a series of measures of frequency, association and statistical significance.

The frequency measures are the prevalence of disease among the exposed (EXP) and the unexposed (NEXP), and the prevalence of exposure among the diseased (DIS) and the non-diseased (NDIS). These prevalences represent the number of sick, healthy, exposed and unexposed people relative to each group's total, so they are actually proportions estimated at a precise moment.

The measures of association are the ratios between the prevalences just mentioned, according to exposure and disease status, and the odds ratio, which tells us how much higher the odds of disease are in exposed (EXP) versus non-exposed (NEXP) people. If these parameters have a value greater than one, it indicates that the exposure is a risk factor for the disease. On the contrary, a value greater than zero and less than one means a protective factor. And if the value equals one, it will be neither fish nor fowl.
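As an illustration, here is a minimal sketch with invented counts (the a, b, c, d layout is the usual one, with exposure in the rows and disease in the columns):

```python
# Hypothetical cross-sectional 2x2 table:
#          DIS   NDIS
# EXP       a      b
# NEXP      c      d
a, b, c, d = 30, 70, 10, 90

prev_exp = a / (a + b)   # prevalence of disease in the exposed: 0.30
prev_nexp = c / (c + d)  # prevalence of disease in the unexposed: 0.10
prevalence_ratio = prev_exp / prev_nexp  # 3.0: exposure looks like a risk factor
odds_ratio = (a * d) / (b * c)           # ≈ 3.86
```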

Finally, as in all the types of tables that we'll mention, you can calculate measures of statistical significance, mainly the chi-square test with or without correction, Fisher's exact test and the p-value, one-tailed or two-tailed.

Very much like the tables we've just seen are those of case-control studies. This study design tries to find out whether different levels of exposure can explain different levels of disease. Cases and controls are placed in the columns and exposure status (EXP and NEXP) in the rows.

The measures of frequency that we can calculate are the proportion of exposed cases (relative to the total number of cases) and the proportion of exposed controls (relative to the total number of controls). Obviously, we can also come up with the proportions of non-exposed by calculating the complementary values of the above.

The key measure of association is the odds ratio, which we already know and on which we are not going to spend much time. We all know that, in the simplest case, we can calculate its value as the ratio of the cross products of the table, and that it informs us about how much more likely the disease is to occur in exposed than in non-exposed people. The other measure of association is the attributable fraction in the exposed (ExpAR), which indicates the proportion of diseased people who are sick due to the direct effect of the exposure.
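For instance, a quick sketch with invented counts, using the cross-products shortcut and one common formulation of the attributable fraction in the exposed, (OR − 1)/OR:

```python
# Hypothetical case-control 2x2 table:
#          Cases  Controls
# EXP        a       b
# NEXP       c       d
a, b, c, d = 40, 20, 60, 80

odds_ratio = (a * d) / (b * c)          # cross products: (40*80)/(20*60) ≈ 2.67
exp_af = (odds_ratio - 1) / odds_ratio  # attributable fraction in the exposed ≈ 0.62
```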

With this type of table we can also calculate a measure of impact: the population attributable fraction (PopAR), which tells us what would happen to the population if we eliminated the exposure factor. If the exposure is a risk factor, the impact will be positive. Conversely, if we are dealing with a protective factor, the impact of its elimination will be negative.

With this type of study design, the measures of statistical significance will be different depending on whether we are dealing with paired data (McNemar's test) or unpaired data (chi-square, Fisher's exact test and the p-value).

The third type of contingency table is the one corresponding to cohort studies, although its structure differs slightly depending on whether you count the total cases along the entire period of the study (cumulative incidence) or take into account the period of the study, the time of onset of disease in the cases and the different follow-up times among groups (incidence rate or incidence density).

Tables from cumulative incidence (CI) studies are similar to those we have seen so far: disease status is represented in the columns and exposure status in the rows. In contrast, incidence density (ID) tables represent in the first column the number of patients and, in the second column, the follow-up in patient-years, so that those with longer follow-up carry greater weight when calculating the measures of frequency, association, etc.

The measures of frequency are the risk in the EXP (Re) and the risk in the NEXP (Ro) for CI studies, and the EXP and NEXP incidence rates for ID studies.

We can calculate the ratios and differences of the above measures to come up with the measures of association: the relative risk (RR), the absolute risk reduction (ARR) and the relative risk reduction (RRR) for CI studies, and the incidence density reduction (IRD) for ID studies. In addition, we can also calculate the ExpAR, as we did in case-control studies, as well as a measure of impact: the PopAR.
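To make the CI case concrete, a small sketch with invented counts, where the ARR and RRR of the text correspond to the absolute and relative differences between Re and Ro:

```python
# Hypothetical cumulative incidence cohort 2x2 table:
#          DIS   NDIS
# EXP       a      b
# NEXP      c      d
a, b, c, d = 20, 80, 10, 90

re = a / (a + b)  # risk in the exposed: 0.20
ro = c / (c + d)  # risk in the unexposed: 0.10
rr = re / ro      # relative risk: 2.0
ard = re - ro     # absolute difference between risks: 0.10
rrd = ard / ro    # relative difference: 1.0 (a 100% relative increase)
```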

We can also calculate the odds ratios if we want, although they are generally much less used in this type of study design. In any case, we know that the RR and the odds ratio are very similar when the disease is rare.

To finish with this kind of table, we can calculate the measures of statistical significance: chi-square, Fisher's exact test and the p-value for CI studies, and their corresponding tests for ID studies.

As always, all these calculations can be done by hand, although I recommend you use a calculator, such as the one available at the CASPe site. It's easier and faster and, furthermore, we'll get all these parameters together with their confidence intervals, so we can also estimate their precision.

And with this we come to the end. There are more types of tables, with multiple levels for handling more than two variables, stratified according to different factors, and so on. But that's another story…