A matter of pairs

We saw in the previous post how observational studies, in particular cohort studies and case-control studies, are full of traps and loopholes. One of these traps is the backdoor through which our data may slip away, leaving us with erroneous estimates of the measures of association. This backdoor is kept ajar by confounding factors.

We know that there are several ways to control confounding. One of them, pairing, has its peculiarities depending on whether we employ it in a cohort study or in a case-control study.

When it comes to cohort studies, matching by the confounding factor allows us to obtain an adjusted measure of association, because we control the influence of the confounding variable on both the exposure and the effect. However, this no longer holds when the matching technique is used in a case-control study. The design of this type of study forces us to do the matching after the effect has occurred. Thus, the patients acting as controls are not a set of independent individuals chosen at random, since each control is selected precisely because it fulfills a series of criteria set by the case with which it is going to be paired. This, of course, prevents us from selecting other individuals in the population who do not meet the specified criteria but would otherwise be eligible for the study. If we forget this little detail and apply the same analysis methodology that a cohort study would use, we will incur a selection bias that invalidates our results. In addition, although we force a similar distribution of the confounder, we only fully control its influence on the effect, but not on the exposure.

So the mindset of the analysis changes slightly when assessing the results of a case-control study in which we used matching to control for confounding factors. While in an unpaired study we analyze the association between exposure and effect over the whole group, when we have matched we must study the effect within the case-control pairs.

We will see this by continuing with the example of the effect of tobacco on the occurrence of laryngeal carcinoma from the previous post.

In the upper table we see the global data of the study. If we analyze them without taking into account that we used matching to select the controls, we obtain an odds ratio of 2.18, as we saw in the previous post. However, we know that this estimate is wrong. What do we do? Consider the effect of the pairs, but only of those that don't get along.

We see in the table below the distribution of the pairs according to their exposure to tobacco. We have 208 pairs in which both the case (the person with laryngeal cancer) and the control are smokers. Since both are exposed, they tell us nothing about the association with the effect. The same is true of the 46 pairs in which neither the case nor the control smokes. The pairs of interest are the 14 in which the control smokes but the case doesn't, and the 62 in which only the case smokes, but not the control.

These discordant pairs are the ones that give us information on the effect of tobacco on the occurrence of laryngeal cancer. If we calculate the odds ratio, it is 62/14 ≈ 4.4, a stronger measure of association than the one previously obtained, and certainly much closer to reality.
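The matched analysis is easy to reproduce; here is a minimal Python sketch using the pair counts above (the McNemar statistic shown is the version with the usual continuity correction, which is what most software reports):

```python
# Matched-pairs analysis of the fictitious tobacco / laryngeal cancer data.
# Only the discordant pairs carry information about the association.
both_exposed = 208      # case and control both smoke (uninformative)
neither_exposed = 46    # neither smokes (uninformative)
case_only = 62          # only the case smokes
control_only = 14       # only the control smokes

# Conditional (matched-pairs) odds ratio: the ratio of discordant pairs.
odds_ratio = case_only / control_only

# McNemar's chi-square with continuity correction, testing whether the
# two kinds of discordant pairs are equally frequent.
chi2 = (abs(case_only - control_only) - 1) ** 2 / (case_only + control_only)

print(f"matched OR = {odds_ratio:.1f}")    # 4.4
print(f"McNemar chi2 = {chi2:.1f}")        # 29.1, far beyond the usual 3.84 cutoff
```

Note that the 208 + 46 concordant pairs never enter the calculation: all the information comes from the 76 pairs that disagree.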

Finally, I want to make three points before finishing. First, although it goes without saying, remember that the data are a product of my imagination and that the example is completely fictitious, although not as silly as others I have invented in other posts. Second, these calculations are usually made with software, using the Mantel-Haenszel or McNemar tests. Third, in all these examples we have used a matching ratio of 1:1 (one control per case), but this need not necessarily be so: in some cases we may be interested in using more than one control per case. This has its own implications for the influence of the confounder on the estimated measure of association, and its own considerations when performing the analysis. But that's another story…

Birds of a feather flock together

Sometimes we cannot prevent confounding factors, both known and unknown, from getting involved in our studies. These confounding variables open a backdoor through which our data can slip, so that the measures of association between exposure and effect that we estimate do not correspond to reality.

During the analysis phase, techniques such as stratification or regression models are often used to obtain the measure of association adjusted for the confounding variable. But we can also try to prevent confounding in the design phase. One way is to restrict the inclusion criteria according to the confounding variable. Another strategy is to select controls so that they have the same distribution of the confounding variable as the intervention group. This is what is known as pairing.

Suppose we want to determine the effect of smoking on the frequency of laryngeal cancer in a population with the distribution you see in the first table. We can see that 80% of smokers are men, while only 20% of non-smokers are. We have invented that the risk of cancer in men is 2%, rising to 6% if they smoke. For their part, women have a risk of 1%, reaching 3% if they smoke. So, although everyone triples their risk by practicing the most antisocial of vices, men always have twice the risk of women of the same smoking status (and male smokers have six times the risk of non-smoking women). In short, sex acts as a confounding factor: it influences both the likelihood of exposure and the likelihood of the effect, but it is not part of the causal pathway between smoking and laryngeal cancer. This would have to be taken into account in the analysis, calculating the adjusted relative risk with the Mantel-Haenszel technique or using a logistic regression model.

Let's look at another possibility if we know the confounding factor: trying to prevent its effect during the planning phase of the study. Suppose we start from a cohort of 500 smokers, 80% of them men and 20% women. Instead of randomly taking 500 non-smoking controls, we include in the unexposed cohort one non-smoking man for each smoking man in the exposed cohort, and likewise for women. We will thus have two cohorts with a similar distribution of the confounding variable and, of course, also similar in the distribution of the remaining known variables (otherwise we could not compare them).

Have we solved the problem of confounding? Let's check it out.

We see the contingency table of our study with 1000 people, 80% men and 20% women in both the exposed and unexposed groups. As we know the risk of developing cancer by sex and smoking status, we can calculate the number of people we expect to develop cancer during the study: 24 male smokers (6% of 400), 8 non-smoking men (2% of 400), 3 female smokers (3% of 100) and 1 non-smoking woman (1% of 100).

With these data we can build the contingency tables, global and stratified by sex, that we expect to find at the end of follow-up. If we calculate the measure of association (in this case, the relative risk) in men and women separately, we see that it coincides (RR = 3). Moreover, it is the same as the risk in the global cohort, so it seems we have managed to close the backdoor. In a cohort study, matching on the confounding factor allows us to counteract its effect.
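A quick check of these numbers in Python, using the expected case counts given above (a minimal sketch; the stratum-specific and crude relative risks all come out equal):

```python
# Expected cancer cases in the matched cohorts, from the invented risks
# in the text: men 6%/2% (smokers/non-smokers), women 3%/1%.
# Each entry: (cases, cohort size).
strata = {
    "men":   {"exposed": (24, 400), "unexposed": (8, 400)},
    "women": {"exposed": (3, 100),  "unexposed": (1, 100)},
}

# Stratum-specific relative risks: 3.0 in both sexes
for name, s in strata.items():
    cases_e, n_e = s["exposed"]
    cases_u, n_u = s["unexposed"]
    print(name, round((cases_e / n_e) / (cases_u / n_u), 1))

# Crude relative risk over the whole matched cohorts: also 3.0,
# so the confounder's backdoor is closed in this cohort design.
cases_e = sum(s["exposed"][0] for s in strata.values())      # 27
cases_u = sum(s["unexposed"][0] for s in strata.values())    # 9
print("global", round((cases_e / 500) / (cases_u / 500), 1))
```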

Now suppose that instead of a cohort study we conduct a case-control study. Can we use matching? Of course we can, who's going to stop us? But there is a problem.

If we think about it, we realize that matching in a cohort study controls the influence of the confounder on both the exposure and the effect. In case-control studies, however, forcing a similar distribution of the confounder only controls its influence on the effect, not the one it has on the exposure. This is because, by homogenizing the groups according to the confounder, we also homogenize them according to other factors related to it, among them the exposure itself. For this reason, matching doesn't guarantee closing the backdoor in case-control studies.

Don't you buy it? Let's assume that we have 330 people with laryngeal cancer (80% male and 20% female). To do the study, we select a group of similar controls from the same population (what is called a case-control study nested in a cohort).

We know the numbers of exposed and non-exposed cases from the general-population data we gave at the beginning, since we know the risk of cancer by sex and exposure to tobacco. In addition, we can also build the table of controls, since we know the percentage of exposure to tobacco by sex.

Finally, with the data from these three tables we can build the contingency tables for the overall study and those for men and women.

In this case, the suitable measure of association is the odds ratio, which has a value of 3 both for men and for women, but of 2.18 for the overall study population. We see that they do not match, which tells us that we have not completely escaped the effect of the confounder, even though we used matching to select the control group.
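The same phenomenon — equal stratum-specific odds ratios but a different crude one — can be sketched in Python together with the Mantel-Haenszel adjustment mentioned earlier. The author's tables are not reproduced in the text, so the counts below are purely hypothetical, chosen only so that each sex stratum has OR = 3 while the crude OR is pulled toward the null, mimicking the example:

```python
# Each stratum: (cases_exposed, cases_unexposed, controls_exposed, controls_unexposed)
# Hypothetical counts, NOT the author's tables.
strata = {
    "men":   (180, 30, 120, 60),
    "women": (30, 30, 15, 45),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

for name, (a, b, c, d) in strata.items():
    print(name, odds_ratio(a, b, c, d))            # 3.0 in each stratum

# Crude OR from the collapsed table: biased toward the null
a, b, c, d = (sum(col) for col in zip(*strata.values()))
print("crude:", round(odds_ratio(a, b, c, d), 2))  # 2.72

# Mantel-Haenszel adjusted OR recovers the stratum value
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
print("OR_MH:", round(num / den, 2))               # 3.0
```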

So matching cannot be used in case-control studies? Yes, yes it can, although the analysis of the results to estimate the adjusted measure of association is a little different. But that is another story…

You can’t make a silk purse…

Propensity score

… out of a sow's ear. No, you can't. However hard you try, it will remain a sow's ear. And this is because no amount of external polish can change anyone's intrinsic characteristics or defects. But yes, it will look much more elegant.

In the world of biomedical studies in epidemiology there's a type of design that doesn't need to pretend to be a silk purse. Of course, I'm talking about the king of kings, the randomized clinical trial, RCT for short.

Benefits of randomization

The RCT's silk is randomization, which is nothing more than the unpredictable allocation of every trial participant to one of the alternative interventions, leaving the assignment to chance so that we cannot know which group each participant will end up in. In this way, the characteristics of participants that can act as confounders or effect modifiers are equally distributed between the two intervention groups, so that if there are differences between the groups under study we can say they are due to the studied intervention, the only thing that differs between the two groups.

Observational studies, on the other hand, lack randomization, so we can never be sure that the observed differences are not due to confounding variables, some of which may even be unknown to the researcher. Thus, with cohort and case-control studies we cannot assert causality in the same way as it can be established with the results of an RCT.

Multiple strategies have been devised to get around this limitation of observational studies, such as stratification or logistic regression analysis, which allow us to estimate the effect of each variable on the outcome of the intervention in each group. We are going to talk now about one of these methods: the propensity score.

Let's see if we can understand it with an example. Suppose we want to compare the duration of hospital admission of children with fildulastrosis according to the treatment they receive. Let's keep assuming that this terrible disease can be treated with pills or suppositories, the preference of each doctor being the criterion for choosing one or the other at the time of admission. We perform a retrospective study of the two cohorts and find that those who receive suppositories stay, on average, five days longer than those receiving oral treatment. Can we conclude that resolution is faster with pills than with suppositories? If we do, we'll run the risk of being wrong, because there may be other factors, besides the treatment received, that we are not taking into account.

In a clinical trial, each participant has the same chance of receiving either treatment, so we can interpret the results directly. However, this is an observational cohort study, and the probability of receiving pills or suppositories may depend on other factors. For example, one doctor may order suppositories for younger children, who have more trouble swallowing pills, while another doctor may not take this factor into account and give pills to everyone, simply because he prefers them. If age has anything to do with the length of admission, we'll be mixing the effect of the treatment with that of the child's age, comparing the suppositories of the younger children with the pills of the older ones. And now think about one thing: if the probability of receiving either treatment varies for each participant, how are we to compare them without considering that probability? We have to compare participants with a similar chance of receiving each treatment.

Propensity score

Well, here is where the propensity score (PS) comes into play, estimating the probability of each participant receiving a given treatment based on their characteristics.

The PS is calculated using a logistic regression model with the intervention as the outcome and the covariates as predictors. We thus obtain an equation with each of the variables that we have included in the model because we think they can act as confounders. For example, the probability of receiving treatment A would be given by:

ln[P(A) / (1 − P(A))] = β0 + β1a + β2b + β3c + … + βnn,

where P(A) is the probability of receiving treatment A (what the model actually provides is the natural logarithm of its odds), the betas are the coefficients, and a, b, c, …, n represent the model variables.

If we substitute the letters a to n with the characteristics of each participant, we get a score: the PS. And now we can compare with each other the participants of the two treatment arms who have similar scores.
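The step from the regression equation to a probability can be sketched in a few lines of Python. The intercept, coefficients and covariate values below are invented for illustration (in practice they would come from fitting the model to the study data):

```python
import math

def propensity(beta0, betas, covariates):
    """Probability of receiving the treatment given the covariates.

    The logistic model gives the log-odds (logit); we invert it with
    the logistic function to recover a probability between 0 and 1.
    """
    log_odds = beta0 + sum(b * x for b, x in zip(betas, covariates))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical model: intercept, coefficients for age (years) and weight (kg),
# evaluated for a 4-year-old weighing 18 kg.
print(round(propensity(-2.0, [0.30, 0.01], [4, 18]), 2))  # 0.35
```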

These comparisons can be done in several ways, matching and stratification being the simplest.

With stratification, participants are divided into groups by ranges of score and the groups are compared with each other to determine the effect of the intervention. With matching, each participant in one group is compared with another whose score is equal or, if no such participant exists, similar (the so-called nearest neighbor). In the figure you can see an example of nearest-neighbor matching for some of the participants in our fictitious study.
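Nearest-neighbor matching itself is simple enough to sketch directly. This is a minimal illustration assuming we already have each participant's PS (the labels and scores are made up); each control is used at most once, i.e. matching without replacement:

```python
# Hypothetical propensity scores for treated participants and controls.
treated = {"t1": 0.71, "t2": 0.35, "t3": 0.52}
controls = {"c1": 0.30, "c2": 0.55, "c3": 0.68, "c4": 0.90}

pairs = {}
available = dict(controls)
# Process the treated in order of score so that close matches
# are not "stolen" by participants handled earlier.
for name, score in sorted(treated.items(), key=lambda kv: kv[1]):
    # Nearest neighbor: the still-available control with the closest score.
    best = min(available, key=lambda c: abs(available[c] - score))
    pairs[name] = best
    del available[best]   # without replacement: each control matched once

print(pairs)  # {'t2': 'c1', 't3': 'c2', 't1': 'c3'}
```

Real analyses add refinements (calipers on the maximum allowed score distance, matching with replacement, and so on), but the idea is exactly this.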

And that is what a PS is: a ploy to compare participants while trying to avoid the effect of confounding variables and to resemble the randomization of an RCT, almost turning the study into a quasi-experimental one. But, as we said, you can't make a silk purse out of a sow's ear. However many variables we include in the regression model to calculate the PS, we can never be sure of having included them all, as there may be confounding variables we are unaware of. So it is always advisable to check the results of an observational study against those of the corresponding RCT.

We’re leaving…

And here we are done for today, although there is much more to the PS. For example, we have only talked about matching and stratification, although there are other methods, more complex and less used in medicine, such as covariate adjustment using the PS or weighting by the inverse of the probability of receiving the intervention. But that is another story…

The backdoor

I wish I had a time machine! Think about it for a moment. We would not have to work (we would have won the lottery several times), we could anticipate all our misfortunes, always making the best decision… It would be like in the movie "Groundhog Day", but without playing the fool.

Of course, if we had a time machine that worked, there are occupations that could disappear. Epidemiologists, for example, would have a hard time. If we wanted to know, say, whether tobacco is a risk factor for coronary heart disease, we would only have to take a group of people, tell them not to smoke and see what happened twenty years later. Then we would go back in time, require them to smoke, see what happened twenty years later, and compare the results of the two experiments. How easy, isn't it? Who would need an epidemiologist and all his complex science of associations and study designs? We could study the influence of the exposure (tobacco) on the effect (coronary heart disease) by comparing these two potential outcomes, also called counterfactual outcomes (pardon the barbarism).

However, not having a time machine, the reality is that we cannot measure both outcomes in the same person and, although it seems obvious, what this actually means is that we cannot directly measure the effect of the exposure on a particular person.

So epidemiologists resort to studying populations. Normally a population will contain exposed and unexposed subjects, so we can try to estimate the counterfactual outcome of each group in order to calculate the average effect of the exposure on the population as a whole. For example, the incidence of coronary heart disease in non-smokers may serve to estimate what the incidence in smokers would have been if they had not smoked. In this way, the difference in disease between the two groups (the difference between their factual outcomes), expressed as the applicable measure of association, is an estimate of the average effect of smoking on the incidence of coronary heart disease in the population.

All of this requires a prerequisite: the counterfactual outcomes have to be interchangeable. In our case, this means that the incidence of disease in smokers, had they not smoked, would have been the same as that of the non-smokers, who have never smoked. And vice versa: if the non-smokers had smoked, they would have the same incidence as that observed in those who actually smoke. This seems like another truism, but it's not always the case, since the relationship between effect and exposure frequently has backdoors that make the counterfactual outcomes of the two groups non-interchangeable, so that the measures of association cannot be estimated properly. This backdoor is what we call a confounding factor or confounding variable.

Let's clarify this a bit with a fictional example. In the first table I present the results of a cohort study (which I have just invented) that evaluates the effect of smoking on the incidence of coronary heart disease. The risk of disease is 0.36 (394/1090) among smokers and 0.34 (381/1127) among non-smokers, so the relative risk (RR, the relevant measure of association in this case) is 0.36 / 0.34 = 1.05. I knew it! As Woody Allen said in "Sleeper", tobacco is not as bad as previously thought. Tomorrow I'll take up smoking again.

Are we sure? Mulling over the matter, it occurs to me that something may be wrong. The sample is large, so it is unlikely that chance has played a trick on me. The study does not apparently have a substantial risk of bias, although you can never be completely sure. So, assuming Woody Allen wasn't right in his film, the only remaining possibility is that a confounding variable is at work, altering our results.

A confounding variable must meet three requirements. First, it must be associated with the exposure. Second, it must be associated with the effect independently of the exposure we are studying. Third, it must not be part of the cause-effect chain between exposure and effect.

This is where the researcher's imagination comes into play: one has to think about what may be acting as a confounder. To me, in this case, the first thing that comes to mind is age. It fulfills the second condition (the oldest are at increased risk of coronary heart disease) and the third (however bad tobacco may be, it doesn't increase your risk of getting sick by making you older). But does it fulfill the first condition? Is there an association between age and being a smoker? We had not thought about it before, but if there were, it could explain everything. For example, if smokers were younger, the harmful effect of tobacco could be offset by the "benefit" of younger age. Conversely, the benefit the elderly get from not smoking would vanish because of the increased risk that comes with older age.

How can we check this? Let's separate the data for those younger and older than 50 years and recalculate the risks. If the relative risks are different, age is probably acting as a confounding variable; if they are equal, there will be no choice but to agree with Woody Allen.

Let's look at the table for the youngest. The risk of disease is 0.28 (166/591) in smokers and 0.11 (68/605) in non-smokers, so the RR is 2.5. Meanwhile, in patients older than 50 years the risk of disease is 0.58 (227/387) in smokers and 0.49 (314/634) in non-smokers, so the RR equals 1.18. Sorry for those of you who smoke, but "Sleeper" was wrong: tobacco is bad.
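These calculations are easy to check with the counts from the tables (a minimal sketch; note that computing the crude RR from the raw counts gives about 1.07 — the 1.05 above comes from using the already rounded risks 0.36 and 0.34):

```python
# Relative risks from the invented smoking / coronary disease cohort.
def rr(cases_exp, n_exp, cases_unexp, n_unexp):
    """Relative risk: risk in the exposed over risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

print(round(rr(394, 1090, 381, 1127), 2))  # crude RR: 1.07, near the null
print(round(rr(166, 591, 68, 605), 1))     # under 50: RR 2.5
print(round(rr(227, 387, 314, 634), 2))    # over 50:  RR 1.18
```

The crude estimate sits near 1 while both strata show an increased risk: exactly the footprint of age opening the backdoor.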

This example shows how important what we said before about counterfactual outcomes being interchangeable is. If the age distribution differs between exposed and unexposed and we have the misfortune that age is a confounding variable, the outcome observed in smokers will no longer be interchangeable with the counterfactual outcome of the non-smokers, and vice versa.

Can we avoid this effect? We cannot make confounding variables disappear, and the problem is even bigger when we don't know they may be playing their trick. That is why it is essential to take a number of precautions when designing the study, to minimize the risk of leaving backdoors open for our data to slip through.

One of these precautions is randomization, with which we try to make both groups similar in the distribution of confounding variables, both known and unknown. Another is to restrict inclusion in the study to a particular group, for instance those younger than 50 years in our example; the problem is that we cannot do so for unknown confounders. A third possibility is to use paired data, so that for every young smoker we include we select a young non-smoker, and the same for the elderly. To apply this paired selection we also need to know beforehand the role of the confounding variables.

And what do we do once we have finished the study and found, to our horror, that there is a backdoor? First, do not despair. We can always use the multiple resources of epidemiology to calculate an adjusted measure of association that estimates the relationship between exposure and effect free of the confounding effect. There are several methods for doing this analysis, some simpler and some more complex, but all very stylish. But that's another story…

Playing with powers

Numbers are very peculiar creatures. It sometimes seems incredible what can be achieved by operating with some of them. You can even get other numbers that express different things. This is the case of the process by which we can take the values of a distribution and, starting from their arithmetic mean (a measure of central tendency), calculate how far from it the rest of the values are, and then raise those differences to successive powers to get measures of dispersion and even of symmetry. I know it seems impossible, but I swear it's true. I've just read it in a pretty big book. I'll tell you how…

Once we know what the arithmetic mean is, we can calculate the average separation of each value from it: we subtract the mean from each value, add up the differences and divide by the total number of values (like calculating the arithmetic mean of the deviations of each value from the mean of the distribution). But there is one problem: as the mean always sits in the middle (hence its name), the differences with the highest values (which are positive) cancel out with those of the lowest values (which are negative) and the result is always zero. This is logical, and it is an intrinsic property of the mean, which on average is equally far from all the values. Since we cannot change the nature of the mean, what we can do is take the absolute value of each difference before adding them up. And so we obtain the mean deviation, which is the average of the absolute values of the deviations from the arithmetic mean.

And here begins the game of powers. If we add the squared differences instead of their absolute values, we come up with the variance, which is the average of the squared deviations from the mean. We know that if we take the square root of the variance (recovering the original units of the variable) we get the standard deviation, the queen of the measures of dispersion.

And what if we raised the differences to the third power instead of squaring them? Then we would get the average of the cubes of the deviations of the values from the mean. If you think about it, you'll realize that cubing does not get rid of the negative signs. Thus, if the distribution has a long tail of low values (it is skewed to the left), the result will be negative, and if the tail is of high values (it is skewed to the right), it will be positive. One last detail: to compare this symmetry index across distributions we can standardize it by dividing it by the cube of the standard deviation, according to the formula in the accompanying box. The truth is that it looks a bit scary, but do not worry: any statistical software can do this, and even worse things.

And as an example of something worse, what if we raised the differences to the fourth power instead of the third? Then we would calculate the average of the fourth powers of the deviations of the values from the mean. If you think about it for a second, you'll quickly understand its usefulness. Raising to the fourth power makes the values far from the mean weigh enormously, so this average tells us how much weight the extremes of the distribution carry. The parameter can be standardized by dividing it by the fourth power of the standard deviation to get the kurtosis, which lets me introduce three more strange words: a very sharp distribution is called leptokurtic; one flattened out with scattered extreme values, platykurtic; and one that is neither one thing nor the other, mesokurtic.
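The whole game of powers fits in a few lines of Python. The data are an arbitrary illustrative sample, and the formulas are the population versions described in the text (dividing by n; software often uses slightly different sample corrections):

```python
# The game of powers: deviations from the mean raised to powers 1 to 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                  # 5.0
devs = [x - mean for x in data]       # deviations sum to zero

mean_deviation = sum(abs(d) for d in devs) / n      # power 1, absolute values
variance = sum(d ** 2 for d in devs) / n            # power 2
sd = variance ** 0.5                                # back to the original units
skewness = sum(d ** 3 for d in devs) / n / sd ** 3  # power 3, standardized
kurtosis = sum(d ** 4 for d in devs) / n / sd ** 4  # power 4, standardized

print(round(sd, 1))        # 2.0
print(round(skewness, 2))  # 0.66, positive: a tail of high values
print(round(kurtosis, 2))  # 2.78
```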

And what if we raised the differences to the fifth power? Well, I don't know what would happen. Fortunately, as far as I know, no one has yet thought of such an outrage.

All these calculations of measures of central tendency, dispersion and symmetry may seem like the delirium of someone with too little work to do, but don't be deceived: they are very important, not only to properly summarize a distribution, but also to determine the type of statistical test to use when we want to perform a hypothesis test. But that's another story…