… of a sow’s ear. No, you can’t. However hard you try, it will remain a sow’s ear. And this is because nobody’s characteristics or defects disappear simply by making external improvements. But, yes, it will look much more elegant.

In the world of biomedical studies there’s a type of epidemiological design that doesn’t need to pass for a silk purse. Of course, I’m talking about the king of kings, the randomized clinical trial, RCT for short.

The RCT’s silk purse is randomization, which is nothing more than the unpredictable allocation of every trial participant to one of the alternative interventions, leaving the decision to chance so that we cannot know in advance which group each participant will be assigned to. In this way, the characteristics of participants that can act as confounders or effect modifiers tend to be equally distributed between the two intervention groups, so that if there are differences between the groups under study we can say that they are due to the studied intervention, the only difference between the two groups.
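To see what randomization buys us, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the 10,000 participants, the age covariate, the arm labels "A" and "B" and the fixed seed. Real trials usually rely on more elaborate, concealed allocation schemes.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed, only so that the sketch is reproducible

# Hypothetical participants: 10,000 children aged between 0 and 14 years.
ages = [rng.uniform(0, 14) for _ in range(10_000)]

# Simple randomization: every child has the same 50% chance of either arm.
arms = [rng.choice(["A", "B"]) for _ in ages]

# Mean age in each arm; age played no part in the allocation.
mean_age = {
    arm: mean(a for a, g in zip(ages, arms) if g == arm)
    for arm in ("A", "B")
}
age_gap = abs(mean_age["A"] - mean_age["B"])
print(mean_age, age_gap)
```

Because chance alone decides the arm, the mean age of both groups should come out very close, and the same would be expected of any other characteristic, measured or not.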

On the other hand, observational studies lack randomization, so we can never be sure that the observed differences are not due to confounding variables, some of which may even be unknown to the researcher. Thus, with cohort and case-control studies we cannot assert causality in the same way as with the result of an RCT.

Multiple strategies have been devised to get around this limitation of observational studies, such as stratification or logistic regression analysis, which allow us to estimate the effect of each variable on the outcome of the intervention in each group. We are going to talk now about one of these methods, the propensity score.

Let’s see if we can understand it with an example. Suppose we want to compare the duration of hospital admission of children with fildulastrosis according to the treatment they receive. Let’s keep assuming that this terrible disease can be treated with pills or suppositories, with each doctor’s preference being the criterion for choosing one or the other at the time of admission. We perform a retrospective study of the two cohorts and find that those who receive suppositories are admitted five days longer on average than those receiving oral treatment. Can we conclude that the disease resolves faster with pills than with suppositories? If we do so, we’ll run the risk of being wrong, because there may be other factors besides the treatment received that we are not taking into account.

In the case of a clinical trial, each participant has the same chance of receiving any of the treatments, so we can make a direct interpretation of the results. However, this is a cohort study, observational, and the probability of receiving pills or suppositories may depend on other factors. For example, one doctor may order suppositories for younger children, who are worse at swallowing pills, while another doctor may ignore this factor and give pills to everyone because he prefers them. If age has something to do with the length of admission, we’ll be mixing the effect of treatment with that of the child’s age, comparing the suppositories of some (younger children) with the pills of others (children of any age). And now think about one thing: if the probability of receiving either treatment varies from participant to participant, how are we to compare them without taking this probability into account? We have to compare those with a similar chance of receiving each treatment.
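The trap can be reproduced with a small simulation. This is only a sketch under invented assumptions: the age range, the prescribing probabilities (0.9 below six years, 0.2 otherwise) and the formula for the length of stay are all made up, and the treatment is deliberately given no effect at all.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed, only for reproducibility

def simulate_child():
    """Return (age, treatment, length of stay) for one hypothetical child."""
    age = rng.uniform(1, 14)
    # Younger children are far more likely to be given suppositories.
    p_suppository = 0.9 if age < 6 else 0.2
    treatment = "suppository" if rng.random() < p_suppository else "pill"
    # Length of stay depends on age only: the treatment has NO effect here.
    stay = 10 - 0.5 * age + rng.gauss(0, 1)
    return age, treatment, stay

children = [simulate_child() for _ in range(5_000)]

def mean_stay(treatment):
    """Mean length of stay of the children who received this treatment."""
    return mean(s for _, t, s in children if t == treatment)

# Crude comparison of the two cohorts, ignoring age.
naive_difference = mean_stay("suppository") - mean_stay("pill")
print(round(naive_difference, 2))
```

Even though the two treatments are equally effective by construction, the crude comparison shows the suppository group staying roughly two days longer, simply because its children are younger.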

Well, this is where the propensity score (PS) comes into play, estimating the probability of each participant being given a treatment based on their characteristics.

The PS is calculated using a logistic regression model with the intervention as the outcome and the covariates as predictors. This gives an equation that contains each of the variables we have included in the model because we think they can act as confounding factors. For example, the probability of receiving treatment A could be modelled as:

ln[P(A) / (1 − P(A))] = β_{0} + β_{1}a + β_{2}b + β_{3}c + … + β_{n}n,

where P(A) is the probability of receiving A (what the model actually provides is the natural logarithm of the odds of receiving A, which is then converted back into a probability), the betas are the coefficients and a, b, c, …, n represent the model variables.

If we replace the letters “a” to “n” with the characteristics of each participant, we get a score, which is the PS. And now we can compare participants of the two treatment arms who have a similar score.
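As a rough sketch of how the score could be obtained, the following Python code fits a logistic regression by plain gradient ascent and turns each child’s covariates into a PS. The function names (`fit_logistic`, `propensity_score`), the simulated data and the choice of age as the only covariate are assumptions made for illustration; in practice one would normally use a statistical package rather than a hand-written fit.

```python
import math
import random
from statistics import mean, pstdev

def sigmoid(z):
    """Logistic function: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, treated, lr=0.5, epochs=500):
    """Fit a logistic regression by gradient ascent; returns [b0, b1, ..., bn]."""
    n_obs, n_cov = len(rows), len(rows[0])
    beta = [0.0] * (n_cov + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_cov + 1)
        for x, t in zip(rows, treated):
            linear = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            error = t - sigmoid(linear)
            grad[0] += error
            for j, v in enumerate(x):
                grad[j + 1] += error * v
        beta = [b + lr * g / n_obs for b, g in zip(beta, grad)]
    return beta

def propensity_score(beta, x):
    """Estimated probability of receiving the treatment given covariates x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

# Hypothetical data: age is the only covariate, and younger children
# receive suppositories (coded as 1) more often than older ones.
rng = random.Random(11)
ages = [rng.uniform(1, 14) for _ in range(400)]
got_suppository = [1 if rng.random() < sigmoid(3 - 0.5 * a) else 0 for a in ages]

# Standardize age so that this simple gradient ascent converges quickly.
mu, sd = mean(ages), pstdev(ages)
rows = [[(a - mu) / sd] for a in ages]

beta = fit_logistic(rows, got_suppository)
scores = [propensity_score(beta, x) for x in rows]  # one PS per child
ps_age_2 = propensity_score(beta, [(2 - mu) / sd])
ps_age_12 = propensity_score(beta, [(12 - mu) / sd])
print(round(ps_age_2, 2), round(ps_age_12, 2))
```

With these made-up data, a two-year-old gets a much higher PS for suppositories than a twelve-year-old, which is exactly the imbalance the score is meant to capture.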

These comparisons can be made in several ways, matching and stratification being the simplest ones.

With stratification, the participants are divided into groups covering a range of scores and, within each group, the two treatment arms are compared to determine the effect of the intervention. With matching, each participant of one group is compared with a participant of the other group who has the same score or, if there is none, the most similar one (what is known as the nearest neighbor). In the figure you can see an example of nearest-neighbor matching for some of the participants in our fictitious study.
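Both approaches can be sketched in a few lines of Python. The participant labels, scores and lengths of stay below are invented, and the specific choices (greedy one-to-one matching without replacement, an optional caliper, five equal-width strata instead of strata based on quantiles of the score) are simplifying assumptions, not the only way to do it.

```python
def nearest_neighbour_match(treated, controls, caliper=None):
    """Greedy 1:1 matching without replacement on the PS.

    treated and controls map a participant id to its PS. If a caliper is
    given, pairs whose scores differ by more than it are discarded.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_ps in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if caliper is None or abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

def stratum(ps, n_strata=5):
    """0-based index of the equal-width PS stratum that contains ps."""
    return min(int(ps * n_strata), n_strata - 1)

def matched_difference(pairs, stay):
    """Mean within-pair difference in length of stay (treated minus control)."""
    return sum(stay[t] - stay[c] for t, c in pairs) / len(pairs)

# Hypothetical PS (probability of receiving suppositories) and days of stay.
suppositories = {"S1": 0.81, "S2": 0.55, "S3": 0.30}
pills = {"P1": 0.28, "P2": 0.57, "P3": 0.79, "P4": 0.10}
stay = {"S1": 9, "S2": 7, "S3": 6, "P1": 6, "P2": 6, "P3": 8, "P4": 4}

pairs = nearest_neighbour_match(suppositories, pills, caliper=0.05)
strata = {pid: stratum(ps) for pid, ps in {**suppositories, **pills}.items()}
print(pairs)
print(strata)
print(matched_difference(pairs, stay))
```

Note that P4 is left without a partner: nobody in the suppository group has a similar score, so that child simply does not enter the matched comparison.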

And this is what a PS is: a ploy to compare participants while trying to avoid the effect of confounding variables and to mimic the randomization of an RCT, bringing the study close to a quasi-experimental one. But, as we said, you can’t make a silk purse of a sow’s ear. However many variables we include in the regression model to calculate the PS, we can never be sure of having included them all, as there may be confounding variables that we are unaware of. So it is always advisable to check the results of an observational study against the corresponding RCT.

And here we are done for today, although there is much more to the PS. For example, we talked only about matching and stratification, but there are other methods, more complex and less used in medicine, such as covariate adjustment using the PS or weighting by the inverse of the probability of receiving the intervention. But that is another story…