E-value.
This article reviews how unmeasured confounding can distort associations in observational studies and presents parameters to quantify this effect. It explains how the E-value quantifies the minimum strength an unmeasured confounder would need in order to fully explain an observed effect, or to make it compatible with the absence of association. Finally, it briefly discusses extensions to other effect measures and the E-value's complementary role to p-values in critical appraisal.
In almost every town there’s a cursed roundabout. Ever since it was built (or so people say) there have been more crashes, more traffic jams, more divorces, and even fewer parking spaces. The roundabout becomes the official culprit for everything: it’s there, you can see it, and it’s wonderfully convenient to point at. Nobody stops to think that, at the same time, the traffic has changed, the street lighting has changed, the kind of cars and people’s schedules have changed… but never mind: having one simple cause makes us feel much calmer than admitting we don’t really understand what’s going on.
We do something very similar with observational studies. We see that people who take X have less Y, or that those who do Z die more than those who don’t, and our brain jumps straight to causality: “X protects”, “Z kills”. But what these studies actually capture is association, not causation. And that association can always be the fault of some confounding factor we don’t know about, didn’t measure, or didn’t adjust for properly. A “little detail” that was left out of the model… but not out of reality.
That’s why only clinical trials can really demonstrate causality, because we assume that randomisation spreads confounding factors (known and unknown) equally between the intervention group and the control group. Although, just by the way, we don’t have an absolute guarantee of that either.
So in this post we’re going to think about the difference between chasing causality and settling for pretty associations. And, more specifically, how we can ask ourselves an uncomfortable question: how strong would that unknown confounder have to be to wipe out the relationship we’ve found? That’s where our protagonist with the gadget name comes in: the E-value, a way of putting a number on the ghost and deciding whether we buy the causal story… or whether, once again, the roundabout might not be the one to blame for everything.
Confounding
First of all, let’s remind ourselves what confounding factors are.
A confounder is a variable that is related both to the exposure and to the outcome, but is not part of the direct causal chain between them. The big danger with confounders is that, if we don’t measure them (or measure them badly), they can manufacture fake associations: they make it look as if the exposure “protects” against or “promotes” the outcome, when in reality the difference is due to something else that comes bundled with it.
The consequence of unknown or poorly controlled confounding is that a study may give us an association measure between exposure and outcome, but that number may be telling us mostly the story of the confounder, not the story of the exposure we care about.
Now imagine a cohort study lands on your desk claiming that “exercise reduces the risk of fildulastrosis by 75%”. Fildulastrosis is a terrifying disease whose name sounds like something between a tropical infection and a biblical punishment, so you latch onto the key idea: the gym is going to save your life.
The problem is that cohort studies, as we know, are observational. In this case, the study compares people who go to the gym with people who prefer the sofa, and finds that the gym-goers have a risk ratio (RR) of 0.25 for fildulastrosis compared with the couch potatoes. Four times less risk. The authors conclude something along the lines of “these results suggest a possible protective effect of physical exercise”. And off we go, browsing for leggings online.
But when we think about it twice, we realise that there’s a lot going on between the sofa and the treadmill. In our imaginary study, nobody has measured one seemingly innocent detail: compulsive consumption of TV reality shows. It turns out that hardcore fans of “Island of the Famous In-Laws” tend to spend the night glued to the screen, sleep little, eat whatever they can grab, and, in general, don’t set foot in the gym, not even to use the bathroom.
To keep things simple, let’s say that among those who don’t exercise, 60% are reality addicts, whereas among the sporty types only 20% are. That already tells us that “watching reality shows” is associated with the exposure “not exercising”. And to top it off, let’s assume that reality addicts have five times the risk of developing fildulastrosis compared with those who prefer reading medicine leaflets before bed. That variable nobody bothered to include in the questionnaire has a first and last name: it’s an unmeasured confounding factor.
Quantifying the unknown
Up to this point, the story is more or less familiar. In observational studies we can almost never measure absolutely everything: odd habits, personality traits, small socio-economic details… There can always be something that is associated both with the exposure and with the outcome and that is quietly pushing the association without us noticing.
And that means the association we see might not be causal. So, we can ask ourselves a very concrete question: if there is some unknown confounder, like our beloved reality TV, how strong would its effect on the exposure and on the outcome have to be to fully explain the observed association?
We can start by translating the strength of the confounder into numbers. Let’s go back to our example. We can describe how powerful “watching reality shows” is in distorting the study result using two risk ratios (RR).
One RR summarises how the probability of the exposure changes when you change the level of the confounder; in our case, how much the probability of being sedentary increases if you’re a reality fan compared with if you’re not. If among reality fans 70% don’t exercise and among non-fans only 30% are sedentary, the exposure–confounder risk ratio would be roughly 0.70 / 0.30, that is, a bit more than 2.
We’ll call this RR of the exposure according to the confounder RR_EC (exposure–confounder).
The second RR summarises how the risk of the study outcome changes when the level of the confounder changes; that is, how much the risk of fildulastrosis increases in reality fans compared with non-fans, adjusting for what we’ve already measured. If that’s a five-fold increase, we’d say RR_RC (result–confounder) = 5.
With these two numbers (RR_EC and RR_RC) we can already calculate how much a confounder like this could distort the estimated association at most. To do that, we introduce a new parameter, usually called B (for bias). The formula is:
B = (RR_RC × RR_EC) / (RR_RC + RR_EC − 1)
Don’t let the formula scare you, it’s actually quite simple. B is the factor by which the effect might be inflated if there is an unmeasured confounder, so we calculate it to estimate what the “real” association between exposure and outcome would be without that confounder.
In the numerator we have the product of the two RR, the one linking the exposure to the confounder and the one linking the outcome to the confounder. That’s logical: for the confounder to distort the exposure–outcome relationship, it has to affect both, by definition. If you plug 1 for one of the two RR in the formula (meaning it’s only related to the exposure or only to the outcome), B = 1. Conclusion: if it’s not associated with both, the confounder cannot bias the measure of association (there is no confounding). On the other hand, the larger the product of the two RR, the larger the potential bias.
As for the denominator, don’t worry too much about it. It’s a mathematical trick to stop B from blowing up and to avoid impossible RR values.
Let’s do a concrete numerical example to see how we use B. Suppose RR_EC = 3 (reality fans are three times more likely to be sedentary) and RR_RC = 5 (they have five times the risk of fildulastrosis). Then B would be:
B = (5 × 3) / (5 + 3 − 1) = 15 / 7 ≈ 2.1
This means that, at most, the unmeasured confounder could be inflating (or biasing) the observed RR by a factor of 2.1, but no more.
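The bound is simple enough to compute in a couple of lines. Here is a minimal Python sketch (the function name is just for illustration):

```python
def bias_factor(rr_ec, rr_rc):
    """Maximum factor B by which an unmeasured confounder with
    exposure-confounder RR rr_ec and outcome-confounder RR rr_rc
    could distort the observed association."""
    return (rr_ec * rr_rc) / (rr_ec + rr_rc - 1)

print(round(bias_factor(3, 5), 1))  # the example above: 15 / 7 -> 2.1
```

Note that if either RR equals 1, the function returns exactly 1: a variable associated with only one side of the relationship cannot confound it.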
If the study observes that sedentary people have RR = 4 for fildulastrosis compared with exercisers, the “reality-free” effect could not be lower than:
RR = 4 / 2.1 ≈ 1.9
In other words, even with a very powerful confounder, the association doesn’t disappear: there is still almost double the risk among sedentary people. Our reality show explains part of the mess, but not all of it.
If, on the other hand, with other combinations of RR_EC and RR_RC the value of B were so large that the observed RR divided by B dropped below 1, then yes, the confounder could fully explain the association without needing any causal effect of exercise.
To wrap up this section, notice that so far we’ve only talked about risk factors, with RR > 1. In cases where the observed association is protective, RR < 1, we multiply by B instead of dividing, to estimate the possible bias in the measure due to an unmeasured confounder.
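Both directions can be folded into one small helper. This is just a sketch, with a made-up name, of the rule described above: divide by B for a harmful association, multiply by B for a protective one.

```python
def rr_bound(rr_obs, b):
    """Bound on what the observed RR could become once an unmeasured
    confounder with bias factor b is accounted for:
    divide when RR > 1 (harmful), multiply when RR < 1 (protective)."""
    return rr_obs / b if rr_obs > 1 else rr_obs * b

b = 15 / 7  # the bias factor from the worked example, about 2.1
print(round(rr_bound(4, b), 1))     # harmful RR of 4 -> 1.9
print(round(rr_bound(0.25, b), 2))  # protective RR of 0.25 -> 0.54
```

In both cases the bound moves the estimate towards 1, which is exactly what a confounder working against us would do.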
The E-value
So far we’ve been trying to answer the question of how much bias in the RR could be produced by an unmeasured confounder with a given strength of association. Of course, we’ve been doing this completely by guesswork, because the size of the unmeasured confounding is, shockingly, unknown, so we “make up” plausible values for it.
A more elegant way to proceed is to flip the question around and ask instead: what minimum strength would a confounder need to have in order to fully explain the association we see in the study? And the answer to that comes from the star of this post: the E-value.
When we observe an RR > 1, the E-value is defined by the following formula:
E-value = RR + √(RR × (RR − 1))
When RR is protective, i.e. less than 1, we first take the inverse of the RR and then use the same formula.
It looks like a magic formula, and we’re not going to derive it here, but its interpretation is very down to earth: it’s the minimum RR that both associations of the confounder, confounder with exposure and confounder with outcome, must have for a single unmeasured confounder to be able to fully explain the observed association.
In the imaginary plane of all possible pairs of RR_EC and RR_RC that could bias the observed effect, the E-value identifies the point where those two RRs are equal and, at the same time, as small as possible.
Let’s apply it to our gym example. Suppose the study finds that people who exercise have one quarter the risk of fildulastrosis compared with sedentary people (RR = 0.25). Since this is a protective effect, we start by taking the inverse: 1 / RR = 4. Now we plug that into the formula and get an E-value of about 7.46.
How do we interpret that? Only an unmeasured confounder that increased the probability of exercising by about seven-fold and, at the same time, increased the risk of fildulastrosis by about seven-fold could, on its own, fully explain the observed association. Anything weaker than that could shrink the effect, but not wipe it out.
This point estimate is very elegant, but you already know I like to suffer with confidence intervals. Imagine that the 95% confidence interval for RR = 0.25 is 0.10 to 0.60. The limit closest to the null (which is 1) is 0.60. Again, we do the inverse trick (we get roughly 1.66) and apply the formula. We obtain an E-value for the interval limit of around 2.7. This means that an unmeasured confounder associated with both exercise and fildulastrosis with RRs of at least 2.7 could push that limit up to 1 and make the effect “compatible with no association”. Any weaker confounder wouldn’t manage it.
Logically, if RR > 1, we would take the lower confidence limit to do the calculations. Always use the limit closest to the null value, which is 1.
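The whole recipe (invert protective RRs, then apply the formula) fits in a few lines of Python; this is a sketch, not a substitute for a proper sensitivity-analysis package:

```python
import math

def e_value(rr):
    """Minimum strength of association, on the RR scale, that an unmeasured
    confounder would need with BOTH exposure and outcome to fully explain
    the observed RR. Protective RRs (< 1) are inverted first."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(0.25), 2))  # point estimate of the example -> 7.46
print(round(e_value(0.60), 2))  # CI limit closest to 1 -> 2.72
```

When working with the confidence interval, remember to feed in the limit closest to the null value of 1; a null RR of exactly 1 gives an E-value of 1, meaning no confounding at all is needed.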
Notice the difference in the message the E-value gives us. A high E-value for the point estimate tells us the effect is very robust: we would need an unmeasured confounder with a huge strength of association to explain it away if the observed association were not real.
The E-value for the confidence limit closest to 1 will be more modest: with a confounder of moderate strength, the interval could already be made to touch 1. The two numbers tell complementary stories, and they’re far more informative than a tiny p-value left on its own.
Supporting actors, but top billing
So far, we’ve been talking about RRs, but in observational studies we often end up with odds ratios (OR), hazard ratios (HR), rate ratios or even differences in means. The trick with the E-value is to find a reasonable way to translate any of these into something that behaves like a risk ratio, so we can plug it into the same formula.
If the outcome is rare, say below 10–15% prevalence, the OR and the HR look quite similar to the RR and can be used more or less directly. If the outcome is common and all we have is an OR, we can use an approximation. One option is to take the square root of the OR as a rough stand-in for the RR and work from there.
With HRs and common events, there are slightly more complex formulas that let you do something equivalent. And when what we have are differences in means, we can standardise the effect size and then interpret that, approximately, as a kind of risk ratio via an exponential transformation.
All of this sounds more frightening than it is. The philosophy is the same: we start with some effect measure, translate it into an approximate RR, compute the E-value with the usual formula, and end up with a single number that answers the same question: how strong would the unknown confounder have to be to wreck our result?
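The translation step can also be sketched in code. Everything here is an assumption-laden approximation: the `measure` codes are made up for this example, the square-root trick for common-outcome ORs comes straight from the text, and the exp(0.91 × d) conversion for standardized mean differences is the constant usually quoted in the E-value literature. Common-outcome HRs need a more involved formula that is not sketched here.

```python
import math

def approx_rr(estimate, measure, rare_outcome=True):
    """Hypothetical helper: rough translation of common effect
    measures into an approximate RR before computing the E-value.
    - 'or' or 'hr' with a rare outcome: use as-is.
    - 'or' with a common outcome: sqrt(OR).
    - 'smd' (standardized mean difference): exp(0.91 * d)."""
    if measure == "smd":
        return math.exp(0.91 * estimate)
    if measure == "or" and not rare_outcome:
        return math.sqrt(estimate)
    return estimate

print(approx_rr(4.0, "or", rare_outcome=False))  # common outcome: sqrt(4) -> 2.0
```

Once you have the approximate RR, the E-value formula is applied exactly as before.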
We’re leaving…
And this is where we’ll stop, at least for today, spinning around the cursed roundabout of unmeasured confounders.
Before we go, I want to stress that the beauty of the E-value isn’t just in the number itself, but in how we interpret it. We have to compare it with associations we already know or suspect in our own study. If your strongest measured covariates have RRs of 2 or 3 with the outcome and 1.5 or 2 with the exposure, imagining an unmeasured confounder that reaches a very high value on both sides starts to sound like science fiction.
On the other hand, if the E-value for the interval is more modest, then any reasonably sized, unmeasured “something” might be enough to explain the association. A tiny p-value doesn’t help much here: with a big enough sample size you can get a ridiculous p for microscopic effects that any plausible confounding could account for.
And finally, don’t forget that the E-value only manages one particular family of problems: unmeasured confounding. It doesn’t protect you from selection bias, measurement error, loss to follow-up, or the irresistible urge to torture the data until they confess what you wanted to hear in the first place. All of those still need to be handled with care as well. But that’s another story…