King of Kings

Randomized clinical trial

There is no doubt that when doing research in biomedicine we can choose from a large number of possible designs, all with their advantages and disadvantages. But in such a diverse and populous court, among jugglers, wise men, gardeners and purple flautists, the true Crimson King of epidemiology reigns over them all: the randomized clinical trial.

Definition of randomized clinical trial

The clinical trial is an interventional analytical study, with antegrade direction and concurrent temporality, based on the sampling of a closed cohort with control of the exposure. In a trial, a sample of a population is selected and divided randomly into two groups. One of the groups (intervention group) undergoes the intervention that we want to study, while the other (control group) serves as a reference to compare the results. After a given follow-up period, the results are analyzed and the differences between the two groups are compared. We can thus evaluate the benefits of treatments or interventions while controlling for the biases of other types of studies: randomization favors the even distribution of possible confounding factors, known or unknown, between the two groups, so that if we detect any difference in the end, it has to be due to the intervention under study. This is what allows us to establish a causal relationship between exposure and effect.

From what has been said so far, it is easy to understand that the randomized clinical trial is the most appropriate design to assess the effectiveness of any intervention in medicine and the one that provides, as we have already mentioned, the highest quality evidence to demonstrate the causal relationship between the intervention and the observed results.

But to enjoy all these benefits it is necessary to be scrupulous in the design and methodology of the trials. There are checklists published by experts who understand a lot about these issues, such as the CONSORT list, which can help us assess the quality of a trial's design. But among all these aspects, let us give some thought to those that are crucial for the validity of the clinical trial.

Components of randomized clinical trials

Everything begins with a knowledge gap that leads us to formulate a structured clinical question. The only objective of the trial should be to answer this question, and it is enough for it to answer a single question appropriately. Beware of clinical trials that try to answer many questions, since, in many cases, they end up answering none of them well. In addition, the approach must be based on what the inventors of methodological jargon call the equipoise principle, which means nothing more than that, deep in our hearts, we do not really know which of the two interventions is more beneficial for the patient (from the ethical point of view, it would be anathema to make a comparison if we already knew with certainty which of the two interventions is better). It is curious in this sense how trials sponsored by the pharmaceutical industry are more likely to breach the equipoise principle, since they have a preference for comparing with placebo or with "non-intervention" in order to demonstrate more easily the efficacy of their products.

Then we must carefully choose the sample on which we will perform the trial. Ideally, all members of the population should have the same probability not only of being selected, but also of ending up in either of the two branches of the trial. Here we are faced with a small dilemma. If we are very strict with the inclusion and exclusion criteria, the sample will be very homogeneous and the internal validity of the study will be strengthened, but it will be more difficult to extend the results to the general population (this is the explanatory attitude of sample selection). On the other hand, if we are not so rigid, the results will be more similar to those of the general population, but the internal validity of the study may be compromised (this is the pragmatic attitude).

Randomization is one of the key points of the clinical trial. It is what assures us that the two groups are comparable, since it tends to distribute the known variables equally between them and, more importantly, also the unknown ones. But do not relax too much: this distribution is not guaranteed at all; it is only more likely to happen if we randomize correctly, so we should always check the homogeneity of the two groups, especially with small samples.

In addition, randomization allows us to perform masking appropriately, so that we can measure the response variable without bias, avoiding information biases. The results of the intervention group can be compared with those of the control group in three ways. One of them is to compare with a placebo. The placebo should be a preparation with physical characteristics indistinguishable from the intervention drug but without its pharmacological effects. This serves to control the placebo effect (which depends on the patient's personality, their feelings towards the intervention, their love for the research team, etc.), but also the side effects that are due to the intervention and not to the pharmacological effect (think, for example, of the percentage of local infections in a trial with medication administered intramuscularly).

The other way is to compare with the most effective treatment accepted so far. If there is a treatment that works, the logical (and more ethical) thing is to use it to investigate whether the new one brings additional benefit. It is also the usual comparator in equivalence or non-inferiority studies. Finally, the third possibility is to compare with non-intervention, although in reality this is a far-fetched way of saying that only the usual care that any patient would receive in their clinical situation is applied.

It is essential that all participants in the trial are subjected to the same follow-up protocol, which must be long enough to allow the expected response to occur. All losses that occur during follow-up should be detailed and analyzed, since they can compromise the validity and power of the study to detect significant differences. And what do we do with those that get lost or end up in a branch different from the one assigned? If there are many, it may be more reasonable to reject the study. Another possibility is to exclude them and act as if they had never existed (this is, in essence, the per-protocol analysis), but we may bias the results of the trial. A third possibility is to include them in the analysis in the branch of the trial in which they actually participated (there is always someone who gets confused and takes what they should not), which is known as the as-treated analysis. And the fourth and last option is to analyze them in the branch that was initially assigned to them, regardless of what they did during the study. This is called the intention-to-treat analysis, and it is the only one of the four possibilities that allows us to retain all the benefits that randomization had previously provided.
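
To make the difference between these strategies more tangible, here is a minimal Python sketch (with invented toy data, not from any real trial) of how the intention-to-treat and as-treated comparisons can part ways when someone crosses over:

```python
# A minimal sketch (invented toy data, not from any real trial) of how the
# intention-to-treat and as-treated comparisons can diverge when participants
# cross over to the other branch.
records = [
    # (assigned_arm, arm_actually_received, had_event)
    ("intervention", "intervention", False),
    ("intervention", "intervention", False),
    ("intervention", "control", True),   # crossed over, had the event
    ("control", "control", True),
    ("control", "control", True),
    ("control", "intervention", False),  # took what they should not have
]

def risk(group):
    """Proportion of subjects in the group who presented the event."""
    return sum(1 for _, _, event in group if event) / len(group)

# Intention-to-treat: everyone counts in the arm they were randomized to.
itt = {arm: risk([r for r in records if r[0] == arm])
       for arm in ("intervention", "control")}

# As-treated: everyone counts in the arm they actually received.
as_treated = {arm: risk([r for r in records if r[1] == arm])
              for arm in ("intervention", "control")}

print("ITT risks:       ", itt)         # {'intervention': 0.33, 'control': 0.67}
print("as-treated risks:", as_treated)  # {'intervention': 0.0, 'control': 1.0}
```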

Data analysis

As a final phase, we analyze and compare the data to draw the conclusions of the trial, using for this the measures of association and impact which, in the case of the clinical trial, are usually the response rate, the risk ratio (RR), the relative risk reduction (RRR), the absolute risk reduction (ARR) and the number needed to treat (NNT). Let's see them with an example.

Let's imagine that we carry out a clinical trial in which we try a new antibiotic (let's call it A, so as not to rack our brains over a name) for the treatment of a serious infection of the location that we are interested in studying. We randomize the selected patients and give them the new drug or the usual treatment (our control group), according to what chance assigns them. In the end, we measure how many of our patients fail treatment (present the event we want to avoid).

Thirty-six out of the 100 patients receiving drug A present the event to be avoided. Therefore, we can conclude that the risk or incidence of the event in those exposed (Ie) is 0.36. On the other hand, 60 of the 100 controls (we call them the non-exposed group) have presented the event, so we quickly calculate that the risk or incidence in the non-exposed (Io) is 0.6.

At first glance we already see that the risk is different in each group, but as in science we have to measure everything, we can divide the risk in the exposed by the risk in the non-exposed, thus obtaining the so-called risk ratio (RR = Ie/Io). An RR = 1 means that the risk is equal in the two groups. If RR > 1, the event will be more likely in the exposed group (the exposure under study will be a risk factor for the production of the event), and if RR is between 0 and 1, the risk will be lower in the exposed. In our case, RR = 0.36/0.6 = 0.6. It is easier to interpret an RR > 1. For example, an RR of 2 means that the probability of the event is twice as high in the exposed group. Following the same reasoning, an RR of 0.3 would tell us that the event is 70% less frequent in the exposed than in the controls (about a third as frequent). You can see in the attached table how these measures are calculated.

But what we are interested in is knowing how much the risk of the event decreases with our intervention, in order to estimate how much effort is needed to prevent each event. For this we can calculate the RRR and the ARR. The RRR is the difference in risk between the two groups relative to the risk in the control group (RRR = [Io − Ie] / Io). In our case it is 0.4, which means that the tested intervention reduces the risk by 40% compared to the usual treatment.

The ARR is simpler: it is the difference between the risks of the exposed and the controls (ARR = Ie − Io). In our case it is 0.24 (we ignore the negative sign), which means that out of every 100 patients treated with the new drug there will be 24 fewer events than if we had used the control treatment. But there is still more: we can know how many patients we have to treat with the new drug to avoid one event just by doing a rule of three (24 is to 100 as 1 is to x) or, easier to remember, by calculating the inverse of the ARR. Thus, the NNT = 1/ARR = 4.2. In our case we would have to treat about four patients to avoid an adverse event. The context will always tell us the clinical importance of this figure.

As you can see, the RRR, although technically correct, tends to magnify the effect and does not clearly quantify the effort required to obtain the results. In addition, it may be similar in situations with totally different clinical implications. Let's see it with another example that I also show you in the table. Suppose another trial with a drug B in which we obtain three events in the 100 treated and five in the 100 controls. If you do the calculations, the RR is 0.6 and the RRR is 0.4, as in the previous example, but if you calculate the ARR you will see that it is very different (ARR = 0.02), with an NNT of 50. It is clear that the effort to avoid one event is much greater (4 versus 50) despite the same RR and RRR.
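
If you want to play with these numbers, here is a minimal Python sketch that reproduces both worked examples and shows how the same RR and RRR can hide very different ARRs and NNTs:

```python
# A quick sketch reproducing the two worked examples: drug A (36/100 events vs
# 60/100 in controls) and drug B (3/100 vs 5/100). Same RR and RRR, very
# different ARR and NNT.
def effect_measures(events_treated, n_treated, events_control, n_control):
    ie = events_treated / n_treated   # risk in the intervention group
    io = events_control / n_control   # risk in the control group
    rr = ie / io                      # risk ratio
    rrr = (io - ie) / io              # relative risk reduction
    arr = io - ie                     # absolute risk reduction
    nnt = 1 / arr                     # number needed to treat
    return rr, rrr, arr, nnt

for name, counts in (("drug A", (36, 100, 60, 100)), ("drug B", (3, 100, 5, 100))):
    rr, rrr, arr, nnt = effect_measures(*counts)
    print(f"{name}: RR={rr:.2f}  RRR={rrr:.2f}  ARR={arr:.2f}  NNT={nnt:.1f}")
# drug A: RR=0.60  RRR=0.40  ARR=0.24  NNT=4.2
# drug B: RR=0.60  RRR=0.40  ARR=0.02  NNT=50.0
```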

So, at this point, let me give you a piece of advice. Since the data needed to calculate the RRR are the same as those needed to calculate the much simpler ARR (and the NNT), if a scientific paper offers you only the RRR and hides the ARR, distrust it, and do as you would with the brother-in-law who offers you wine and cured cheese: ask him why he does not put out a pincho of Iberian ham instead. Well, what I really mean is that you had better ask yourselves why they do not give you the ARR, and calculate it yourselves using the information from the article.

Basic design modifications

So far, everything we have said refers to the classical design of the parallel clinical trial, but the king of designs has many faces and, very often, we can find papers in which it appears a little differently, which may imply that the analysis of the results has special peculiarities.

Let's start with one of the most frequent variations. If we think about it for a moment, the ideal design would be one that allowed us to test in the same individual the effect of the study intervention and of the control intervention (the placebo or the standard treatment), since the parallel trial is an approximation that assumes that the two groups respond equally to the two interventions, which always implies a risk of bias that we try to minimize with randomization. If we had a time machine, we could try the intervention on everyone, write down what happens, turn back the clock and repeat the experiment with the control intervention, so as to compare the two effects. The problem, as the more alert among you will have imagined, is that the time machine has not been invented yet.

But what has been invented is the cross-over clinical trial, in which each subject acts as their own control. As you can see in the attached figure, in this type of trial each subject is randomized to a group, subjected to the first intervention, allowed a wash-out period and, finally, subjected to the other intervention. Although this solution is not as elegant as the time machine, the defenders of cross-over trials point to the fact that variability within each individual is smaller than the variability between individuals, so the estimate can be more precise than that of the parallel trial and, in general, smaller sample sizes are needed. Of course, before using this design you have to make a series of considerations. Logically, the effect of the first intervention should not produce irreversible changes or be very prolonged, because it would affect the effect of the second. In addition, the wash-out period must be long enough to avoid any residual effect of the first intervention.

It is also necessary to consider whether the order of the interventions can affect the final result (sequence effect), in which case only the results of the first intervention would be valid. Another problem is that, since these trials last longer, the characteristics of the patient can change throughout the study and be different in the two periods (period effect). And finally, beware of losses during the study: they are more frequent in longer studies and have a greater impact on the final results than in parallel trials.

Imagine now that we want to test two interventions (A and B) in the same population. Can we do it with the same trial and save costs of all kinds? Yes, we can: we just have to design a factorial clinical trial. In this type of trial, each participant undergoes two consecutive randomizations: first they are assigned to intervention A or to placebo (P) and then to intervention B or placebo, with which we will have four study groups: AB, AP, BP and PP. As is logical, the two interventions must act by independent mechanisms so that the two effects can be assessed independently.

Usually, one intervention related to a more plausible and mature hypothesis and another with a less contrasted hypothesis are studied, ensuring that the evaluation of the second does not influence the inclusion and exclusion criteria of the first. In addition, it is not convenient that either of the two options has many annoying effects or is badly tolerated, because lack of compliance with one treatment usually leads to poor compliance with the other. In cases where the two interventions are not independent, the effects could be studied separately (AP versus PP and BP versus PP), but the design advantages are lost and the necessary sample size increases.
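
As an illustration, here is a minimal Python sketch (hypothetical participants and 1:1 allocation, just for the sake of the example) of the double randomization that generates the four factorial groups:

```python
# A minimal sketch (hypothetical participants, 1:1 allocation) of the double
# randomization of a 2x2 factorial trial, which yields the four groups AB, AP,
# BP and PP described above.
import random

random.seed(42)  # make the toy example reproducible

def factorial_assign(participant_ids):
    groups = {"AB": [], "AP": [], "BP": [], "PP": []}
    for pid in participant_ids:
        gets_a = random.random() < 0.5  # first randomization: A vs placebo
        gets_b = random.random() < 0.5  # second randomization: B vs placebo
        if gets_a and gets_b:
            key = "AB"
        elif gets_a:
            key = "AP"
        elif gets_b:
            key = "BP"
        else:
            key = "PP"
        groups[key].append(pid)
    return groups

groups = factorial_assign(range(400))
print({name: len(members) for name, members in groups.items()})
```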

At other times it may happen that we are in a hurry to finish the study as soon as possible. Imagine a very bad disease that kills lots of people and for which we are trying a new treatment. We want to have it available as soon as possible (if it works, of course), so after every certain number of participants we will stop and analyze the results and, if we can already demonstrate the usefulness of the treatment, we will consider the study finished. This is the design that characterizes the sequential clinical trial. Remember that in the parallel trial the correct thing is to calculate the sample size in advance. In this design, with a more Bayesian mentality, a statistic is established whose value determines an explicit termination rule, so that the size of the sample depends on the previous observations. When the statistic reaches the predetermined value, we feel confident enough to reject the null hypothesis and we finish the study. The problem is that each stop and analysis increases the risk of rejecting the null hypothesis when it is actually true (type 1 error), so it is not advisable to do many interim analyses. In addition, the final analysis of the results is complex, because the usual methods do not work and others that take the interim analyses into account must be used. This type of trial is very useful with very fast-acting interventions, so it is common to see them in dose-titration studies of opioids, hypnotics and similar poisons.
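
A quick back-of-the-envelope calculation shows why unplanned repeated looks are dangerous. The sketch below treats the k analyses as independent tests, which they are not (interim looks at accumulating data are correlated, so the true inflation is somewhat smaller), but the trend is the same:

```python
# Why repeated looks inflate the type 1 error: for k independent tests at
# alpha = 0.05, the chance of at least one false rejection is 1 - (1 - alpha)**k.
# Interim analyses of accumulating data are correlated, so this is an
# upper-bound illustration rather than the exact inflation.
alpha = 0.05
for k in (1, 2, 5, 10):
    overall = 1 - (1 - alpha) ** k
    print(f"{k:2d} looks -> overall type 1 error up to ~{overall:.2f}")
# 1 look -> 0.05; 2 looks -> 0.10; 5 looks -> 0.23; 10 looks -> 0.40
```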

Cluster trials

There are other occasions when individual randomization does not make sense. Imagine we have taught the doctors of one center a new technique to better inform their patients and we want to compare it with the old one. We cannot tell the same doctor to inform some patients in one way and others in another, since there would be many opportunities for the two interventions to contaminate each other. It would be more logical to teach the doctors of one group of centers and not those of another group and compare the results. Here what we would randomize is the centers, to train their doctors or not. This is the cluster-randomized (group assignment) design. The problem with this design is that we have few guarantees that the participants in the different groups behave independently, so the required sample size can increase a lot if there is great variability between groups and little within each group, as shown in the sketch below. In addition, the analysis of the results has to be done on aggregated data, because if it is done individually, the confidence intervals become falsely narrow and we can find false statistical significance. The usual approach is to calculate a weighted summary statistic for each group and make the final comparisons with it.
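
The sketch mentioned above uses the standard variance-inflation approximation for cluster designs, the so-called design effect; the cluster size, intraclass correlation and sample size figures are hypothetical:

```python
# The usual variance-inflation approximation for cluster randomization:
# design effect DE = 1 + (m - 1) * ICC, where m is the average cluster size
# and ICC the intraclass correlation. All figures here are hypothetical.
def design_effect(cluster_size, icc):
    return 1 + (cluster_size - 1) * icc

n_individual = 200  # sample size an individually randomized trial would need
for icc in (0.01, 0.05, 0.20):
    de = design_effect(cluster_size=30, icc=icc)
    print(f"ICC={icc:.2f}: design effect={de:.2f} -> ~{n_individual * de:.0f} subjects")
```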

The last of the series that we are going to discuss is the community trial, in which the intervention is applied to population groups. Since they are carried out on populations in real conditions, they have great external validity and often allow cost-effective measures to be recommended based on their results. The problem is that it is often difficult to establish control groups, it can be more difficult to determine the necessary sample size, and it is more complex to make causal inferences from their results. It is the typical design for evaluating public health measures such as water fluoridation, vaccinations, etc.

We’re leaving…

I'm done now. The truth is that this post has been a bit long (and I hope not too dense), but the King deserves it. In any case, if you think that everything has been said about clinical trials, you have no idea of all that remains to be told about types of sampling, randomization, etc., etc., etc. But that is another story…

Ménage à trois

In this post we will give another twist to the issue of the variables that can disturb the harmonious relationship between the couple formed by exposure and effect, so that all those dirty minds who were expecting something else after reading the title can move on to the next Google result, which will surely match what they were looking for.

We have already seen that there are confounding variables, related to both the exposure and the effect, and how they can distort our estimates of the measures of association if they are not evenly distributed among the study groups. We talked about our backdoor, how to avoid it and how to close it, both in cohort studies and in case-control studies.

But the effect of the exposure on the outcome under study is not always the same: it can vary in intensity as the value or level of a third variable changes. As was the case with confounding, we observe this better by stratifying the results for the analysis but, in this case, it is not due to an uneven distribution of the variable: the effect of the exposure is actually modified by the magnitude of this third variable, which is called an effect modifier or interaction variable.

Naturally, it is essential to distinguish between confounding and effect-modifying variables. The effect of a confounding variable depends on its distribution among the study groups. In experimental studies, this distribution may vary according to what happens during randomization, so a variable can act as a confounder in one trial and not in another. In observational studies, however, confounders always exert their effect, since they are associated with both the exposure and the effect. When we find a confounding variable, our goal is to control its effect and estimate an adjusted measure of association.

Effect-modifying variables, on the other hand, represent characteristics of the relationship between exposure and effect, whose intensity depends on the ménage à trois created by the interaction with this third variable. If you think about it, when there is effect modification we will not be interested in calculating an adjusted measure of association, as we could do with the Mantel-Haenszel test, because it would not be representative of the overall effect of the exposure. Nor is it a good idea to take the simple arithmetic average of the measures of association observed in each stratum. In any case, what we have to do is describe the interaction, not try to control it as we do with confounding variables.

Before we can say that there is an effect-modifying variable, we must rule out that the observed differences are due to chance, confounding or bias in our study. Looking at the confidence intervals of the estimated measures can help to rule out chance, which will be more unlikely if the intervals do not overlap. We can also test whether the differences among strata are statistically significant, using the appropriate test according to the design of the study.

And can we estimate an overall measure of the influence of the exposure on the effect that takes into account the existence of an interaction variable? Of course we can, does anyone doubt it?

Perhaps the easiest way is to calculate a standardized measure. To do so, we compare two different measures: one that assumes that every member of each stratum has the risk of the exposed, and another that assumes the same but with the risk of the non-exposed. In this way we estimate a measure of association in the overall standard population we have set. Confused? Let's see an example. We are going to continue boring you to exhaustion with poor smokers and their coronary artery disease. The first table shows the results of a study that I have just invented about smoking and myocardial infarction.

We see that, overall, smokers have a risk of infarction seven times higher than non-smokers (relative risk, RR = 7). Let's assume that smokers and non-smokers have a similar age distribution, but that if we break down the data into two age groups the risks are different: the RR in those under 50 is 2, while in those over 50 the risk of heart attack is three times higher for smokers than for non-smokers.

We will calculate two measures of association, one assuming that everyone smokes and the other assuming that no one smokes. In those younger than 50 years, the risk of myocardial infarction if everyone smoked would be 5/197 = 0.02. As there are 454 people under 50, the expected number of infarctions would be 454 × 0.02 = 9.1. The risk if no one smoked would be 3/257 = 0.01, so we would expect 0.01 × 454 = 4.5 infarctions among non-smokers.

We do the same calculations with those older than 50 and then add up the total number of people (770), the total expected number of heart attacks if everyone smoked (47.1) and if no one smoked (10.8). The standardized risk in smokers in this population is 47.1/770 = 0.061. The standardized risk in non-smokers, 10.8/770 = 0.014. Finally, we calculate the standardized RR: 0.061/0.014 ≈ 4.4 (if you round the risks to 0.06 and 0.01 before dividing you get 6, so it is better to round only at the end). This means that, globally, smoking increases the risk of myocardial infarction roughly four-fold, but do not forget that this result is valid only for this standard population and would probably not hold for a different one.
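
Here is the same final step in a few lines of Python, using the expected totals quoted above:

```python
# The final step of the standardization, from the totals quoted above: 770
# people in the standard population, 47.1 expected infarctions if everyone
# smoked and 10.8 if no one smoked. Rounding the standardized risks to 0.06
# and 0.01 before dividing would give 6; rounding only at the end gives ~4.4.
total_people = 770
expected_if_all_smoke = 47.1
expected_if_none_smoke = 10.8

risk_smokers = expected_if_all_smoke / total_people      # ~0.061
risk_nonsmokers = expected_if_none_smoke / total_people  # ~0.014
print(f"standardized RR ~ {risk_smokers / risk_nonsmokers:.1f}")  # ~4.4
```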

Just one more thing before finishing. As with confounding variables, the analysis of effect modifiers can also be done by regression, introducing an interaction term into the equation to account for the effect of the modifier. Moreover, these coefficients are very useful because their statistical significance helps to distinguish between confounding and interaction. But that is another story…

To what do you attribute it?

It seems like only yesterday. I began my adventures at the hospital and had my first contacts with The Patient. And, by the way, I didn't know much about diseases, but I knew without thinking the three questions with which any good clinical history began: what is bothering you? how long has it been going on? and to what do you attribute it?

The fact is that the need to know the why of things is inherent to human nature and, of course, is of great importance in medicine. Everyone is mad about establishing cause-and-effect relationships, and sometimes one does it rather loosely and concludes that the culprit of one's summer cold is the supermarket guy who set the air conditioning at maximum power. This is why studies on etiology must be conducted and assessed with scientific rigour, and also because when we talk about etiology we also refer to harm, including that derived from our own actions (what educated people call iatrogenesis).

This is why studies on etiology and harm have similar designs. The clinical trial is the ideal choice, and we can use it, for example, to know whether a treatment is the cause of the patient's recovery. But when we study risk factors or harmful exposures, the ethical principle of non-maleficence prevents us from randomizing exposures, so we have to resort to observational studies such as cohort studies or case-control studies, although the level of evidence they provide will be lower than that of experimental studies.

To critically appraise a paper on etiology / harm, we’ll resort to our well-known pillars: validity, relevance and applicability.

First, we'll focus on the VALIDITY or scientific rigour of the work, which should answer the question of whether the factor or intervention studied was the cause of the adverse effect or disease observed.

As always, we'll assess a series of primary validity criteria. If these are not fulfilled, we'll leave the paper and devote ourselves to something more profitable. The first is to determine whether the groups compared were similar regarding other important factors apart from the exposure studied. Randomization in clinical trials ensures that the groups are homogeneous, but we cannot count on it in observational studies. The homogeneity of the two cohorts is essential, and the study is not valid without it. One can always argue that the differences between the two groups have been stratified, or that a multivariate analysis has been done to control for the effect of known confounders but, what about the unknown ones? The same applies to case-control studies, which are much more sensitive to bias and confounding.

Have exposure and effect been assessed in the same way in all groups? In clinical trials and cohort studies we have to check that the effect has had the same likelihood of appearing and of being detected in the two groups. Moreover, in case-control studies it is very important to assess previous exposure properly, so we must investigate whether there is potential bias in data collection, such as recall bias (patients often remember symptoms better than healthy people do). Finally, we must consider whether follow-up has been long enough and complete. Losses during the study, common in observational designs, can bias the results.

If we have answered yes to all three questions, we'll turn to the secondary validity criteria. The study's results have to be evaluated to determine whether the association between exposure and effect provides reasonable evidence of causality. One useful tool is Hill's criteria; Hill was a gentleman who suggested using a series of items to try to distinguish the causal or non-causal nature of an association. These criteria are: a) strength of association, represented by the risk ratio between exposure and effect, which we'll consider shortly; b) consistency, that is, reproducibility in different populations or situations; c) specificity, meaning that a cause produces a single effect and not multiple ones; d) temporality: it is essential that the cause precedes the effect; e) biological gradient: the more intense the cause, the more intense the effect; f) plausibility: the relationship has to be logical according to our biological knowledge; g) coherence: the relationship should not conflict with other knowledge about the disease or the effect; h) experimental evidence, often difficult to obtain in humans for ethical reasons; and finally, i) analogy with other known situations. Although these criteria are quite vintage and some of them may be irrelevant today (experimental evidence or analogy), they may serve as guidance. The criterion of temporality is a necessary one, and it is well complemented by biological gradient, plausibility and coherence.

Another important aspect is to consider whether, apart from the intervention under study, both groups were treated similarly. This type of study, in which double-blinding is absent, is where there is a greater risk of bias due to co-interventions, especially if these are treatments with a much greater effect than the exposure under study.

Regarding the RELEVANCE of the results, we must consider the magnitude and precision of the association between exposure and effect.

What was the strength of the association? The most common measure of association is the risk ratio (RR), which can be used in trials and cohort studies. However, in case-control studies we don't know the incidence of the effect (the effect has already occurred when the study is conducted), so we use the odds ratio (OR). As we know, the interpretation of the two parameters is similar. Indeed, their values are similar when the frequency of the effect is very low. However, the greater the magnitude or frequency of the effect, the more RR and OR differ, with the peculiarity that the OR tends to overestimate the strength of the association when it is greater than 1 and to underestimate it when it is less than 1. Anyway, these vagaries of the OR will only exceptionally modify the qualitative interpretation of the results.

Keep in mind that a test is statistically significant for any value of OR or RR whose confidence interval does not include one, but with observational studies we have to be a little more demanding. Thus, in a cohort study we would like to see RR values greater than or equal to three and, in case-control studies, OR values greater than or equal to four.

Another useful parameter (in trials and cohort studies) is the risk difference or incidence difference, which is a fancy way of referring to our well-known absolute risk reduction (ARR) and which allows us to calculate the NNT (or NNH, number needed to harm), the parameter that best quantifies the clinical significance of the association. Also, analogous to the relative risk reduction (RRR), we have the attributable fraction in the exposed, which is the percentage of the risk observed in the exposed that is due to the exposure (see the sketch below).
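
As a quick illustration of these last parameters, here is a hedged sketch with hypothetical cohort figures (a risk of 0.20 in the exposed and 0.05 in the non-exposed):

```python
# A hedged sketch with hypothetical cohort figures: risk difference, number
# needed to harm (NNH) and attributable fraction in the exposed.
ie = 0.20  # assumed risk of the event in the exposed
io = 0.05  # assumed risk of the event in the non-exposed

risk_difference = ie - io               # 0.15 extra events per exposed person
nnh = 1 / risk_difference               # ~7 people exposed per extra event
attributable_fraction = (ie - io) / ie  # 75% of exposed cases due to exposure

print(f"risk difference={risk_difference:.2f}  NNH={nnh:.1f}  "
      f"attributable fraction={attributable_fraction:.0%}")
```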

And what about the accuracy of the results? As we know, we'll turn to our beloved confidence intervals, which serve to determine the accuracy of the parameter estimate in the population. It is always useful to have all these parameters, which must be included in the study, or at least it must be possible to calculate them from the data provided by the authors.

Finally, we'll assess the APPLICABILITY of the results to our clinical practice.

Are the results applicable to our patients? Check whether there are differences that advise against extrapolating the results of the work to our setting. Also, consider the magnitude of the risk in our patients based on the results of the study and their characteristics. And finally, with all this information in mind, we must think about our working conditions, the choices we have and the patient's preferences to decide whether or not to avoid the studied exposure. For example, if the magnitude of the risk is high and we have an effective alternative, the decision will be clear, but things are not always so simple.

As always, I advise you to use the resources available on the Internet, such as those from CASP: both the design-specific appraisal templates and the calculator to assess the relevance of the results.

Before concluding, let me clarify one thing. Although we've said that we use RR in cohort studies and clinical trials and OR in case-control studies, we can actually use OR in any type of study (not so the RR, for which we must know the incidence of the effect). The problem is that ORs are somewhat less precise, so we prefer to use RR and NNT whenever possible. However, the OR is increasingly popular for another reason: its use in logistic regression models, which allow us to obtain estimates adjusted for confounding variables. But that's another story…

The ratio’s trap


The realm of science is full of traps. They're everywhere. Neither the major medical journals nor the most prestigious authors are free of them. Many people take advantage of our ignorance and use self-serving indicators instead of the proper ones in order to show the results in the most favorable light. For this reason, we have to stay alert and always look at the studies' data to reach our own interpretation.

Unfortunately, we cannot avoid the results being manipulated, but we can fight our ignorance and always do a critical appraisal when reading scientific papers.

An example of what I am talking about is the choice between risk ratio and odds ratio.

Odds ratio vs risk ratio

You know the difference between risk and odds. A risk is the proportion of subjects with an event within a total group of susceptible subjects. Thus, we can calculate the risk of having a heart attack among smokers (infarcted smokers divided by the total number of smokers) and among non-smokers (the same, but with non-smokers). If we go a step further, we can calculate the ratio between the two risks, called relative risk or risk ratio (RR), which indicates how much more likely the occurrence of the event is in one group compared with the other.

Meanwhile, the odds represent a quite different concept: how much more likely it is that an event occurs than that it does not occur (p/(1−p)). For example, the odds of suffering a heart attack in smokers is calculated by dividing the likelihood of having an attack in smokers (infarcted smokers divided by the total number of smokers, the same as we did with the risk) by the probability of not suffering it (non-infarcted smokers divided by the total number of smokers or, equivalently, one minus the likelihood of having the attack). As we did with the risk, we can calculate the ratio of the odds of the two groups to get the odds ratio (OR), which gives us an idea of how much more likely the event is to occur in one group than in the other.

As you can see, they are similar but different concepts. In both cases, the null value is one. A value greater than one indicates that the subjects in the numerator group have a greater risk, whilst a value less than one indicates that they have a lower risk of presenting the event. Thus, an RR of 2.5 means that the group in the numerator has a 150% greater probability of presenting the event under study, while an OR of 2.5 means that the odds of presenting the event are two and a half times higher in the numerator's group.

Similarly, an RR of 0.4 indicates a 60% reduction in the probability of the event in the numerator group. An OR of 0.4 is more complex to interpret, but its meaning is more or less the same.
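
To see the two measures side by side, here is a minimal Python sketch that computes the RR and the OR from the same hypothetical 2x2 table:

```python
# RR and OR computed from the same hypothetical 2x2 table: 30/100 events in
# the exposed group and 15/100 in the non-exposed group.
events_exp, no_events_exp = 30, 70      # exposed group
events_unexp, no_events_unexp = 15, 85  # non-exposed group

risk_exp = events_exp / (events_exp + no_events_exp)          # 0.30
risk_unexp = events_unexp / (events_unexp + no_events_unexp)  # 0.15
rr = risk_exp / risk_unexp                                    # 2.00

odds_exp = risk_exp / (1 - risk_exp)        # 0.43
odds_unexp = risk_unexp / (1 - risk_unexp)  # 0.18
odds_ratio = odds_exp / odds_unexp          # 2.43, further from 1 than the RR

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```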

Which of the two should we use? It depends on the type of study. To calculate the RR we first have to calculate the risks in the two groups, and for that we have to know the prevalence or cumulative incidence of the disease, so this measure is often used in cohort studies and clinical trials.

In studies in which the frequency of the disease cannot be estimated, as in case-control studies, there's no choice but to use the OR. But the OR is not restricted to this type of study: we can use it whenever we want, instead of the RR. A particular case is when a logistic regression model is used to adjust for the different confounding factors detected, which provides adjusted ORs.

The difference

In any case, RR and OR values are similar when the frequency of the effect is low, below 10%, although the OR is always slightly lower than the RR for values less than one and a little higher for values greater than one. Just a little? Well, sometimes not so little. The attached graphic shows approximately the relation between RR and OR. As you can see, as the frequency of the event increases, the OR grows much faster than the RR. And here is where the trap lies, since for the same risk the impact may seem much higher if we use an OR than if we use an RR. The OR can be misleading when the event is frequent. Let's see an example.

Imagine that I'm very concerned about obesity among the attendees of a movie theater and I want to prevent them from entering the room with a huge tank of a sugary drink whose brand I'm not going to mention. So I count how many viewers buy the drink and get a proportion of 95% of the attendees. Then, on a different day, I put a sign at the bar warning about the bad health effects of drinking sugary beverages and, very gladly, I see the proportion drop to 85%.

In our case, the absolute measure of effect is the absolute risk difference, which is only 10%. That's something, but it doesn't look like much: I only get the desired effect in one in ten viewers. Let's see how the association measures behave.

The RR is calculated as the ratio 95/85 = 1.12. This indicates that the risk of buying the drink is about 12% higher if we don't put up the sign than if we do. It doesn't seem like much, does it?

The odds of buying the beverage would be 0.95/(1 − 0.95) without the sign and 0.85/(1 − 0.85) with it, so the OR would be (95/5)/(85/15) = 3.35. This means that the odds of buying the beverage are more than three times higher if we don't put up the sign.
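
Putting the cinema example into a few lines of Python makes the trap obvious: the same data produce a modest risk difference and RR, and a spectacular OR:

```python
# The cinema example in numbers: 95% buy the drink without the sign, 85% with
# it. Same data, three very different-looking summaries.
p_no_sign, p_sign = 0.95, 0.85

risk_difference = p_no_sign - p_sign  # 0.10
rr = p_no_sign / p_sign               # ~1.12
odds_ratio = (p_no_sign / (1 - p_no_sign)) / (p_sign / (1 - p_sign))  # ~3.35

print(f"risk difference={risk_difference:.2f}  RR={rr:.2f}  OR={odds_ratio:.2f}")
```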

It's clear that the RR corresponds better with the absolute measure (the risk difference), but now I wonder: if my brother-in-law had a factory making signs, which indicator do you think he would use? No doubt he would use the OR.

This is why we must always look at the results to check whether we can calculate some absolute indicator from the study data. Sometimes this is not as easy as in our example, as when the authors present the OR provided by a regression model. In these cases, if we know the prevalence of the effect or disease under study, we can always calculate the equivalent RR using the following formula:

RR= \frac{OR}{(1-Prev)+(Prev\times OR)}
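
As a sanity check, plugging our cinema numbers into this formula recovers the RR we computed directly:

```python
# Converting an OR back to the equivalent RR, given the prevalence of the
# effect in the reference group: RR = OR / ((1 - prev) + prev * OR).
# With the cinema numbers (OR = 3.35, prevalence 0.85 in the with-sign group)
# we recover the RR of ~1.12 computed directly above.
def or_to_rr(odds_ratio, prevalence):
    return odds_ratio / ((1 - prevalence) + prevalence * odds_ratio)

print(f"RR ~ {or_to_rr(3.35, 0.85):.2f}")  # ~1.12
```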

We’re leaving…

And here we leave the traps for today. You have seen how data, and the way of presenting them, can be manipulated to say what you want without actually lying. There are more examples of the misuse of relative association measures instead of absolute ones, such as using the relative risk reduction instead of the absolute risk reduction. But that's another story…

Don't let them trick you with cheese

If you have at home a bottle of wine that has turned a bit vinegary, take my advice and don't throw it away. Wait until you receive one of those scrounger visits (I didn't mention any brother-in-law!) and offer it to them. But be sure to pair it with a rather strong cheese: the stronger the cheese, the better the wine will taste (you can have something else with any excuse). Well, this trick, almost as old as the human species, has its parallels in the presentation of the results of scientific work.

Let's suppose we conduct a clinical trial to test a new antibiotic (call it A) for the treatment of a serious infection we are interested in. We randomize the selected patients and give them the new treatment or the usual one (our control group), as chance dictates. Finally, we measure in how many of our patients there's a treatment failure (how many present the event we want to avoid).

Thirty-six out of the 100 patients receiving drug A presented the event to avoid. Therefore, we can conclude that the risk or incidence of presenting the event in the exposed group (Ie) is 0.36 (36 out of 100). Moreover, 60 out of the 100 controls (we call them the non-exposed group) presented the event, so we quickly compute that the risk or incidence in the non-exposed (Io) is 0.6.

We see at first glance that the risks are different in each group, but as in science we have to measure everything, we can divide the risk in the exposed by the risk in the non-exposed to get the so-called relative risk or risk ratio (RR = Ie/Io). An RR = 1 means that the risk is the same in both groups. If RR > 1, the event is more likely in the exposed group (and the exposure under study will be a risk factor for the production of the event); and if RR is between 0 and 1, the risk will be lower in the exposed. In our case, RR = 0.36/0.6 = 0.6. It's easier to interpret the RR when its value is greater than one. For example, an RR of 2 means that the probability of the event is twice as high in the exposed group. Following the same reasoning, an RR of 0.3 would tell us that the event is about two-thirds less common in the exposed than in the controls.

But what interests us is how much the risk of presenting the event decreases with our intervention, in order to estimate how much effort is needed to prevent each event. So we can calculate the relative risk reduction (RRR) and the absolute risk reduction (ARR). The RRR is the difference in risk between the two groups relative to the risk of the control group (RRR = [Io − Ie] / Io). In our case its value is 0.4, which means that the tested intervention reduces the risk by 40% compared to standard therapy.

The ARR is simpler: it's the difference between the exposed's and the controls' risks (ARR = Ie − Io). In our case it is 0.24 (we omit the negative sign), which means that for every 100 patients treated with the new drug, there will be 24 fewer events than if we had used the control therapy. But there's more: we can know how many patients we have to treat with the new drug to prevent each event just by using a rule of three (24 is to 100 as 1 is to x) or, more easily remembered, by calculating the inverse of the ARR. Thus, we come up with the number needed to treat (NNT) = 1/ARR = 4.2. In our case we would have to treat about four patients to avoid an adverse event. The clinical context will tell us the relevance of this figure.

As you can see, the RRR, although technically correct, tends to magnify the effect and doesn't clearly quantify the effort required to obtain the result. In addition, it may be similar in situations with totally different clinical implications. Let's look at another example. Suppose another trial with a drug B in which we get three events among the 100 patients treated and five among the 100 controls.

If you do the calculations, the RR is 0.6 and the RRR is 0.4, as in our previous example, but if you compute the ARR you'll come up with a very different result (ARR = 0.02) and an NNT of 50. It's clear that the effort to prevent one event is much higher (four vs. 50) despite the RR and RRR being the same.

So, at this point, let me give you a piece of advice. Since the data needed to calculate the RRR are the same as those needed to calculate the much simpler ARR (and the NNT), if a scientific paper offers you only the RRR and hides the ARR, distrust it, and do as you would with the brother-in-law who offers you wine and strong cheese: ask him to serve an Iberian ham pincho instead. Well, what I really mean is that you had better ask yourselves why they don't give you the ARR, and compute it using the information from the article.

One final thought to close the topic. There's a tendency to confusion when using or analyzing another measure of association employed in some observational studies: the odds ratio. Although the two can sometimes be comparable, as when the prevalence of the effect is very small, in general the odds ratio has a different meaning and interpretation. But that's another story…