Science without sense…double nonsense

Pills about evidence-based medicine


The King under review

We all know that the randomized clinical trial is the king of interventional methodological designs. It is the type of epidemiological study that allows a better control of systematic errors or biases, since the researcher controls the variables of the study and the participants are randomly assigned among the interventions that are compared.

In this way, if two homogeneous groups that differ only in the intervention present some difference of interest during the follow-up, we can affirm with some confidence that this difference is due to the intervention, the only thing that the two groups do not have in common. For this reason, the clinical trial is the preferred design to answer clinical questions about intervention or treatment, although we will always have to be prudent with the evidence generated by a single clinical trial, no matter how well performed. When we perform a systematic review of randomized clinical trials on the same intervention and combine them in a meta-analysis, the answers we get will be more reliable than those obtained from a single study. That’s why some people say that the ideal design for answering treatment questions is not the clinical trial, but the meta-analysis of clinical trials.

In any case, as systematic reviews assess their primary studies individually and as it is more usual to find individual trials and not systematic reviews, it is advisable to know how to make a good critical appraisal in order to draw conclusions. In effect, we cannot relax when we see that an article corresponds to a clinical trial and take its content for granted. A clinical trial can also contain its traps and tricks, so, as with any other type of design, it will be a good practice to make a critical reading of it, based on our usual three pillars: validity, importance and applicability.

As always, when studying scientific rigor or VALIDITY (internal validity), we will first look at a series of essential primary criteria. If these are not met, it is better not to waste time with the trial and try to find another more profitable one.

Is there a clearly defined clinical question? From its origin, the trial must be designed to answer a structured clinical question about treatment, motivated by one of our multiple knowledge gaps. A working hypothesis should be proposed, with its corresponding null and alternative hypotheses, if possible on a topic that is relevant from the clinical point of view. It is preferable that the study tries to answer only one question: when it addresses several, the trial may become excessively complicated and end up not answering any of them completely and properly.

Was the assignment done randomly? As we have already said, to be able to affirm that the differences between the groups are due to the intervention, the groups must be homogeneous. This is achieved by assigning patients randomly, the only way to control the known confounding variables and, more importantly, also those we do not know. If the groups were different and we attributed the difference only to the intervention, we could incur a confounding bias. The trial should contain the usual and essential table 1 with the frequency of the demographic and confounding variables of both samples, so we can check that the groups are homogeneous. A frequent error is to look for differences between the two groups and evaluate them according to their p-values, when we know that the p-value does not measure homogeneity. If we have distributed them at random, any difference we observe will necessarily be due to chance (we do not need a p-value to know that). The sample size is not designed to discriminate between demographic variables, so a non-significant p-value may simply indicate that the sample is too small to reach statistical significance. On the other hand, any minimal difference can reach statistical significance if the sample is large enough. So forget about the p-value: if there is any difference, what you have to do is assess whether it has enough clinical relevance to have influenced the results or, more elegantly, adjust for the covariates that randomization left unbalanced. Fortunately, it is increasingly rare to find tables of the study groups with p-value comparisons between the intervention and control groups.

But it is not enough for the study to be randomized; we must also consider whether the randomization sequence was generated correctly. The method used must ensure that all components of the selected population have the same probability of being chosen, which is why random number tables or computer-generated sequences are preferred. The randomization must also be concealed, so that it is not possible to know which group the next participant will be assigned to; that is why centralized systems, by telephone or through the Internet, are so popular. And here is something very curious: it is well known that simple randomization tends to produce groups of different sizes, especially if the samples are small, which is why block randomization balanced by size is sometimes used. And I ask you, how many studies have you read with exactly the same number of participants in the two branches that claimed to be randomized? Do not trust equal groups, especially if they are small, and do not be fooled: you can always use one of the many binomial probability calculators available on the Internet to find out how likely it is that chance alone would generate the groups that the authors present (we are always speaking of simple randomization, not randomization by blocks, clusters, minimization or other techniques). You will be surprised by what you find.
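
If you want to check this yourselves, the calculation is simple. Here is a minimal sketch in Python (the sample sizes are just illustrative) that computes the exact binomial probability that simple 1:1 randomization produces two groups of exactly the same size:

    from math import comb

    def prob_equal_groups(n):
        """Probability that simple 1:1 randomization of n participants
        yields two groups of exactly equal size (n must be even)."""
        return comb(n, n // 2) * 0.5 ** n

    for n in (20, 40, 100, 400):
        print(f"n = {n}: P(equal groups) = {prob_equal_groups(n):.3f}")
    # n = 20: 0.176, n = 40: 0.125, n = 100: 0.080, n = 400: 0.040

With 100 participants, the probability of ending up with exactly 50 in each branch is only about 8%, and it keeps falling (roughly as the square root of 2/(πn)) as the sample grows.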

It is also important that the follow-up is sufficiently long and complete: the study must last long enough for the outcome variable to be observed, and every participant who enters the study must be accounted for at the end. As a general rule, if losses exceed 20%, it is accepted that the internal validity of the study may be compromised.

We will always have to analyze the nature of the losses during follow-up, especially if they are high. We must try to determine whether the losses are random or related to some specific variable (which would be a bad sign) and estimate what effect they may have on the results of the trial. The usual approach is to adopt the so-called worst-case scenario: it is assumed that all the losses in the control group have done well and all those in the intervention group have done badly, and the analysis is repeated to check whether the conclusions change, in which case the validity of the study would be seriously compromised. The last important aspect is to consider whether patients who did not receive the assigned treatment (there is always someone who gets confused and messes things up) have been analyzed according to intention to treat, since this is the only way to preserve all the benefits obtained with randomization. Everything that happens after randomization (such as a change of assignment group) can influence the probability that the subject experiences the effect we are studying, so it is important to respect this intention-to-treat analysis and analyze everyone in the group to which they were initially assigned.
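
To see how a worst-case scenario analysis works, here is a minimal sketch with completely invented figures; the point is only to show how the conclusion can flip when the losses are reassigned:

    def risks(events_tx, events_ctl, n_tx, n_ctl):
        """Risk of the event in the treatment and control groups."""
        return events_tx / n_tx, events_ctl / n_ctl

    # Hypothetical trial: 100 randomized per arm; 15 lost to follow-up in
    # the treatment arm, 5 in the control arm. Among those followed:
    # 30/85 events on treatment, 40/95 on control.
    observed = risks(30, 40, 85, 95)
    # Worst-case scenario: all treatment losses had the event,
    # none of the control losses did.
    worst = risks(30 + 15, 40 + 0, 100, 100)
    print(f"observed:   treatment {observed[0]:.2f}, control {observed[1]:.2f}")
    print(f"worst case: treatment {worst[0]:.2f}, control {worst[1]:.2f}")

With these made-up numbers the treatment looks better than the control in the observed data (risks of 0.35 versus 0.42), but worse in the worst-case scenario (0.45 versus 0.40): the conclusions of such a trial would rest on shaky ground.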

Once these primary criteria have been verified, we will look at three secondary criteria that influence internal validity. It will be necessary to verify that the groups were similar at the beginning of the study (we have already talked about the table with the data of the two groups), that masking was carried out appropriately as a form of bias control, and that the two groups were managed and followed in a similar way except, of course, for the intervention under study. We know that masking or blinding allows us to minimize the risk of information bias, which is why researchers and participants are usually unaware of which group each participant is assigned to, which is known as double blinding. Sometimes, given the nature of the intervention (think of a group that undergoes surgery and another that does not), it will be impossible to mask researchers and participants, but we can always give the masked data to the person who analyzes the results (the so-called blinded evaluator), which mitigates this inconvenience.

To summarize this section on the validity of the trial, we can say that we will have to check that there is a clear definition of the study population, the intervention and the outcome of interest, that randomization has been done properly, that information bias has been controlled through masking, that there has been adequate follow-up with control of losses, and that the analysis has been correct (intention-to-treat analysis and adjustment for covariates not balanced by randomization).

A very simple tool that can also help us assess the internal validity of a clinical trial is the Jadad scale, also called the Oxford quality scoring system. Jadad, a Colombian doctor, devised a scoring system with seven questions. First, five questions whose affirmative answer adds 1 point:

  1. Is the study described as randomized?
  2. Is the method used to generate the randomization sequence described and is it adequate?
  3. Is the study described as double blind?
  4. Is the masking method described and is it adequate?
  5. Is there a description of the losses during follow up?

Finally, two questions whose negative answer subtracts 1 point:

  1. Is the method used to generate the randomization sequence adequate?
  2. Is the masking method appropriate?

As you can see, the Jadad scale assesses the key points we have already mentioned: randomization, masking and follow-up. A trial is considered methodologically rigorous if it scores 5 points. If the study has 3 points or less, we had better use it to wrap our sandwich.
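
For those who prefer to see it as an algorithm, here is a small sketch of the scoring just described (the function and its arguments are my own invention for illustration, not an official implementation):

    def jadad(randomized, sequence_described_and_adequate, double_blind,
              masking_described_and_adequate, losses_described,
              sequence_adequate, masking_adequate):
        """Jadad (Oxford) score: the five items add one point each when
        true, and the last two subtract one point each when false."""
        score = sum([randomized, sequence_described_and_adequate, double_blind,
                     masking_described_and_adequate, losses_described])
        score -= (not sequence_adequate) + (not masking_adequate)
        return score

    # A trial described as randomized and double blind, reporting its losses,
    # but without describing how it randomized or masked (no deductions apply):
    print(jadad(True, False, True, False, True, True, True))  # -> 3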

We will now proceed to consider the results of the study to gauge their clinical RELEVANCE. It will be necessary to look at the variables measured to see whether the trial adequately expresses the magnitude and precision of the results. It is important, once again, not to settle for being inundated with multiple p-values full of zeros. Remember that the p-value only indicates the probability of accepting as real a difference that exists only by chance (or, to put it simply, of making a type 1 error), and that statistical significance need not be synonymous with clinical relevance.

In the case of continuous variables such as survival time, weight, blood pressure, etc., it is usual to express the magnitude of the results as a difference in means or medians, depending on which measure of central tendency is most appropriate. However, in the case of dichotomous variables (alive or dead, healthy or sick, etc.) the relative risk, its relative and absolute reductions, and the number needed to treat (NNT) will be used. Of all of them, the one that best expresses clinical efficiency is always the NNT. Any trial worthy of our attention must provide this information or, failing that, the information necessary for us to calculate it.

But to get a more realistic estimate of the results in the population, we need to know the precision of the study, and nothing is easier than resorting to confidence intervals. These intervals, in addition to precision, also inform us about statistical significance: a result will be statistically significant if the interval of the risk ratio does not include the value one, or if that of the mean difference does not include the value zero. If the authors do not provide them, we can use a calculator to obtain them, such as those available on the CASP website.
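
In fact, we do not even need a calculator. Here is a minimal sketch using the usual log-normal approximation for the confidence interval of a risk ratio (all the figures are made up):

    from math import exp, log, sqrt

    def rr_confidence_interval(a, n_exposed, c, n_control, z=1.96):
        """Risk ratio and its 95% CI by the log-normal approximation.
        a, c: events among the exposed and among the controls."""
        rr = (a / n_exposed) / (c / n_control)
        se = sqrt(1 / a - 1 / n_exposed + 1 / c - 1 / n_control)
        return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

    # 20/100 events among the exposed versus 32/100 among the controls:
    print(rr_confidence_interval(20, 100, 32, 100))
    # RR 0.62, CI roughly 0.38 to 1.02: the interval crosses one, so the
    # apparently large effect is not statistically significant.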

A good way to sort the study of the clinical importance of a trial is to structure it in these four aspects: Quantitative assessment (measures of effect and its precision), Qualitative assessment (relevance from the clinical point of view), Comparative assessment (see if the results are consistent with those of other previous studies) and Cost-benefit assessment (this point would link to the next section of the critical appraisal that has to do with the applicability of the results of the trial).

To finish the critical reading of a treatment article we will value its APPLICABILITY (also called external validity), for which we will have to ask ourselves if the results can be generalized to our patients or, in other words, if there is any difference between our patients and those of the study that prevents the generalization of the results. It must be taken into account in this regard that the stricter the inclusion criteria of a study, the more difficult it will be to generalize its results, thereby compromising its external validity.

But, in addition, we must consider whether all clinically important outcomes have been taken into account, including side effects and undesirable effects. The outcome variable measured must be important to both the investigator and the patient. And do not forget that demonstrating that the intervention is effective does not necessarily mean that it is beneficial for our patients. We must also assess the harmful or annoying effects and weigh the benefits-costs-risks balance, as well as the difficulties that may exist in applying the treatment in our setting, the patient's preferences, etc.

As is easy to understand, a study can have great methodological validity and results of great clinical importance and still not be applicable to our patients, either because our patients are different from those of the study, because the treatment does not suit their preferences, or because it is unfeasible in our setting. However, the opposite does not usually happen: if the validity is poor or the results are unimportant, we will hardly consider applying the conclusions of the study to our patients.

To finish, let me recommend that you use one of the tools available for critical appraisal, such as the CASP templates, or a checklist, such as CONSORT, so as not to leave any of these points unconsidered. So far we have only talked about randomized controlled clinical trials, but what happens with non-randomized trials or other kinds of quasi-experimental studies? Well, for those we follow another set of rules, such as those of the TREND statement. But that is another story…

King of Kings

There is no doubt that when doing research in biomedicine we can choose from a large number of possible designs, all with their advantages and disadvantages. But in such a diverse and populous court, among jugglers, wise men, gardeners and purple flautists, the true Crimson King of epidemiology reigns over them all: the randomized clinical trial.

The clinical trial is an interventional analytical study, with antegrade direction and concurrent temporality, and with sampling of a closed cohort with control of exposure. In a trial, a sample of a population is selected and divided randomly into two groups. One of the groups (intervention group) undergoes the intervention we want to study, while the other (control group) serves as a reference to compare the results. After a given follow-up period, the results are analyzed and the differences between the two groups are compared. We can thus evaluate the benefits of treatments or interventions while controlling for the biases of other types of studies: randomization favors the even distribution of possible confounding factors, known or unknown, between the two groups, so that if in the end we detect any difference, it has to be due to the intervention under study. This is what allows us to establish a causal relationship between exposure and effect.

From what has been said up to now, it is easy to understand that the randomized clinical trial is the most appropriate design to assess the effectiveness of any intervention in medicine and is the one that provides, as we have already mentioned, a higher quality evidence to demonstrate the causal relationship between the intervention and the observed results.

But to enjoy all these benefits it is necessary to be scrupulous in the approach and methodology of the trials. There are checklists published by experts who understand a lot of these issues, as is the case of the CONSORT list, which can help us assess the quality of the trial’s design. But among all these aspects, let us give some thought to those that are crucial for the validity of the clinical trial.

Everything begins with a knowledge gap that leads us to formulate a structured clinical question. The only objective of the trial should be to answer this question, and it is enough to respond appropriately to a single one. Beware of clinical trials that try to answer many questions, since, in many cases, they end up not answering any of them well. In addition, the approach must be based on what the inventors of methodological jargon call the principle of equipoise, which means nothing more than that, deep in our hearts, we do not really know which of the two interventions is more beneficial for the patient (from the ethical point of view, it would be anathema to make a comparison if we already knew with certainty which of the two interventions is better). It is curious, in this sense, how trials sponsored by the pharmaceutical industry are more likely to breach the equipoise principle, since they have a preference for comparing against placebo or against "non-intervention" in order to demonstrate the efficacy of their products more easily.

Then we must carefully choose the sample on which we will perform the trial. Ideally, all members of the population should have the same probability not only of being selected, but also of ending up in either of the two branches of the trial. Here we face a small dilemma. If we are very strict with the inclusion and exclusion criteria, the sample will be very homogeneous and the internal validity of the study will be strengthened, but it will be more difficult to extend the results to the general population (this is the explanatory attitude to sample selection). On the other hand, if we are not so rigid, the results will be more similar to those of the general population, but the internal validity of the study may be compromised (this is the pragmatic attitude).

Randomization is one of the key points of the clinical trial. It is the one that assures us that we can compare the two groups, since it tends to distribute the known variables equally and, more importantly, also the unknown variables between the two groups. But do not relax too much: this distribution is not guaranteed at all, it is only more likely to happen if we randomize correctly, so we should always check the homogeneity of the two groups, especially with small samples.

In addition, randomization allows us to perform masking appropriately, with which we perform an unbiased measurement of the response variable, avoiding information biases. These results of the intervention group can be compared with those of the control group in three ways. One of them is to compare with a placebo. The placebo should be a preparation of physical characteristics indistinguishable from the intervention drug but without its pharmacological effects. This serves to control the placebo effect (which depends on the patient’s personality, their feelings towards the intervention, their love for the research team, etc.), but also the side effects that are due to the intervention and not to the pharmacological effect (think, for example, of the percentage of local infections in a trial with medication administered intramuscularly).

The other way is to compare with the treatment accepted as the most effective so far. If a treatment that works exists, the logical (and most ethical) thing is to use it when investigating whether the new one brings benefits. It is also the usual comparison in equivalence or non-inferiority studies. Finally, the third possibility is to compare with non-intervention, although in reality this is a far-fetched way of saying that only the usual care that any patient would receive in their clinical situation is applied.

It is essential that all participants in the trial follow the same follow-up schedule, which must be long enough to allow the expected response to occur. All losses that occur during follow-up should be detailed and analyzed, since they can compromise the validity and the power of the study to detect significant differences. And what do we do with those who get lost or end up in a different branch from the one assigned? If there are many, it may be more reasonable to reject the study. Another possibility is to exclude them and act as if they had never existed, but this can bias the results of the trial. A third possibility is to include them in the analysis in the branch of the trial in which they actually participated (there is always someone who gets confused and takes what he should not), which is known as analysis by treatment received or per protocol analysis. And the fourth and last option is to analyze them in the branch to which they were initially assigned, regardless of what they did during the study. This is called intention-to-treat analysis, and it is the only one of the four possibilities that allows us to retain all the benefits that randomization had previously provided.

As a final phase, we would have to analyze and compare the data to draw the conclusions of the trial, using for this the measures of association and impact that, in the case of the clinical trial, are usually the response rate, the risk ratio (RR), the relative risk reduction (RRR), the absolute risk reduction (ARR) and the number needed to treat (NNT). Let's see them with an example.

Let's imagine that we carry out a clinical trial in which we test a new antibiotic (let's call it A, so as not to rack our brains) for the treatment of a serious infection of the location we are interested in studying. We randomize the selected patients and give them the new drug or the usual treatment (our control group), according to what chance assigns them. In the end, we measure how many of our patients fail treatment (that is, present the event we want to avoid).

Thirty-six out of the 100 patients receiving drug A present the event to be avoided. Therefore, we can conclude that the risk or incidence of the event in the exposed (Ie) is 0.36. On the other hand, 60 of the 100 controls (we call them the non-exposed group) have presented the event, so we quickly calculate that the risk or incidence in the non-exposed (Io) is 0.6.

At first glance we already see that the risk is different in each group, but as in science we have to measure everything, we can divide the risks of exposed and non-exposed, thus obtaining the so-called risk ratio (RR = Ie / Io). An RR = 1 means that the risk is equal in the two groups. If RR > 1, the event will be more likely in the exposed group (the exposure we are studying will be a risk factor for the production of the event), and if RR is between 0 and 1, the risk will be lower in the exposed. In our case, RR = 0.36 / 0.6 = 0.6. It is easier to interpret an RR > 1. For example, an RR of 2 means that the probability of the event is twice as high in the exposed group. Following the same reasoning, an RR of 0.3 would tell us that the event is about a third as frequent in the exposed as in the controls. You can see in the attached table how these measures are calculated.

But what interests us is to know how much the risk of the event decreases with our intervention, to estimate how much effort is needed to prevent each event. For this we can calculate the RRR and the ARR. The RRR is the difference in risk between the two groups relative to the control (RRR = [Ie - Io] / Io). In our case it is 0.4, which means that the tested intervention reduces the risk by 40% compared with the usual treatment.

The ARR is simpler: it is the difference between the risks of the exposed and the controls (ARR = Ie - Io). In our case it is 0.24 (we ignore the negative sign), which means that out of every 100 patients treated with the new drug there will be 24 fewer events than if we had used the control treatment. But there is still more: we can know how many we have to treat with the new drug to avoid one event just by doing a rule of three (24 is to 100 as 1 is to x) or, easier to remember, by calculating the inverse of the ARR. Thus, NNT = 1 / ARR = 4.2. In our case we would have to treat about four patients to avoid one adverse event. The context will always tell us the clinical importance of this figure.

As you can see, the RRR, although technically correct, tends to magnify the effect and does not clearly quantify the effort required to obtain the results. In addition, it may be similar in situations with totally different clinical implications. Let's see it with another example that I also show you in the table. Suppose another trial with a drug B in which we obtain three events in the 100 treated and five in the 100 controls. If you do the calculations, the RR is 0.6 and the RRR is 0.4, as in the previous example, but if you calculate the ARR you will see that it is very different (ARR = 0.02), with an NNT of 50. It is clear that the effort to avoid one event is much greater (4 versus 50) despite the same RR and RRR.
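
All these calculations fit in a few lines of code. Here is a minimal sketch reproducing the figures of the two examples above:

    def effect_measures(ie, io):
        """Measures of association and impact from the incidences in the
        exposed (ie) and non-exposed (io) groups."""
        rr = ie / io           # risk ratio
        rrr = (io - ie) / io   # relative risk reduction
        arr = io - ie          # absolute risk reduction
        nnt = 1 / arr          # number needed to treat
        return rr, rrr, arr, nnt

    # Drug A: incidences of 0.36 (treated) and 0.60 (controls).
    print(effect_measures(0.36, 0.60))  # RR 0.6, RRR 0.4, ARR 0.24, NNT ~4.2
    # Drug B: incidences of 0.03 and 0.05.
    print(effect_measures(0.03, 0.05))  # RR 0.6, RRR 0.4, ARR 0.02, NNT 50

Same RR and RRR in both trials, but very different ARR and NNT: that is exactly why we should never settle for relative measures alone.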

So, at this point, let me give you a piece of advice. Since the data needed to calculate the RRR are the same as those needed to calculate the simpler ARR (and the NNT), if a scientific paper offers you only the RRR and hides the ARR, distrust it and do as you would with the brother-in-law who offers you wine and cured cheese: ask him why he does not serve a plate of Iberian ham instead. Well, what I really mean is that you had better ask yourselves why they do not give you the ARR, and compute it yourselves from the information in the article.

So far all that we have said refers to the classical design of parallel clinical trials, but the king of designs has many faces and, very often, we can find papers in which it is shown a little differently, which may imply that the analysis of the results has special peculiarities.

Let's start with one of the most frequent variations. If we think about it for a moment, the ideal design would be one that allowed us to test in the same individual the effect of the study intervention and of the control intervention (the placebo or the standard treatment), since the parallel trial is an approximation that assumes that the two groups respond equally to the two interventions, which always implies a risk of bias that we try to minimize with randomization. If we had a time machine we could try the intervention in everyone, note what happens, turn back the clock and repeat the experiment with the control intervention, so we could compare the two effects. The problem, as the more alert among you will have already imagined, is that the time machine has not been invented yet.

But what has been invented is the cross-over clinical trial, in which each subject acts as their own control. As you can see in the attached figure, in this type of trial each subject is randomized to a group, subjected to the intervention, allowed a wash-out period and, finally, subjected to the other intervention. Although this solution is not as elegant as the time machine, the defenders of cross-over trials point out that the variability within each individual is smaller than the variability between individuals, so the estimate can be more precise than that of the parallel trial and, in general, smaller sample sizes are needed (see the sketch below). Of course, before using this design you have to make a series of considerations. Logically, the effect of the first intervention should not produce irreversible or very prolonged changes, because they would affect the effect of the second. In addition, the wash-out period must be long enough to avoid any residual effect of the first intervention.

It is also necessary to consider whether the order of the interventions can affect the final result (sequence effect), in which case only the results of the first intervention would be valid. Another problem is that, since the study lasts longer, the characteristics of the patient can change throughout it and be different in the two periods (period effect). And finally, beware of losses during the study: they are more frequent in longer studies and have a greater impact on the final results than in parallel trials.
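
Going back to the advantage in sample size, here is a rough back-of-the-envelope sketch, assuming a normally distributed outcome and the usual approximate formulas; the detectable difference, standard deviation and within-subject correlation are invented for the example:

    from math import ceil

    # For 80% power and a two-sided alpha of 0.05: z_alpha = 1.96, z_beta = 0.84.

    def n_per_arm_parallel(delta, sigma, z_alpha=1.96, z_beta=0.84):
        """Approximate participants per arm of a parallel trial to detect
        a mean difference delta with standard deviation sigma."""
        return ceil(2 * (sigma / delta) ** 2 * (z_alpha + z_beta) ** 2)

    def n_total_crossover(delta, sigma, rho, z_alpha=1.96, z_beta=0.84):
        """Approximate total subjects in a 2x2 cross-over design, where
        the paired difference has variance 2 * sigma^2 * (1 - rho)."""
        return ceil(2 * (sigma / delta) ** 2 * (1 - rho) * (z_alpha + z_beta) ** 2)

    print(n_per_arm_parallel(5, 10))      # 63 per arm, 126 in total
    print(n_total_crossover(5, 10, 0.6))  # 26 subjects in total

With a within-subject correlation of 0.6, the cross-over design needs roughly a fifth of the participants of the equivalent parallel trial, which is the whole point of the design.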

Imagine now that we want to test two interventions (A and B) in the same population. Can we do it with the same trial and save costs of all kinds? Yes, we can: we just have to design a factorial clinical trial. In this type of trial, each participant undergoes two consecutive randomizations: first to intervention A or placebo (P), and then to intervention B or placebo, with which we will have four study groups: AB, AP, BP and PP. As is logical, the two interventions must act by independent mechanisms so that the two effects can be assessed independently.

Usually, one intervention related to a more plausible and mature hypothesis is studied together with another related to a less well-tested hypothesis, making sure that the evaluation of the second does not influence the inclusion and exclusion criteria of the first. In addition, neither of the two options should have many annoying effects or be poorly tolerated, because lack of compliance with one treatment usually leads to poor compliance with the other. In cases where the two interventions are not independent, the effects could be studied separately (AP versus PP and BP versus PP), but the advantages of the design are lost and the necessary sample size increases.

At other times it may happen that we are in a hurry to finish the study as soon as possible. Imagine a very bad disease that kills lots of people while we are trying a new treatment. We want to have it available as soon as possible (if it works, of course), so after every certain number of participants we stop and analyze the results and, if we can already demonstrate the usefulness of the treatment, we consider the study finished. This is the design that characterizes the sequential clinical trial. Remember that in the parallel trial the correct thing is to calculate the sample size in advance. In this design, with a more Bayesian mentality, a statistic is established whose value determines an explicit termination rule, so that the size of the sample depends on the previous observations. When the statistic reaches the predetermined value, we feel confident enough to reject the null hypothesis and we finish the study. The problem is that each stop and analysis increases the probability of rejecting the null hypothesis when it is true (type 1 error), so it is not advisable to do many interim analyses. In addition, the final analysis of the results is complex, because the usual methods do not work and others that take the interim analyses into account are needed. This type of trial is very useful with very fast-acting interventions, so it is common to see them in dose-titration studies of opioids, hypnotics and similar poisons.
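
The inflation of the type 1 error with repeated looks at the data is easy to verify by simulation. This sketch (a deliberately crude z-test on simulated data, everything under the null hypothesis of no difference) stops a trial the first time an interim analysis comes out "significant":

    import random
    from math import sqrt
    from statistics import mean, stdev

    random.seed(1)

    def trial_rejects(n_per_look=25, looks=4):
        """Simulate one trial with no real effect, testing after each look."""
        a, b = [], []
        for _ in range(looks):
            a += [random.gauss(0, 1) for _ in range(n_per_look)]
            b += [random.gauss(0, 1) for _ in range(n_per_look)]
            se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
            if abs(mean(a) - mean(b)) / se > 1.96:
                return True  # we would stop, "demonstrating" a spurious effect
        return False

    print(sum(trial_rejects() for _ in range(2000)) / 2000)

Run it and you will see an overall rejection rate around two to three times the nominal 5%, which is exactly why sequential designs need explicit termination rules and corrected analyses.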

There are other occasions when individual randomization does not make sense. Imagine we have taught the doctors of one center a new technique to inform their patients better, and we want to compare it with the old one. We cannot tell the same doctor to inform some patients one way and others another way, since there would be many opportunities for the two interventions to contaminate each other. It would be more logical to teach the doctors of one group of centers and not those of another group, and compare the results. Here what we would randomize are the centers in which the doctors are trained or not. This is the trial with group (cluster) assignment design. The problem with this design is that we do not have many guarantees that the participants of the different groups behave independently, so the necessary sample size can increase a lot if there is great variability between groups and little within each group. In addition, the results have to be analyzed in aggregate, because if they are analyzed individually, the confidence intervals are falsely narrowed and we can find false statistical significance. The usual approach is to calculate a weighted synthetic statistic for each group and make the final comparisons with it.

The last of the series that we are going to discuss is the community trial, in which the intervention is applied to population groups. As they are carried out under real conditions on populations, they have great external validity and often allow cost-efficient measures to be adopted on the basis of their results. The problem is that it is often difficult to establish control groups, it can be more difficult to determine the necessary sample size, and it is more complex to make causal inferences from their results. It is the typical design for evaluating public health measures such as water fluoridation, vaccinations, etc.

I’m done now. The truth is that this post has been a bit long (and I hope not too hard), but the King deserves it. In any case, if you think that everything is said about clinical trials, you have no idea of all that remains to be said about types of sampling, randomization, etc., etc., etc. But that is another story…

Regular customers

We saw in a previous post that sample size is very important. The sample should be the right size, neither more nor less. If it is too large, we waste resources, something to keep in mind in modern times. If we use a small sample we will save money, but we lose statistical power. This means that it may happen that there is a difference in effect between the two interventions tested in a clinical trial but we are unable to recognize it, so we end up throwing money away just the same.

The problem is that sometimes it can be very difficult to achieve an adequate sample size, needing excessively long periods of time to reach the desired number. Well, for these cases, someone with a commercial mentality has devised a method that consists of including the same participant in the trial many times. It is like in bars: it is always easier to have a regular clientele that comes to the establishment many times than to have a very large one (although that is also desirable).

There are times when the same patient needs the same treatment on repeated occasions. Consider, for example, asthmatics who need bronchodilator treatment repeatedly, or couples undergoing in vitro fertilization, which requires several cycles to succeed.

Although the usual standard in clinical trials is to randomize participants, in these cases we can randomize each participant independently every time he needs treatment. For example, if we are testing two bronchodilators, we can randomize the same subject to one of the two every time he has an asthma attack and needs treatment. This procedure is known as re-randomization and consists, as we have seen, of randomizing situations rather than participants.

This trick is quite correct from a methodological point of view, provided that certain conditions discussed below are met.

The participant enters the trial the first time in the usual way, being randomly assigned to one of the two arms of the trial. He is then followed up during the appropriate period and the results of the study variables are collected. Once the follow-up period is finished, if the patient requires treatment again and still meets the inclusion criteria of the trial, he is randomized again, repeating this cycle as many times as necessary to reach the desired sample size.

This mode of recruiting situations instead of participants achieves the required sample size with a smaller number of participants. For example, if we need 500 randomizations, we can randomize 500 participants once, 250 twice, or 200 once and 50 six times. The important thing is that the number of randomizations of each participant cannot be specified beforehand, but must depend on the need for treatment on each occasion.

To apply this method correctly you need to meet three requirements. First, patients can only be re-randomized when they have fully completed the follow-up period of the previous procedure. This is logical because, otherwise, the effects of the two treatments would overlap and a biased measure of the effect of the intervention would be obtained.

Second, each new randomization in the same participant should be done independently of the others. In other words, the probability of assignment to each intervention should not depend on previous assignments. Some authors are tempted to use reallocations to balance the two groups, but this can bias comparisons between the two groups.

Third, the participant should derive the same benefit from each intervention on every occasion. Otherwise, we would obtain a biased estimate of the treatment effect.

We see, then, that this is a good way to reach the desired sample size more easily. The problem with this type of design is that the analysis of the results is more complex than that of the conventional clinical trial.

Basically, and without going into detail, there are two methods of analyzing the results. The simplest is the unadjusted analysis, in which all interventions, even those belonging to the same participant, are treated as independent. This model, usually expressed as a linear regression model, does not take into account the effect that participants themselves can have on the results.

The other method is adjusted for the effect of patients, which takes into account the correlation between observations of the same participants.

And here we leave it for today. We have not said anything about the mathematical treatment of the adjusted method, to avoid burning out the reader's neurons. Suffice it to say that there are several approaches, which involve generalized linear models and mixed-effects models. But that is another story…

The gregarious one

The conventional randomized clinical trial is an individualistic design, in which each participant is randomized to receive the intervention or placebo, the outcome variable is measured after each, and the differences are compared. This individual randomization is complemented by the masking process, so that nobody knows which group each participant belongs to and there can be no effects related to that knowledge.

The problem is that there are times when it is not possible to mask the intervention, so participants know what each one receives. Suppose we want to study the effect of certain dietary advice on blood pressure levels in a population. We can give the recommendations to each participant or not, but each of them will know whether we gave them or not, so masking is not possible.

In addition, two things may occur that can invalidate the comparison of effects with and without the intervention. First, participants can share information among themselves, so that some in the control group would also learn the advice and could follow it. Second, it could be difficult for the researchers to treat the participants of both groups objectively, and their recommendations could end up reaching the wrong participant in some situations. This is what is known as contamination between groups, very frequent when we try to study interventions in public health or health promotion programs.

But do not worry ahead of time, because to solve this problem we can fall back on the gregarious cousin of the randomized clinical trial’s family: the cluster randomized trial.

In these trials the unit of randomization is not the individual but the group of individuals. Thinking of the previous example, we could randomize the patients of one health center to the intervention group and the patients of another center to the control group. This has the advantage of preventing contamination between groups, with the added bonus that participants within each group tend to behave similarly.

For this design to work properly, there has to be a sufficient number of groups to allow the baseline characteristics of their members to be balanced by randomization. It is also essential to keep in mind a number of special considerations during the design, analysis and reporting phases of cluster trials, since the lack of independence of the participants within each group has major statistical implications. It may happen that the members of each group share some characteristics that differ from those of other groups (selection bias), and the distribution of confounding variables may also differ from one group to another.

One problem with this type of design is that it has less power than the equivalent randomized clinical trial, so larger sample sizes are needed, in relation to what is called the cluster inflation factor. Furthermore, the number and size of the groups and the correlation that may exist between the results of patients within the same group, measured by the intracluster correlation coefficient, must be considered.

Thus, to calculate the sample size we have to multiply the size that the standard trial would have by a design effect factor, which takes into account the cluster size and the intracluster correlation coefficient. The formula is the following:

N (cluster trial) = Inflation factor x N (standard clinical trial)

Inflation factor = 1 + [(m – 1) x ICC], where m is the cluster size and ICC is the intracluster correlation coefficient.

Here is an example. Suppose we have been planning a trial and we would need 400 participants in the standard trial to detect a certain effect size with the desired power and statistical significance. We estimate that the intracluster correlation coefficient is 0.15 and decide that we want clusters of 30 participants. The sample size required for the cluster randomized trial is:

N (cluster trial) = (1 + [(30 – 1) x 0.15]) x 400 = 2140

Rounding up, we need 72 clusters of 30 participants, with a total sample of 2160. As can be seen, about five times the size of the conventional trial's sample.
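
The calculation is trivial to automate. Here is a minimal sketch reproducing the example:

    from math import ceil

    def cluster_trial_size(n_standard, cluster_size, icc):
        """Sample size for a cluster randomized trial: the standard trial's
        n multiplied by the inflation factor 1 + (m - 1) * ICC."""
        inflation = 1 + (cluster_size - 1) * icc
        n = n_standard * inflation
        clusters = ceil(n / cluster_size)
        return n, clusters, clusters * cluster_size

    print(cluster_trial_size(400, 30, 0.15))  # roughly (2140.0, 72, 2160)

Note how sensitive the result is to the cluster size: with clusters of 10 participants and the same ICC, the inflation factor drops from 5.35 to 2.35.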

Another peculiarity of cluster trials is that the analysis phase must take into account the lack of independence among the patients of each group, whether we analyze results individually or use summary measures at the cluster level. This is because, if we ignore the lack of independence among participants, the probability of making a type I error increases and we may draw the wrong conclusion. To get an idea, a p-value of 0.01 can become greater than 0.05 once this effect is taken into account.

This means that we cannot use tests like Student's t test and must resort to robust analysis of variance or, more commonly, to random effects models, which not only take the cluster effect into account, but also make it possible to estimate and assess the degree of contamination. They also account for heterogeneity due to unobserved factors and allow adjusting for covariates that produce imbalances between the groups. One possibility is to run the analysis both considering and ignoring the effect of clustering and check whether the significance values differ, in which case this supports the idea that we chose the right kind of design for our study.
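
As an illustration of the kind of analysis involved, here is a sketch with completely invented, simulated data, fitted with statsmodels: a naive model that ignores clustering next to a random-intercept mixed model:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(42)
    n_clusters, m = 40, 30
    cluster = np.repeat(np.arange(n_clusters), m)
    # Treatment is randomized by cluster, as in a cluster trial.
    treatment = np.repeat(rng.integers(0, 2, n_clusters), m)
    cluster_effect = np.repeat(rng.normal(0, 1.0, n_clusters), m)
    y = 0.3 * treatment + cluster_effect + rng.normal(0, 2.0, n_clusters * m)
    df = pd.DataFrame({"y": y, "treatment": treatment, "cluster": cluster})

    naive = smf.ols("y ~ treatment", df).fit()  # ignores clustering
    mixed = smf.mixedlm("y ~ treatment", df, groups=df["cluster"]).fit()
    print(naive.pvalues["treatment"], mixed.pvalues["treatment"])

With clustered data like these, the naive p-value will typically come out falsely small, while the mixed model gives the cluster effect its due.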

And these are the most important issues that we have to keep in mind when conducting a cluster trial. Their main advantage is that they avoid contamination between participants, as we saw at the beginning, so they are very useful for assessing strategies to improve health and for educational programs. Their main drawback has already been mentioned: the lower power, with the consequent need for much larger sample sizes.

Finally, let us just say that all these issues concerning the calculation of sample size and the statistical analysis that takes the cluster effect into account should be clearly specified when reporting the results of the trial.

One last piece of advice. If you carry out a cluster trial, or the critical reading of one, do not forget to check that the authors have taken into account the peculiarities we have discussed. To do this you can use the CONSORT statement, a checklist of characteristics that clinical trials should meet, which includes the specific features of cluster trials. But that is another story…

Intention is what matters

There is always someone who does not do what he is told, no matter how simple the design of a clinical trial seems to be for its participants. They are randomly assigned to one of the two arms of the trial, and some have to take pill A while others have to take pill B, so we can test which of the two is better.

However, there is always someone who does not do what he is supposed to: he takes the pill that does not correspond to him, or does not take any pill at all, or takes it wrongly, or stops taking it before the proper time, etc., etc., etc.

And what do we do when it comes to analyzing the results? Common sense tells us that if a participant got the assigned treatment wrong, we should include him in the group of the pill he actually took (this is called a per protocol analysis). Another option is to forget about the participant who does not take the treatment. But neither attitude is correct if we want an unbiased analysis of the results. If participants start moving from one group to the other, we lose the benefit obtained by distributing them randomly, and confounding or modifying variables that randomization had balanced between the two groups can come back into play.

To avoid this, the right thing is to respect the initial intention of group assignment and analyze the results of the mistaken subject as if he had taken the treatment correctly as assigned. This is what is known as intention-to-treat analysis, the only one that preserves the advantages of randomization.
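
A small simulation (with completely invented numbers) shows why. Suppose a drug that does absolutely nothing, but whose sickest patients tend to switch to the comparator arm:

    import random

    random.seed(7)
    itt = {"A": [], "B": []}  # outcomes analyzed as randomized
    pp = {"A": [], "B": []}   # outcomes analyzed as treated

    for _ in range(10000):
        assigned = random.choice(["A", "B"])
        severe = random.random() < 0.3  # worse baseline prognosis
        # Drug A is inert, but 80% of severe patients assigned to A switch to B.
        received = "B" if assigned == "A" and severe and random.random() < 0.8 else assigned
        event = random.random() < (0.5 if severe else 0.2)  # risk depends only on severity
        itt[assigned].append(event)
        pp[received].append(event)

    risk = lambda outcomes: sum(outcomes) / len(outcomes)
    print("intention to treat:", risk(itt["A"]), risk(itt["B"]))  # roughly equal
    print("per protocol:      ", risk(pp["A"]), risk(pp["B"]))    # A looks protective

With intention to treat, both risks come out around 0.29, as they should, since the drug does nothing. The per protocol analysis makes the inert drug look protective (about 0.22 versus 0.33), only because its arm has lost most of its severe patients.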

There are several reasons why a participant in a trial may not receive the assigned treatment, besides poor compliance on his part. Here are some of them.

Sometimes it may be the researcher who erroneously includes a participant in the trial. Imagine that, after randomization, we realize that some participants are not eligible for the intervention, either because they turn out not to have the disease or because we discover a contraindication to surgery, for example. If we are strict, we should include them in the analysis in the group to which they were assigned, although they have not received the intervention. However, it may be reasonable to exclude them if the causes of exclusion were specified beforehand in the trial protocol. In any case, it is important that this is done by someone who is blind to the allocation and the results, so that participants in both arms of the trial are handled similarly. And if we want more security, we can do a sensitivity analysis with and without these subjects to see how the results change.

Another problem of this type can result from missing data. The results of all variables, and especially the principal one, should be available for all participants, but this is not always the case, so we have to decide what to do with subjects who have missing data.

Most statistical programs operate with complete-case analysis, excluding the records of subjects with missing data. This reduces the effective sample size and may bias the results, in addition to reducing the power of the study. Some models, such as longitudinal mixed models or Cox regression, can handle records with some missing data, but nothing can be done if all the information on a subject is missing. In these cases we can resort to data imputation in all its forms, filling the gaps in order to take advantage of the whole sample according to intention to treat.

When data imputation is not appropriate, one thing we can do is a so-called extreme-case analysis: we assign the gaps the best and the worst possible outcomes and see how the results change. This gives us an idea of the maximum potential impact of the missing data on the results of the study. In any case, there is no doubt that the best strategy is to design the study so that missing data are kept to a minimum.

Anyway, there is always someone who gets it wrong and messes up the conduct of the trial. What can we do?

One possibility is to use a modified intention-to-treat analysis. It includes everyone in the assigned group, but allows the exclusion of some participants, such as those who never started treatment or who turned out not to be suitable for the study. The problem is that this opens a door to massaging the data to our liking and biasing the results in our favor. Therefore, we must be suspicious when these changes were not specified in the trial protocol and are a post hoc decision.

The other possibility is to analyze according to the treatment received (per protocol analysis). The problem, as we have said, is that the balance of randomization is lost. Also, if those who got it wrong share some special feature, the results of the study may be biased. On the other hand, the advantage of analyzing the facts as they really happened is that it can give us a better idea of how the treatment works in real life.

Finally, perhaps the safest thing to do is to perform both analyses, per protocol and intention to treat, and compare the results obtained with each. In these cases we may detect an effect with the per protocol analysis and not with the intention-to-treat analysis. This may be due to two main causes. First, per protocol analysis may create spurious associations because it loses the balance of confounders guaranteed by randomization. Second, the intention-to-treat analysis favors the null hypothesis, so it has less power than the per protocol analysis. Of course, if we detect a significant effect, we will be on firmer ground if the analysis was by intention to treat.

And here we end for today. We have seen how to try to control errors in the assignment to the groups of the trial and how we can impute the missing data, which is a fancy way of saying that we invent data where they are missing. Of course, we can only do that if certain conditions are fulfilled. But that's another story…

The consolation of not being worse

We live in a frantic and highly competitive world. We are continually inundated with messages about how good it is to be the best in this and that. As indeed it is. But most of us soon realize that it is impossible to be the best at everything we do. Gradually, we even realize that it is very hard to be the best at anything at all. In the end, sooner or later, ordinary mortals have to settle for the minimum of not being the worst at what we do.

But this is not that bad. You cannot always be the best and, indeed, you certainly do not have to be. Consider, for example, that we have a great treatment for a very bad disease. This treatment is effective, inexpensive, easy to use and well tolerated. Are we interested in changing to another drug? Probably not. But think now, for example, that it produces irreversible aplastic anemia in 3% of those who take it. In this case we would like to find a better treatment.

Better? Well, not really better. If it were merely the same in everything except for producing aplasia, we would change to the new treatment.

The most common goal of clinical trials is to show the superiority of an intervention against a placebo or the standard treatment. But, increasingly, trials are performed with the sole objective of showing that the new treatment is equivalent to the current one. The planning of these equivalence trials must be careful, paying attention to a number of aspects.

First, there is no equivalence in an absolute sense, so you must take great care to keep the same conditions in both arms of the trial. In addition, we must set in advance the level of sensitivity that we will need in the study. To do this, we define the margin of equivalence, which is the maximum difference between the two interventions that can be considered acceptable from a clinical point of view. Second, we calculate the sample size needed to discriminate that difference from the point of view of statistical significance.

It is important to understand that the margin of equivalence is marked by the investigator based on the clinical significance of what is being valued. The narrower the margin, the larger the needed sample size to achieve statistical significance and reject the null hypothesis that the differences we observe are due to chance. Contrary to what may seem at first sight, equivalence studies usually require larger samples than studies of superiority.

After obtaining the results, we analyze the confidence intervals of the differences in effect between the two interventions. Only those intervals that do not cross the line of no effect (one for relative risks and odds ratios, zero for mean differences) are statistically significant. If, in addition, they fall entirely within the predefined equivalence margins, the interventions will be considered equivalent, with the probability of error chosen for the confidence interval, usually 5%. If an interval falls entirely outside the equivalence range, the interventions are considered not equivalent. If it crosses one of the limits of the margin of equivalence, the study is inconclusive as to proving or rejecting the equivalence of the two interventions, although we should assess the extent and position of the interval with respect to the margins of equivalence to judge its possible relevance from a clinical point of view. Sometimes results that are not statistically significant, or that lie outside the equivalence limits, may also provide useful clinical information.

Look at the example in the figure to better understand what we have said so far. We have the intervals of nine studies represented according to their position with respect to the line of no effect and the limits of equivalence. Only studies A, B, D, G and H show a statistically significant difference, because their intervals do not cross the line of no effect. A's intervention is superior, whereas H's is shown to be inferior. However, only in the case of D can we conclude equivalence of the two interventions, while B and G are inconclusive with regard to equivalence.

We can also conclude equivalence of the two interventions in study E. Notice that, although the difference obtained in D is statistically significant, it does not exceed the limits of equivalence: D's intervention is superior from the statistical point of view, but the difference seems to have no clinical relevance.

Besides studies B and G, already mentioned, C, F and I are inconclusive regarding equivalence. However, C will probably not be inferior and F could well be inferior. We could even estimate the probability of these assumptions based on the proportion of each interval that falls within the limits of equivalence.
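
This reading of intervals against margins lends itself to a simple rule. Here is a sketch that classifies a result from the confidence interval of a mean difference, given symmetric equivalence margins (all the numbers are hypothetical, and positive differences are taken to favor the new intervention):

    def classify(ci_low, ci_high, margin=0.5, no_effect=0.0):
        """Classify an equivalence trial result from the CI of a mean
        difference and symmetric equivalence margins around no effect."""
        significant = not (ci_low <= no_effect <= ci_high)
        if -margin <= ci_low and ci_high <= margin:
            return "equivalent" + (" (significant difference)" if significant else "")
        if ci_low > margin:
            return "superior"
        if ci_high < -margin:
            return "inferior"
        return "inconclusive regarding equivalence"

    # Intervals mimicking some of the studies in the figure:
    for ci in [(0.8, 1.6), (0.1, 0.4), (-0.3, 0.2), (0.2, 0.9), (-1.5, -0.7)]:
        print(ci, "->", classify(*ci))
    # superior / equivalent (significant) / equivalent / inconclusive / inferior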

An important aspect of equivalence studies is the method used to analyze the results. We know that intention-to-treat analysis is always preferable to per protocol analysis, as it preserves the advantages of randomization over known and unknown variables that may influence the results. The problem is that intention-to-treat analysis favors the null hypothesis, minimizing the differences, if there are any. This is an advantage in superiority studies, where finding a difference reinforces the result, but it is not so advantageous in equivalence studies. Per protocol analysis, on the other hand, would tend to magnify any difference, but this is not always the case, and may vary depending on what motivated the protocol violations, the losses, or the assignment errors between the two arms of the trial. For this reason, it is usually advised to analyze the results in both ways and check that the interventions prove equivalent with both methods. We will also take into account the losses during the study and analyze the information provided by the participants who did not follow the original protocol.

A particular case of this type of trial is the non-inferiority trial. Here, researchers are content to demonstrate that the new intervention is not worse than the comparator. Everything we have said about equivalence applies, but considering only the lower limit of the equivalence range.

One last thing. Superiority studies are designed to demonstrate superiority and equivalence studies to demonstrate equivalence; neither design is useful for demonstrating the objective of the other. Furthermore, if a study fails to demonstrate superiority, this does not exactly mean that the two procedures are equivalent.

We have reached the end without saying anything about another characteristic type of equivalence study: bioequivalence studies. These are phase I trials conducted by pharmaceutical companies to test the equivalence of different presentations of the same drug, and they have some design peculiarities of their own. But that's another story…

The chameleon

What a fascinating reptile. It's known for its eyes, which can rotate independently to cover the full circle around it. Also famous is its long tongue, with which it traps from a distance the bugs it eats, without moving from its spot. But the best known of the chameleon's abilities is that of changing color and blending into the environment when it wants to go unnoticed, which is not surprising because the chameleon is, it must be said, a pretty ugly bug.

But today we're going to talk about clinical trials. About one type of clinical trial in particular which, like a true chameleon of epidemiology, changes its design while it is being performed to suit the circumstances as they occur. I am talking about adaptive clinical trials.

A clinical trial usually has a fixed design or protocol that we must not change and, if we do change it, we must explain and justify in detail why we did so. In an adaptive clinical trial, however, we define in advance, prospectively, the possibility of changes in one or more aspects of the study design based on the data obtained during the trial. We usually plan at what points during the study we'll analyze the available data and results to decide whether to make any of the predetermined changes. Otherwise, any change is a violation of the study protocol that jeopardizes the validity of the results.

There are many changes we can make. We can change the probabilities of the randomization method, the sample size and even the characteristics of the follow-up, which can be lengthened or shortened, modifying the visits that were planned in the initial design. But we can go further and change the dose of the tested treatment or the allowed or prohibited concomitant medications, depending on our interests.

We can also change aspects such as the inclusion criteria, the outcome variables (especially the components of composite variables), the analytical methods of the study and even transform a superiority trial into a non-inferiority one, or vice versa.

As we have mentioned a couple of times, these changes must be planned in advance. We have to define the events that will lead us to make the adaptations to the protocol. For instance, we can plan to increase or decrease the sample size to improve power after enrolling a certain number of participants, or to include several groups for a predetermined follow-up period and, from then on, stop the intervention in any group in which it is not proving effective.
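
As a toy example of one such pre-planned adaptation, this sketch re-estimates the sample size at an interim look using the variability observed so far (all the numbers are made up, and real re-estimation procedures are more sophisticated than this classical approximation):

```python
import math
from statistics import stdev

def n_per_arm(sd, delta, z_alpha=1.96, z_beta=0.84):
    """Classical approximation for comparing two means with 80% power:
    n per arm = 2 * (z_alpha + z_beta)^2 * sd^2 / delta^2."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# At the design stage we assumed sd = 10 for a relevant difference of 5 units...
planned = n_per_arm(sd=10, delta=5)

# ...but at the pre-planned interim look the observed variability is larger
interim_values = [1.0, 25.0, 8.0, 30.0, 5.0, 35.0, 12.0, 28.0]  # made-up data, sd ~13
revised = n_per_arm(sd=stdev(interim_values), delta=5)

print(f"planned n per arm: {planned}, revised n per arm: {revised}")
```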

The advantages of an adaptive design are easy to see. The first, flexibility, is evident. The other two are more theoretical and are not always realized but, a priori, adaptive designs are more efficient than conventional ones and are more likely to demonstrate the effect of the intervention, if it exists.

Its main drawback is the difficulty of planning all the possible changes a priori, together with the subsequent interpretation of the results. It's difficult to interpret final results when the course of the trial depends heavily on the intermediate data being obtained. Moreover, this makes it imperative to have fast and easy access to the study data while the trial is running, which can be difficult in the context of a clinical trial.

And here we end for today. Let me insist on the need for a priori planning of the trial protocol and, in the case of adaptive designs, of each adaptation condition. As a matter of fact, nowadays most clinical trials are registered before they are carried out, so that their design conditions are on record. Of course, this also facilitates the subsequent publication of the study, even if the results are not favorable, which helps to combat publication bias. But that's another story…

About pilots

No doubt the randomized clinical trial is the King of epidemiological designs when we want to show, for instance, the effectiveness of a treatment. The problem is that clinical trials are difficult and expensive to perform, so before embarking on a trial it is usual to carry out other, preliminary studies.

These previous studies may be of the observational type. With a cohort or a case-control study we can gather enough information about the effect of an intervention to justify the subsequent performance of a clinical trial.

However, observational studies are also expensive and complex, so we often resort to another solution: doing a clinical trial on a smaller scale to obtain evidence on whether or not to do the large-scale trial, whose results would be definitive. These preliminary studies are generally known as pilot studies, and they have a number of characteristics that should be taken into account.

For example, the aim of a pilot study is to provide some assurance that the effort of running the final trial will yield something useful, so it tries to observe the nature of the intervention's effect rather than to demonstrate its effectiveness.

Being relatively small studies, pilot studies often lack sufficient power to achieve statistical significance at the usual level of 0.05, so some authors recommend setting the value of alpha at 0.2. This alpha value is the chance we have of making a type I error, which consists of rejecting the null hypothesis of no effect when it is true or, in other words, accepting the existence of an effect that doesn't really exist.

But what is going on here? Don't we mind having a 20% chance of being wrong? For any other trial the acceptable limit is 5%. Well, the truth is not that we don't mind, but that the point of view of a pilot study is different from that of a conventional clinical trial.

If we commit a type I error in a conventional clinical trial, we'll accept a treatment as effective when it is not. It's easy to understand that this can have bad consequences and harm the patients who, in the future, undergo the supposedly beneficial intervention. However, if we make a type I error in a pilot study, all that will happen is that we'll spend time and money on a definitive trial that will finally prove that the treatment is not effective.

In a definitive clinical trial it is preferable not to take an ineffective or unsafe treatment for an effective one, while in a pilot study it is preferable to perform a larger clinical trial with an ineffective treatment than to fail to test one that could be effective. This is why the threshold for the type I error is raised to 0.2.
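
To see what this relaxed threshold means in practice, here is a sketch with made-up pilot data for a two-sample t test evaluated at both levels (it assumes scipy is available):

```python
from scipy import stats

# Made-up pilot data: small samples per arm, as is typical of a pilot study
treated = [5.1, 6.3, 4.8, 7.0, 6.1, 5.9, 6.8, 5.5]
control = [5.3, 5.7, 4.6, 6.2, 5.1, 5.8, 4.8, 5.3]

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"p = {p_value:.3f}")
print("significant at alpha = 0.05:", p_value < 0.05)   # criterion of a definitive trial
print("significant at alpha = 0.20:", p_value < 0.20)   # relaxed pilot criterion
```

With these invented numbers the effect would be dismissed at the usual 5% level but would still justify moving on to a full-scale trial at the pilot threshold.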

Anyway, if we are interested in studying the direction of the intervention's effect, it may be advisable to use confidence intervals instead of classical hypothesis testing with its p-values.

These confidence intervals have to be compared with the minimal clinically important difference, which must be defined a priori. If the interval does not include the null value and does include the minimal important difference, we'll have arguments for conducting a large-scale trial to show the effect definitively. Suffice it to say that, just as we can increase the alpha value, we can use confidence intervals with levels below 95%.
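
A sketch of that comparison (the mean difference, its standard error, the 90% level and the minimal clinically important difference of 0.5 units are all arbitrary numbers for illustration):

```python
from scipy import stats

# Hypothetical pilot result: mean difference and its standard error
mean_diff, se, df = 0.8, 0.35, 14
mcid = 0.5                    # minimal clinically important difference, set a priori

# 90% confidence interval instead of the usual 95%
t_crit = stats.t.ppf(0.95, df)          # two-sided 90% leaves 5% in each tail
ci_low, ci_high = mean_diff - t_crit * se, mean_diff + t_crit * se
print(f"90% CI: ({ci_low:.2f}, {ci_high:.2f})")

# Arguments for a large-scale trial: the interval excludes 0 and includes the MCID
print("worth a definitive trial:", ci_low > 0 and ci_low <= mcid <= ci_high)
```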

Another peculiarity of pilot studies is the choice of outcome variables. Considering that a pilot study seeks to test how the components of the future trial will work together, we can understand that sometimes it is impractical to use the final outcome variable and we have to use a surrogate variable, that is, one that provides an indirect measure of the effect when direct measurement is impractical or impossible. For example, if we are studying an antitumor treatment, the outcome variable may be five-year survival, but in a pilot study an indirect variable indicating the decrease in tumor size may be more useful. It will show the direction of the treatment's effect without prolonging the pilot study for too long.

So, you can see that pilot studies should be interpreted taking their peculiarities into account. Moreover, they also help us to predict how the definitive trial will work, anticipating problems that could ruin an expensive and complex clinical trial. This is the case with missing data and losses to follow-up, which are usually larger in pilot studies than in conventional trials. Although they matter less there, losses in pilot studies should be evaluated with a view to preventing future losses in the final trial because, although there are many ways to manage losses and missing data, the best way is always to prevent them. But that's another story…

The most wished-for statistic for a mother

Those of you who're reading this and belong to the pediatricians' gang already know what I am talking about: the 50th percentile. There's no mother who doesn't want her offspring to be above it in weight, height, intelligence and everything else a good mother could desire for her child. That's why pediatricians, who dedicate our lives to the care of children, love percentiles so much. But what is the meaning of the term percentile? Let's start from the beginning…

If we have the distribution of values of a variable, we can summarize it with a measure of central tendency and a measure of dispersion. The most common are the mean and the standard deviation, respectively, but sometimes we use other measures of central tendency (such as the median or the mode) and of dispersion.

The simplest of these other measures of dispersion is called the range, which is defined as the difference between the maximum and minimum values of the distribution.

Let's suppose that we collect the birth weights of the last 100 children born at our hospital and order them as they appear in the table. The lowest value was 2200 grams, while the prize for the biggest goes to an infant who weighed 4000 grams. The range in this case is 1800 grams but, of course, if we do not have the table and someone tells us just this figure, we won't have much idea of the size of our babies. That's why it is usually preferred to give the range with its explicit minimum and maximum values: in our case, from 2200 to 4000 grams.

If you remember how to calculate the median, you will see that it is 3050 grams. To complete the picture we need a measure that tells us how the rest of the weights are distributed around the median and within the range.

The easiest way is to divide the distribution into four segments, each including 25% of the children. The cut-off points are called quartiles, and there are three of them: the first quartile (with 25% of values below it), the second quartile (which is the same as the median) and the third quartile (with 75% below it, between the median and the maximum). We thus come up with four segments: from the minimum to the first quartile, from the first to the second (the median), from the second to the third, and from the third to the maximum. In our case, the three quartiles would be 2830, 3050 and 3200 grams. Some people call them the lower quartile, the median and the upper quartile, but they are the same thing.

Now, if we know that the median is 3050 grams and that 50% of the children weigh between 2830 and 3200 grams, we'll have a pretty good idea of the birth weights of our newborns. This interval is called the interquartile range and it is usually given along with the median to summarize the distribution. In our example: a median of 3050 grams with an interquartile range of 2830 to 3200 grams.
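
These calculations are easy to reproduce; here is a quick sketch with a small made-up sample (not the 100 weights of the table):

```python
import statistics

# Hypothetical birth weights in grams (a small made-up sample, not the post's table)
weights = sorted([2700, 2850, 2900, 3000, 3050, 3100, 3200, 3350, 3500])

median = statistics.median(weights)
q1, q2, q3 = statistics.quantiles(weights, n=4)  # default 'exclusive' method: (n+1) positions
print(f"range: {min(weights)} to {max(weights)} g")
print(f"median: {median} g, interquartile range: {q1} to {q3} g")
```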

But we can go much further. We can divide the distribution into as many segments as we want. Deciles are the result of dividing it into ten segments, and our revered percentiles the result of dividing it into a hundred.

There is a fairly simple formula to calculate any percentile we want: the Pth percentile is at position (P/100) × (n + 1), where n represents the sample size. In our distribution of neonates, the 22nd percentile would be at position (22/100) × (100 + 1) = 22.2, that is, between the 22nd and 23rd ordered values, which in our table corresponds to 2770 grams.
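
A direct translation of that formula into code, interpolating when the position falls between two ordered values (the sample is the same made-up one as before):

```python
def percentile(sorted_values, p):
    """Pth percentile at position (p / 100) * (n + 1), interpolating linearly
    between neighbouring ordered values when the position is fractional."""
    n = len(sorted_values)
    pos = (p / 100) * (n + 1)
    lo = int(pos)              # 1-based position of the lower neighbour
    if lo < 1:
        return sorted_values[0]
    if lo >= n:
        return sorted_values[-1]
    frac = pos - lo
    return sorted_values[lo - 1] + frac * (sorted_values[lo] - sorted_values[lo - 1])

weights = sorted([2700, 2850, 2900, 3000, 3050, 3100, 3200, 3350, 3500])
print(percentile(weights, 50))  # 3050.0: the median again
print(percentile(weights, 25))  # 2875.0: the first quartile from the sketch above
```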

The sharpest of you may have noticed that our 3050 grams corresponds not only to the median, but also to the fifth decile and to the 50th percentile, the desired one.

The great use of percentiles, apart from giving satisfaction to 50% of mothers (those whose children are above the median), is that they allow us to estimate the probability of finding a given value of the variable within the population. In general, the closer you are to the median, the better (at least in medicine), and the further away from it, the more likely it is that someone will take you to a doctor to find out why you are not closer to the precious percentile or, even better, somewhat above it.

But if we really want to refine further the calculation of the probability of obtaining a particular value within a data distribution, there are other techniques related to standardizing the dispersion measure we use. But that's another story…

To see well you must be blind

It's said that there are none so blind as those who refuse to see. But it's also true that wanting to see too much can be counterproductive. Sometimes it is better to see just the essential and indispensable.

That's what happens with scientific studies. Imagine that we want to test a new treatment and we propose a trial to some people, giving the new drug to some of them and a placebo to the rest. If everyone knows who is being treated with what, the expectations of researchers or participants might influence, even inadvertently, the way the results of the study are evaluated. This is why we have to use masking techniques, better known as blinding.

Let's suppose we want to test a new drug for a very severe disease. If a participant knows he is receiving the drug, he will be much more tolerant of side effects than if he is receiving placebo. And something similar can happen to the researcher: it's easy to imagine that you would take less interest in asking about signs of toxicity in a patient you know is being treated with a harmless placebo.

All of these facts may influence the way participants and researchers evaluate the effects of the treatment and may lead to a bias in the interpretation of the results.

Masking techniques can be performed at different levels. The lowest level is not masking at all, doing what is called an open or unblinded trial. Although masking is the ideal thing to do, there are times when it is not possible or convenient. For example, think of cases in which masking would cause unnecessary inconvenience to the patient, such as administering an intravenous placebo for a long time or performing a sham surgical procedure. At other times it is difficult to find a placebo galenically indistinguishable from the drug being tested. And, finally, sometimes it doesn't make much sense to blind if the treatment produces easily recognizable effects that do not occur with placebo.

The next level is the single blind, in which either the participants or the researchers do not know which treatment each participant is receiving. A further step is the double blind, in which neither researchers nor participants know which group each one is assigned to. And, finally, we can do a triple blind when, in addition to the above, the person who analyzes the data or who has the responsibility of monitoring and stopping the study also does not know which group each participant is assigned to. Imagine that someone has a serious adverse effect and we have to decide whether to stop the study: no doubt knowing whether that person is receiving the drug or the placebo can influence our decision.

But what can we do when masking is not possible or convenient? In such cases we have no choice but to do an open or unblinded study, although we can try to use a blinded evaluator. This means that, although researchers and participants know the allocation to placebo or intervention groups, the person who analyzes the results does not. This is especially important when the outcome variable is a subjective one: think how you might not assess an X-ray film with the same care or the same criteria if you knew whether the individual came from the placebo or the intervention group. It is not so essential, by the way, when we measure objective variables, such as a laboratory determination.

To end this post, let's discuss two other possible errors resulting from lack of blinding. If a participant knows he is receiving the studied drug, he may improve simply through a placebo effect. On the other hand, if he knows he is in the placebo arm, he may modify his behavior just because he knows he is "not protected" by the new treatment. This is called contamination, and it's a real problem in studies of lifestyle habits.

And that's all. Just one concept to clarify before the end. We have seen that there is some relationship between lack of blinding and the appearance of a placebo effect. But make no mistake: masking is not the way to control the placebo effect. For that we have to resort to another trick: randomization. But that's another story…