The King under review

Critical appraisal of treatment studies

We all know that the randomized clinical trial is the king of interventional methodological designs. It is the type of epidemiological study that allows the best control of systematic errors or biases, since the researcher controls the study variables and the participants are randomly assigned among the interventions being compared.

In this way, if two homogeneous groups that differ only in the intervention present some difference of interest during the follow-up, we can affirm with some confidence that this difference is due to the intervention, the only thing that the two groups do not have in common. For this reason, the clinical trial is the preferred design to answer clinical questions about intervention or treatment, although we will always have to be prudent with the evidence generated by a single clinical trial, no matter how well performed. When we perform a systematic review of randomized clinical trials on the same intervention and combine them in a meta-analysis, the answers we get will be more reliable than those obtained from a single study. That’s why some people say that the ideal design for answering treatment questions is not the clinical trial, but the meta-analysis of clinical trials.

In any case, as systematic reviews assess their primary studies individually, and as it is more usual to find individual trials than systematic reviews, it is advisable to know how to make a good critical appraisal of them in order to draw conclusions. Indeed, we cannot relax when we see that an article corresponds to a clinical trial and take its content for granted. A clinical trial can also contain its own traps and tricks, so, as with any other type of design, it is good practice to read it critically, based on our usual three pillars: validity, importance and applicability.

Critical appraisal of treatment studies

As always, when assessing scientific rigor or VALIDITY (internal validity), we will first look at a series of essential primary criteria. If these are not met, it is better not to waste time with the trial and to look for another, more profitable one.

Is there a clearly defined clinical question? From the outset, the trial must be designed to answer a structured clinical question about treatment, motivated by one of our many knowledge gaps. A working hypothesis should be proposed, with its corresponding null and alternative hypotheses, if possible on a topic that is relevant from the clinical point of view. It is preferable that the study tries to answer only one question: when there are several, the trial may become overly complicated and end up answering none of them completely and properly.

Was the assignment done randomly? As we have already said, to be able to affirm that the differences between the groups are due to the intervention, the groups must be homogeneous. This is achieved by assigning patients randomly, the only way to control the known confounding variables and, more importantly, also those we do not know about. If the groups were different and we attributed the difference only to the intervention, we could incur a confounding bias. The trial should contain the usual and essential table 1 with the frequencies of the demographic and confounding variables in both samples, so that we can check that the groups are homogeneous.

A frequent error is to look for differences between the two groups and judge them by their p-values, when we know that p does not measure homogeneity. If we have distributed the participants at random, any difference we observe will necessarily be random (we do not need a p to know that). The sample size is not designed to discriminate between demographic variables, so a non-significant p may simply indicate that the sample is too small to reach statistical significance. On the other hand, any minimal difference can reach statistical significance if the sample is large enough. So forget about the p: if there is any difference, what you have to do is assess whether it has sufficient clinical relevance to have influenced the results or, more elegantly, control for the covariates that randomization left unbalanced. Fortunately, it is increasingly rare to find tables of the study groups with p-value comparisons between the intervention and control arms.

But it is not enough for the study to be randomized; we must also consider whether the randomization sequence was generated correctly. The method used must ensure that all members of the selected population have the same probability of being chosen, so random number tables or computer-generated sequences are preferred. The allocation must also be concealed, so that it is not possible to know which group the next participant will belong to; that is why centralized systems, by telephone or over the Internet, are so popular. And here is something very curious: it is well known that simple randomization tends to produce groups of different sizes, especially when the samples are small, which is why randomization in size-balanced blocks is sometimes used instead. So I ask you: how many studies have you read that claimed to be randomized yet had exactly the same number of participants in the two arms? Do not trust equal groups, especially if they are small, and do not be fooled: you can always use one of the many binomial probability calculators available on the Internet to find out how likely it is that chance would generate the groups the authors present (we are always speaking of simple randomization, not randomization by blocks, clusters, minimization or other techniques). You will be surprised by what you find.
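In fact, you don't even need an online calculator: a few lines of Python with the standard library (the function name is mine, just for illustration) give the probability that simple 1:1 randomization produces exactly equal groups.

```python
from math import comb

def prob_equal_groups(n_total):
    """Probability that simple 1:1 randomization (a fair coin toss per
    participant) splits n_total participants into two exactly equal groups."""
    if n_total % 2:
        return 0.0  # an odd total can never split evenly
    return comb(n_total, n_total // 2) / 2 ** n_total

# The chance of a perfect split shrinks as the trial grows
for n in (20, 40, 100, 200):
    print(n, round(prob_equal_groups(n), 3))
```

For 100 participants the chance of a perfect 50/50 split is only about 8%, so perfectly balanced groups in a supposedly simply randomized trial should raise an eyebrow.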

It is also important that the follow-up has been long and complete enough: the study must last long enough for the outcome variable to be observed, and every participant who enters the study must be accounted for at the end. As a general rule, if the losses exceed 20%, it is accepted that the internal validity of the study may be compromised.

We will always have to analyze the nature of the losses during follow-up, especially if they are high. We must try to determine whether the losses are random or related to some specific variable (which would be a bad sign) and estimate what effect they may have on the results of the trial. The usual approach is to adopt the so-called worst-case scenario: it is assumed that all the losses in the control group have done well and all those in the intervention group have done badly, and the analysis is repeated to check whether the conclusions change, in which case the validity of the study would be seriously compromised. The last important aspect is to consider whether patients who did not receive the assigned treatment (there is always someone who gets confused and messes things up) have been analyzed according to the intention-to-treat principle, since it is the only way to preserve all the benefits obtained with randomization. Everything that happens after randomization (such as a change of assignment group) can influence the probability that a subject experiences the effect we are studying, so it is important to respect this intention-to-treat analysis and analyze each participant in the group to which he was initially assigned.
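As an illustration of that worst-case exercise, here is a minimal Python sketch with entirely made-up numbers, assuming the outcome variable is an undesirable event (the function and figures are hypothetical, not taken from any real trial).

```python
def worst_case_risks(events_tx, n_tx, lost_tx, events_ctl, n_ctl, lost_ctl):
    """Worst-case scenario for losses to follow-up: count every participant
    lost from the intervention arm as an event (a bad outcome) and every
    one lost from the control arm as event-free, then recompute the
    absolute risk difference (intervention minus control)."""
    observed = events_tx / n_tx - events_ctl / n_ctl
    worst = (events_tx + lost_tx) / (n_tx + lost_tx) - events_ctl / (n_ctl + lost_ctl)
    return observed, worst

# Made-up trial: 100 randomized per arm, 10 lost per arm,
# 15/90 events with the intervention vs 30/90 with the control
observed, worst = worst_case_risks(15, 90, 10, 30, 90, 10)
print(round(observed, 3), round(worst, 3))
```

In this invented example the observed risk difference of about minus 17 percentage points shrinks to minus 5 under the worst case, exactly the kind of shift that should make us question how robust the conclusions are.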

Once these primary criteria have been verified, we will look at three secondary criteria that influence internal validity. It will be necessary to verify that the groups were similar at the beginning of the study (we have already talked about the table with the data of the two groups), that masking was carried out appropriately as a form of bias control, and that the two groups were managed and followed up in a similar way except, of course, for the intervention under study. We know that masking or blinding allows us to minimize the risk of information bias, which is why researchers and participants are usually unaware of which group each one is assigned to, which is known as double-blind. Sometimes, given the nature of the intervention (think of a group that is operated on and another that is not), it will be impossible to mask researchers and participants, but we can always give the masked data to the person who analyzes the results (the so-called blind evaluator), which mitigates this problem.

To summarize this section on the validity of the trial, we can say that we will have to check that there is a clear definition of the study population, the intervention and the outcome of interest, that randomization was done properly, that information bias was controlled through masking, that there was an adequate follow-up with control of losses, and that the analysis was correct (intention-to-treat analysis and control of the covariates not balanced by randomization).

A famous Colombian: Alejandro Jadad Bechara

A very simple tool that can also help us assess the internal validity of a clinical trial is the Jadad scale, also called the Oxford quality scoring system. Jadad, a Colombian doctor, devised a scoring system with seven questions: first, five questions whose affirmative answer adds 1 point:

  1. Is the study described as randomized?
  2. Is the method used to generate the randomization sequence described and is it adequate?
  3. Is the study described as double blind?
  4. Is the masking method described and is it adequate?
  5. Is there a description of the losses during follow up?

Finally, two questions whose negative answer subtracts 1 point:

  1. Is the method used to generate the randomization sequence adequate?
  2. Is the masking method appropriate?

As you can see, the Jadad scale assesses the key points we have already mentioned: randomization, masking and follow-up. A trial is considered methodologically rigorous if it scores 5 points. If the study scores 3 points or less, we had better use it to wrap our sandwich.
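The scale is simple enough to fit in a few lines of code. This is one possible Python encoding (the three-state arguments are my own convention for "described and adequate", "described but inadequate" and "not described"):

```python
def jadad_score(randomized, randomization_adequate, double_blind,
                blinding_adequate, losses_described):
    """One possible encoding of the Jadad (Oxford quality) score.
    randomization_adequate and blinding_adequate take True (described and
    adequate, +1), False (described but inadequate, -1) or None (not
    described, 0); the other three questions are plain yes/no (+1/0)."""
    score = int(randomized) + int(double_blind) + int(losses_described)
    for adequate in (randomization_adequate, blinding_adequate):
        if adequate is True:
            score += 1
        elif adequate is False:
            score -= 1
    return score

# A flawless report scores 5; 3 points or less and it wraps the sandwich
print(jadad_score(True, True, True, True, True))
```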

We will now proceed to consider the results of the study to gauge their clinical RELEVANCE. It will be necessary to look at the variables measured to see whether the trial adequately expresses the magnitude and precision of the results. It is important, once again, not to settle for being inundated with multiple p-values full of zeros. Remember that the p-value only indicates the probability that we are accepting as real differences that exist only by chance (or, to put it simply, of making a type 1 error), and that statistical significance need not be synonymous with clinical relevance.

In the case of continuous variables such as survival time, weight, blood pressure, etc., it is usual to express the magnitude of the results as a difference in means or medians, depending on which measure of central tendency is most appropriate. In the case of dichotomous variables (alive or dead, healthy or sick, etc.), the relative risk, its relative and absolute reductions and the number needed to treat (NNT) will be used. Of all of them, the one that best expresses clinical efficiency is always the NNT. Any trial worthy of our attention must provide this information or, failing that, the information necessary for us to calculate it.
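As a sketch of these calculations, here is a small Python function that derives all four measures from the 2x2 table of a trial with made-up numbers (any resemblance to a real study is pure coincidence):

```python
from math import ceil

def effect_measures(events_tx, n_tx, events_ctl, n_ctl):
    """Effect measures for a dichotomous bad outcome in a two-arm trial."""
    risk_tx, risk_ctl = events_tx / n_tx, events_ctl / n_ctl
    rr = risk_tx / risk_ctl        # relative risk
    arr = risk_ctl - risk_tx       # absolute risk reduction
    rrr = arr / risk_ctl           # relative risk reduction
    nnt = ceil(1 / arr)            # number needed to treat, rounded up
    return {"RR": rr, "ARR": arr, "RRR": rrr, "NNT": nnt}

# Made-up trial: 20/200 events with treatment vs 40/200 with control
print(effect_measures(20, 200, 40, 200))
```

With 20/200 events in the treated arm versus 40/200 in the control arm, the RR is 0.5, the ARR is 0.1 and the NNT is 10: we need to treat ten patients to prevent one event.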

But to get a more realistic estimate of the results in the population, we need to know the precision of the study, and nothing is easier than resorting to confidence intervals. These intervals, in addition to precision, also inform us of statistical significance: the result will be statistically significant if the interval of a risk ratio does not include the value one, or if that of a difference in means does not include zero. If the authors do not provide them, we can use a calculator to obtain them, such as those available on the CASP website.
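If the authors do not provide the interval, we can also compute an approximate one ourselves with the usual log transformation of the relative risk; this is a sketch with the same made-up figures as before.

```python
from math import exp, sqrt

def rr_confidence_interval(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Approximate 95% confidence interval for the relative risk, using
    the usual log transformation of the RR, whose standard error is
    sqrt(1/a - 1/n1 + 1/c - 1/n2) for a/n1 vs c/n2 events."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    return rr * exp(-z * se), rr * exp(z * se)

low, high = rr_confidence_interval(20, 200, 40, 200)
# statistically significant if the interval does not include the value one
print(round(low, 2), round(high, 2), "significant:", not (low <= 1 <= high))
```

Here the whole interval sits below one, so the reduction in risk is statistically significant, and its width tells us how precise the estimate is.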

A good way to sort the study of the clinical importance of a trial is to structure it in these four aspects: Quantitative assessment (measures of effect and its precision), Qualitative assessment (relevance from the clinical point of view), Comparative assessment (see if the results are consistent with those of other previous studies) and Cost-benefit assessment (this point would link to the next section of the critical appraisal that has to do with the applicability of the results of the trial).

To finish the critical reading of a treatment article we will value its APPLICABILITY (also called external validity), for which we will have to ask ourselves if the results can be generalized to our patients or, in other words, if there is any difference between our patients and those of the study that prevents the generalization of the results. It must be taken into account in this regard that the stricter the inclusion criteria of a study, the more difficult it will be to generalize its results, thereby compromising its external validity.

But, in addition, we must consider whether all clinically important outcomes have been taken into account, including side effects and undesirable effects. The outcome variable measured must be important both for the investigator and for the patient. Do not forget that demonstrating that the intervention is effective does not necessarily mean that it is beneficial for our patients. We must also assess the harmful or annoying effects and study the benefit-cost-risk balance, as well as the difficulties that may exist in applying the treatment in our setting, the patient's preferences, etc.

As is easy to understand, a study can have great methodological validity and results of great clinical importance and still not be applicable to our patients, either because our patients are different from those of the study, because it does not fit their preferences, or because it is unfeasible in our setting. However, the opposite does not usually happen: if the validity is poor or the results are unimportant, we will hardly consider applying the conclusions of the study to our patients.

We’re leaving…

To finish, let me recommend that you use some of the tools available for critical appraisal, such as the CASP templates, or a checklist such as CONSORT, so as not to leave any of these points unconsidered. But wait: everything we have talked about concerns randomized controlled clinical trials, so what happens with non-randomized trials or other kinds of quasi-experimental studies? Well, for those we follow another set of rules, such as those of the TREND statement. But that is another story…

Regular customers

Re-randomization in clinical trials

We saw in a previous post that sample size is very important. The sample should be just the right size, neither more nor less. If it is too large, we are wasting resources, something to keep in mind in modern times. If we use a small sample we will save money, but we will lose statistical power. This means that there may be a real difference in effect between the two interventions tested in a clinical trial and we may be unable to recognize it, with which we will just be throwing money away all the same.

When sample size is out of our reach

The problem is that it can sometimes be very difficult to obtain an adequate sample size, requiring excessively long periods of time to reach the desired number. Well, for these cases, someone with a commercial mentality has devised a method that consists of including the same participant in the trial many times. It's like in bars: it is always easier to have a regular clientele that comes to the establishment many times than a very busy parish (although that is also desirable).

There are times when the same patient needs the same treatment on repeated occasions. Consider, for example, asthmatics who need bronchodilator treatment repeatedly, or couples undergoing in vitro fertilization, which may require several cycles to succeed.

Re-randomization in clinical trials

Although the usual standard in clinical trials is randomizing participants, in these cases we can randomize each participant independently whenever he needs treatment. For example, if we are testing two bronchodilators, we can randomize the same subject to one of two every time he has an asthma attack and needs treatment. This procedure is known as re-randomization and consists, as we have seen, in randomizing situations rather than participants.

This trick is quite correct from a methodological point of view, provided that certain conditions discussed below are met.

The participant enters the trial the first time in the usual way, being randomly assigned to one of the two arms of the trial. He is then followed up for the appropriate period and the results of the study variables are collected. Once the follow-up period is over, if the patient requires treatment again and still meets the inclusion criteria of the trial, he is randomized again, repeating this cycle as many times as necessary to achieve the desired sample size.

This way of recruiting situations instead of participants achieves the required sample size with a smaller number of participants. For example, if we need 500, we can randomize 500 participants once, 250 twice, or 200 once and 50 six times. The important thing is that the number of randomizations of each participant cannot be specified in advance, but must depend on the need for treatment on each occasion.

Three conditions

To apply this method correctly, three requirements must be met. First, patients can only be re-randomized when they have fully completed the follow-up period of the previous procedure. This is logical: otherwise, the effects of the two treatments would overlap and a biased measure of the effect of the intervention would be obtained.

Second, each new randomization in the same participant should be done independently of the others. In other words, the probability of assignment to each intervention should not depend on previous assignments. Some authors are tempted to use reallocations to balance the two groups, but this can bias comparisons between the two groups.

Third, the participant should derive the same benefit from each intervention on every occasion. Otherwise, we would obtain a biased estimate of the treatment effect.

We see, then, that this is a good way to reach the desired sample size more easily. The problem with this type of design is that the analysis of the results is more complex than that of a conventional clinical trial.

Basically, without going into details, there are two methods of analyzing the results. The simplest is the unadjusted analysis, in which all interventions, even if they belong to the same participant, are treated as independent. This model, usually expressed as a linear regression model, does not take into account the effect that individual participants can have on the results.

The other method is adjusted for the effect of patients, taking into account the correlation between observations from the same participant.
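To get an intuition for the difference between the two approaches, here is a deliberately crude Python sketch with invented data, replacing the regression models with simple means: the unadjusted estimate pools all episodes, while the adjusted one averages the within-patient differences so that between-patient variability cancels out.

```python
# Each record: (patient_id, treatment, outcome). The same patient can
# appear several times because each episode is randomized independently.
records = [
    ("p1", "A", 8), ("p1", "B", 6), ("p1", "A", 9),
    ("p2", "A", 4), ("p2", "B", 3),
    ("p3", "B", 5), ("p3", "A", 6),
]

def unadjusted_difference(data):
    """Treat every episode as independent: difference of overall means."""
    a = [y for _, t, y in data if t == "A"]
    b = [y for _, t, y in data if t == "B"]
    return sum(a) / len(a) - sum(b) / len(b)

def patient_adjusted_difference(data):
    """Average the A-minus-B difference within each patient who received
    both treatments, so between-patient variability cancels out."""
    per_patient = {}
    for pid, t, y in data:
        per_patient.setdefault(pid, {"A": [], "B": []})[t].append(y)
    diffs = [sum(g["A"]) / len(g["A"]) - sum(g["B"]) / len(g["B"])
             for g in per_patient.values() if g["A"] and g["B"]]
    return sum(diffs) / len(diffs)

print(unadjusted_difference(records), patient_adjusted_difference(records))
```

The two estimates disagree because some patients contribute more episodes than others; the real adjusted analyses (mixed-effects models) handle this correlation formally rather than by simple averaging.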

We’re leaving…

And here we leave it for today. We have not said anything about the mathematical treatment of the adjusted method, to avoid burning out the reader's neurons. Suffice it to say that there are several approaches, involving generalized linear models and mixed-effects models. But that is another story…

The necessity of chance

Democritus said that everything that exists in this world is the result of chance and necessity. And Monod, who thought the same, used the way chance interweaves with our destiny to explain that we are no more than genetic machines. But today we are not going to talk about chance and its necessity to understand our mechanistic evolution, but about something quite different, although it may sound like a riddle: the need to use chance when designing scientific studies, to control what is beyond our control.

And, indeed, randomization is one of the key elements of experimental studies. Whenever we plan a clinical trial to test the effectiveness of an intervention, we need the two groups, intervention and control, to be fully comparable, as this is the only way to be reasonably sure that the differences we observe are the result of the intervention. And this allocation of participants to one of the two groups must be done randomly, without the intervention of the participant's or the researcher's will.

The great advantage of randomization is that it evenly distributes the variables that can influence the outcome, whether they are known or unknown to the researcher. Thus, we can state our null and alternative hypotheses and calculate the probability that the observed differences are due to chance or to the effect of the intervention under study.

However, all of its advantages may be lost if we don't randomize correctly. It is very important that the randomization sequence is concealed and unpredictable, so that it is impossible to know which group the next participant is going to be allocated to, even before deciding on his inclusion in the study (to prevent this knowledge from influencing the decision to include him).

It’s often performed by using sealed envelopes with hidden codes that are assigned to participants. Another possibility is to use hardware random number generators or random number tables. For the sake of security, it’s also desirable that randomization is made by people other than the study’s, in a centralized way or by telephone. In any case, we must avoid techniques that can be predictable, as the use of the days of the week, the name’s initials, birth dates, etc.

There are several techniques for randomizing properly, all having in common the fact that every participant has a certain probability of being allocated to any of the study groups.

A very simple method is to allocate participants alternately and systematically to one group or the other, but this method is only random for the first participant allocated. That is why other randomization techniques are usually preferred.

The simplest way of randomizing is called (no surprise) simple random allocation. It's equivalent to tossing a coin, with every participant having the same probability of being allocated to either of the two groups. This need not always be the case, because we can change the probabilities and assign different ones to the control and intervention groups. The problem with this method is that it creates groups of different sizes, so imbalances may appear between groups, especially with small samples.

To avoid this problem we can do a block randomization, performing the allocation in blocks of a predetermined size (a multiple of two) and assigning half of the participants in each block to one group and the rest to the other. This ensures a similar number of participants in each group.
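A minimal sketch of block randomization in Python (the arm labels and block size are just for illustration):

```python
import random

def block_randomization(n_participants, block_size=4, seed=None):
    """Allocation list for 1:1 block randomization: within every block,
    half the slots go to each arm, so the two groups can never drift
    apart by more than block_size / 2 at any point in recruitment."""
    assert block_size % 2 == 0, "block size must be a multiple of two"
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = (["intervention"] * (block_size // 2)
                 + ["control"] * (block_size // 2))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]

arms = block_randomization(20, block_size=4, seed=42)
print(arms.count("intervention"), arms.count("control"))
```

With 20 participants and blocks of four, the final count is exactly 10 per arm, whatever the shuffling inside each block.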

We can also divide the sample into groups based on a prognostic variable and make a random allocation within each group. This technique is called stratified randomization. It is important that the strata are mutually exclusive, as different as possible from each other and as homogeneous as possible within. Some people recommend using block randomization within each stratum, but this may depend on the type of study.
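And a similar sketch for stratified randomization, using, for simplicity, a single balanced block per stratum (the stratum function and labels are hypothetical):

```python
import random

def stratified_randomization(participants, stratum_of, seed=None):
    """Randomize separately within each stratum of a prognostic variable.
    `stratum_of` maps a participant to his stratum (e.g. an age band);
    each stratum is split as evenly as possible between the two arms."""
    rng = random.Random(seed)
    strata = {}
    for p in participants:
        strata.setdefault(stratum_of(p), []).append(p)
    allocation = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2
        for p in members[:half]:
            allocation[p] = "intervention"
        for p in members[half:]:
            allocation[p] = "control"
    return allocation

# Ten hypothetical participants stratified into two age bands
alloc = stratified_randomization(
    list(range(10)), lambda p: "young" if p < 6 else "old", seed=7)
print(alloc)
```

Each stratum ends up evenly split, so the prognostic variable cannot unbalance the comparison between arms.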

Participants can also be allocated by functional or geographical groups, to avoid contamination of some participants by the intervention of the opposite arm of the study. Suppose we want to test a cancer screening technique: it may be better to screen in some centers and not in others. If we ran both arms at the same center, the control group participants could modify their lifestyle or demand the benefit of the screening for themselves too.

Finally, there are also a number of adaptive randomization techniques, which change throughout the study to adapt to emerging imbalances in the distribution of variables or in the number of subjects in each group. These techniques can also be used when we are interested in minimizing the number of participants receiving the less effective intervention, once some of the results of the study are known.

And we're concluding this topic. Before ending, I only want to warn you not to confuse concealment of the randomization sequence with masking. Randomization prevents selection bias and ensures (although not always) a balanced distribution of confounders and effect modifiers. Masking is done after allocation has taken place and prevents information bias. But that's another story…

To see well you must be blind

It's said that there's none so blind as those who refuse to see. But it's also true that wanting to see too much can be counterproductive. Sometimes it is better to see just what is essential and indispensable.

That's what happens with scientific studies. Imagine that we want to test a new treatment and we propose a trial to some people, giving the new drug to some of them and a placebo to the rest. If everyone knows who is being treated with what, the expectations of researchers or participants might influence, even inadvertently, the way the results of the study are evaluated. This is why masking techniques, better known as blinding, have to be used.

Let's suppose we want to test a new drug for a very severe disease. If a participant knows he's receiving the drug, he will be much more tolerant of side effects than if he's receiving placebo. And something similar can happen to the researcher: it's easy to imagine that you would take less interest in asking about signs of toxicity in a patient you know is being treated with a harmless placebo.

All of these facts may influence the way participants and researchers evaluate the effects of treatment and may bias the interpretation of the results.

Masking techniques can be carried out at different levels. The lowest level is no masking at all, making what is called an open or un-blinded trial. Although masking is the ideal thing to do, there are times when it's not possible or convenient. For example, think of cases where masking would cause unnecessary inconvenience to the patient, such as administering an intravenous placebo for a long time or performing a sham surgical procedure. At other times it's difficult to find a placebo galenically indistinguishable from the drug being tested. And finally, sometimes it doesn't make much sense to blind at all, if the treatment produces easily recognizable effects that don't occur with placebo.

The next level is the single-blind trial, in which either participants or researchers don't know which treatment each participant is receiving. A further step is the double-blind trial, in which neither researchers nor participants know which group each one is assigned to. And finally, we can do triple-blinding when, in addition to the above, the person who analyzes the data or who has the responsibility to monitor and stop the study also doesn't know which group each participant is assigned to. Imagine someone suffers a serious adverse effect and we have to decide whether to stop the study: no doubt knowing whether that person is receiving the drug or the placebo can influence our decision.

But what can we do when masking is not possible or convenient? For such cases we have no choice but to run an open or un-blinded study, although we can try to use a blind evaluator. This means that, although researchers and participants know who is allocated to the placebo or intervention group, the person who analyzes the results doesn't. This is especially important when the outcome variable is a subjective one: think that you won't assess an X-ray film with the same detail or criteria if you know whether the individual comes from the placebo or the intervention group. By the way, it's not so essential when we measure objective variables, such as a laboratory determination.

To end this post, let's discuss two other possible errors resulting from a lack of blinding. If a participant knows he's receiving the studied drug, he can improve merely through a placebo effect. On the other hand, if he knows he's in the placebo arm, he can modify his behavior just because he knows he's "not protected" by the new treatment. This is called contamination, and it's a real problem in studies of lifestyle habits.

And that’s all. Just to clarify a concept before the end. We have seen that there is some relationship between lack of blinding and the appearance of a placebo effect. But don’t be mistaken, masking is not the way to control the placebo effect. For that we have to resort to another trick: randomization. But that’s another story…