## Twisting parallels

Mathematicians say that two parallel lines are those that never meet, no matter how far they are extended. Never? That seems to me like a very long time. I don’t think anyone has ever extended two parallels far enough to be sure of this statement. Of course, on the other hand, if they end up converging, it means they were not parallel at all, doesn’t it?

What can converge, and even cross, are the two arms of a parallel trial, which gives rise to a different design called the crossover trial.

In a classical parallel clinical trial each participant is randomly assigned to one, and only one, of the arms of the trial: either the intervention under study or the control. However, we can twist the parallels and get a design in which every patient receives both the study intervention and the control, always in a predefined order and for a set period of time.

Thus, each subject serves as his own control, receiving both interventions in a randomly assigned sequence of periods, with the two periods separated by a stabilization or washout period. You can see a diagram of this design in the attached figure.

There are some variations on the theme of crossover trials, depending on how many participants receive both interventions: all of them (complete trial) or only some of them (incomplete trial). Furthermore, this type of design can be extended to test more than two interventions, which leads to more elaborate sequence schemes with names such as dual design, Balaam’s design, Latin square, etc., but we’re not going to talk about them in this post.

The main advantage of crossover studies lies in a feature already discussed: each subject acts as his own control. This may seem like unimportant bullshit, but it is not. If you think about it, we are assessing the effect of the active intervention and of the control in the same subject, which yields less variability than comparing the effects in different participants, as is done in a parallel trial, where each participant is exposed to only one of the two interventions.

With less variability, the precision of the estimate increases, so the sample size required to detect a specified difference in treatment effect decreases. And not just by a little: the required sample can be substantially smaller than what would be needed in the corresponding parallel trial.

This reduction in sample size depends on the correlation between the outcomes measured in the same subject under the two interventions. In the worst case, with zero correlation, the sample is cut in half. If the correlation is 0.5, the necessary sample is one quarter of the parallel one. And the reduction keeps growing as the correlation approaches one.
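The post gives no formula for these figures, but they match a common textbook approximation in which a two-period crossover needs a fraction (1 - r) / 2 of the total sample of the equivalent parallel trial, where r is the within-subject correlation. Here is a minimal sketch under that assumption (equal variances, no residual effect); the function name is made up for illustration.

```python
def crossover_sample_fraction(r):
    """Approximate fraction of the parallel-trial sample a 2x2 crossover needs.

    Assumes the usual approximation (1 - r) / 2, where r is the correlation
    between the two measurements taken in the same subject.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return (1.0 - r) / 2.0


# Reproduce the two cases quoted in the text, plus a higher correlation.
for r in (0.0, 0.5, 0.8):
    fraction = crossover_sample_fraction(r)
    print(f"r = {r}: {fraction:.2f} of the parallel sample")
```

Treat the result as a rough planning figure only: a real sample size calculation would also account for expected losses during follow-up, which hit crossover trials harder.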

As if that were not enough, the estimate is not only more precise but also less biased, since the design assumes that each subject responds consistently to the two interventions tested, whereas in a parallel trial the measured response varies more because it comes from different subjects.

But crossover designs are not all advantages. They also have some drawbacks. The biggest limitation is the pain in the ass inflicted on participants with so many interventions and so many periods. This matters not only because of the respect we owe the participants, but also because it increases the risk of losses during the study. And it turns out that crossover studies are more sensitive to losses during follow-up than parallel trials, especially if the number of participants who complete each sequence is different.

Another limitation is that subjects must be similar at the beginning of each period, so these studies are only useful for chronic diseases in patients with stable symptoms. Nor are they of any use if the outcome is permanent. Consider the most permanent of all, mortality. If the participant dies in the first period, it will be more than difficult to assess their response in the next one.

In addition, some of its advantages, such as the small sample size, sometimes become disadvantages. This occurs, for example, in Phase III studies in which we want to assess safety, tolerability, efficacy, detection of unpredictable adverse effects, etc. In these cases, a small sample is not only unnecessary, but may be inadequate.

Finally, we have to mention three weaknesses from the design point of view: the residual effect, the sequence effect and the period effect.

The residual (carryover) effect occurs when the effect of the intervention of the previous period still persists in the next one. Think of a drug we have taken that still leaves traces in the blood. Obviously this is solved by extending the washout period, but sometimes it is not so easy. Consider an antihypertensive treatment in which the response in the second period is more favorable simply because the patient is taking part in the study (placebo effect).

The sequence effect occurs when the order of the interventions affects the final result, in which case we could only properly assess the results of the first intervention.

Finally, it may happen that patient characteristics change over the course of the study, modifying their response to the different interventions. In that case we are facing a period effect.

Crossover clinical trials are, in short, more efficient in terms of sample size than parallel trials, provided that the optimal conditions for their use are met. They are very useful in phase I and phase II studies, in which we want to learn about pharmacokinetics and pharmacodynamics, safety, dose titration, etc. In later phases of the development of new drugs they are less useful, especially if, as we have said, we aren’t dealing with a chronic disease with stable symptoms.

And here we end with crossover trials. We have said nothing about the statistical analysis of the results. In a parallel trial the results of the two arms can be compared directly, but this is not so with crossover trials, in which we must first make sure that there has been no residual, sequence or period effect. But that is another story…

## Don’t get your wires crossed

Saving is an important consideration when conducting any study, especially a clinical trial, which is usually costly and time-consuming. So researchers have tried to devise new study designs that save time and money, mostly with regard to the number of participants needed, one of the main determinants of the final cost of the study.

One such design is the crossover trial, which we discussed in a previous post. In this type of trial, each subject is randomized to a group, receives the first intervention, goes through a washout period and then receives the other intervention, as outlined in the attached figure. As each subject acts as his own control, the effect of any confounding variables is limited; besides, there’s less variability than in studies in which the intervention and control groups are made up of different people. This allows a smaller sample size than in conventional parallel clinical trials.

To run a crossover trial, the effect must be rapid and of short duration, while remaining stable throughout the study periods. Otherwise we can run into two methodological weaknesses of the crossover trial: the sequence effect and the period effect.

Therefore, in addition to analyzing the final effects of the two arms under study, we have to extend the statistical analysis to be sure that we don’t get our wires crossed and take as real a difference in effect size that may, in fact, be due to a methodological flaw of this type of trial.

This analysis is a bit laborious, so we are going to work with a completely fictitious example.

Let’s suppose we want to test two hypotensive drugs that we’re going to call A and B, so as not to rack our brains too much. We’re going to work the example with 10 patients for simplicity’s sake, but imagine that there are many more. The first table shows the main results of the trial. We have recorded systolic blood pressure (BP) before starting the study, at the end of each period and at the end of the washout period. Of course, we have also recorded which drug each participant received during each period.

The first thing that comes to mind is to compare the differences in BP between the two drugs. For that we need to extract the data and rearrange them, which gives us the second table. If you take the trouble to calculate them, the mean (m) BP after receiving A is 118.5 mmHg, with a standard deviation (s) of 16 mmHg. The corresponding values for B are m = 144.5 mmHg and s = 7.24 mmHg. To determine whether this difference is significant we perform a hypothesis test, with the null hypothesis (H0) of no difference in effect. Let us assume that the variable is normally distributed, that the variances are equal and that the sample is much larger, so that we can use Student’s t test for paired data. If you calculate the value of t with 9 degrees of freedom, it equals -5.18, which corresponds to p = 0.0005. As p < 0.05, we reject the null hypothesis and conclude that drug A produces a greater reduction in BP than drug B.
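For those who would rather see the arithmetic than trust it, here is a minimal sketch of the paired t statistic using only Python’s standard library. The BP values are invented for illustration (the tables of the example are not reproduced here), so the result will not match the -5.18 above, and `paired_t` is a hypothetical helper name.

```python
import math
import statistics


def paired_t(values_a, values_b):
    """Return (t, degrees of freedom) for a paired Student's t test.

    Each position holds the two measurements of the same patient.
    """
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the within-patient differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical systolic BP (mmHg) of the same five patients after each drug.
bp_after_a = [112, 125, 118, 130, 121]
bp_after_b = [141, 150, 139, 152, 147]
t, df = paired_t(bp_after_a, bp_after_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

To get the p value you would compare t with a Student’s t distribution with n - 1 degrees of freedom, which any statistics package will do for you.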

And here the analysis would end if this were a parallel trial, but in our case we have to do some more checking to be sure that we don’t get our wires crossed because of the methodological weaknesses of the crossover trial.

First, we’ll check that the effect of the interventions is short-lived and that there is no residual effect of the first intervention when the second begins. If there is no residual effect, BP at the end of the washout period should be similar to baseline BP, before any intervention. Baseline BP has m = 162.9 mmHg and s = 14.81 mmHg. The values at the end of the washout period are 156.6 and 23.14 mmHg, respectively. If we run the corresponding test, the value of t is 0.81, with p = 0.43. We cannot reject the H0 of equality, so we conclude that BP is similar before the first intervention and at the end of the washout period, and hence that there is no evidence of a residual effect.

Second, we’ll check that there is no period effect. If there were one, the effect at the end of the second period would be higher (or lower) than at the end of the first. At the end of the first period BP has m = 131.4 mmHg and s = 14.44 mmHg. At the end of the second the values are 131.6 and 21.77 mmHg, respectively. Running the test gives t = -0.02, p = 0.98. Conclusion: we don’t reject the H0 of equality and conclude that there is no evidence of a period effect in the trial.

Finally, we will investigate whether there might be a sequence effect. If it exists (there’s an interaction between the two interventions), the effect of each intervention could differ depending on the order in which we gave them. To check this we’ll calculate the mean decrease in BP in the patients who followed the AB sequence and compare it with that of the patients who followed the BA sequence. For the AB sequence, m = -26.2 mmHg and s = 11 mmHg. For the BA sequence the values are -25.8 and 21.22 mmHg, respectively. The value of Student’s t is -0.04, which corresponds to p = 0.96. Again, we cannot reject the H0 of equality and conclude that there is no evidence of a sequence effect.
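Unlike the previous checks, this one compares two different groups of patients, so it calls for a t test for independent samples. As a rough check, the statistic can be recomputed from the summary figures alone, assuming (the example doesn’t say so) that the 10 patients were split 5 and 5 between the two sequences and that the variances were pooled. `pooled_t` is a made-up helper name.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent samples, from summary statistics.

    Uses the pooled (equal-variance) estimate; returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means and SDs are the ones quoted in the text; the 5/5 split is an assumption.
t, df = pooled_t(-26.2, 11.0, 5, -25.8, 21.22, 5)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Under these assumptions the statistic rounds to the -0.04 reported above.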

And with this the analysis is done. The final conclusion is that there is a statistically significant difference in hypotensive potency between the two drugs in favor of A, and that we found no signs of a residual effect of one intervention on the other, nor of a period or a sequence effect.

Remember that the data are fictitious and that we have assumed normality and equality of variances for teaching purposes. Moreover, as we noted at the beginning, it is not entirely correct to use Student’s t test with such a small sample, but I took this little license to keep the example simple. Anyway, with a computer program it takes the same effort to run a Student’s t test as a Wilcoxon test.

And that’s it. As you can see, the statistical analysis of a crossover trial is far more laborious than that of a parallel trial. And we have only seen the simplest case, in which there is no interaction between the two interventions. When there is interaction, the analysis does not end here and further checks are necessary. But that is another story…

## The other sides of the King

We’ve already talked at other times about the king of experimental designs, the randomized clinical trial, in which a population is randomly split into two groups: one undergoes the intervention under study and the other serves as the control. This is the most common side of the King, the parallel clinical trial, which is ideal for most studies about treatment, for many studies about prognosis or prevention strategies and, with its peculiarities, for studies assessing diagnostic tests. But the King is very versatile and has many other sides to suit other situations.

If we think about it for a moment, the ideal design would be one that allows us to test in the same individual the effect of the study intervention and of the established control (placebo or standard treatment), because the parallel trial assumes that both groups would respond equally to both interventions, which always carries a risk of bias that we try to minimize with randomization. If we had a time machine we could test the intervention in everyone, note what happens, go back in time and repeat the experiment with the control intervention. That way we could compare the two effects. The problem, as the sharpest among you will have already guessed, is that the time machine has not been invented yet.

But what has already been invented is the cross-over trial design, in which each subject acts as his own control.

In this type of trial, every subject is randomized to a group, the corresponding intervention is performed, a washout period follows, and then the other intervention is carried out. Although this solution is not as elegant as the time machine, defenders of the cross-over study argue that the variability within each individual is smaller than the variability between individuals. Thus, the estimate may be more precise than that obtained with a parallel trial, and smaller sample sizes are usually required. However, before using this design, a number of considerations have to be taken into account. Logically, the effect of the first intervention should not cause irreversible changes or last very long, because it would interfere with the effect of the second. In addition, the washout period must be long enough to avoid leaving any residual effect of the first intervention.

We must also consider whether the order of the interventions could affect the final outcome, because in that case only the results of the first intervention will be reliable (sequence effect). Another problem is that, because the study lasts longer, patient characteristics may change along the way and be different in the two periods (period effect). And finally, be alert to losses during follow-up, which are more frequent in longer studies and have a greater impact on the final results of cross-over trials than on those of parallel trials.

Imagine now that we want to test two interventions (A and B) in the same population. Can we do it with only one trial, saving costs of every kind? Yes, we can. We only have to design a factorial clinical trial. In this type of trial, each participant undergoes two consecutive randomizations: first to intervention A or placebo (P), and then to intervention B or placebo, which gives us four study groups: AB, AP, BP and PP. Obviously, the two interventions must act through independent mechanisms so that we can assess the two effects independently.
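The double randomization can be sketched in a few lines. This is only an illustration with simple (unrestricted) randomization, which does not guarantee equal group sizes; real trials usually rely on blocked or stratified schemes. The function and variable names are made up.

```python
import random
from collections import Counter

# Map (first draw, second draw) to the group labels used in the text.
GROUP_LABELS = {
    ("A", "B"): "AB",
    ("A", "P"): "AP",
    ("P", "B"): "BP",
    ("P", "P"): "PP",
}


def factorial_assign(participant_ids, seed=None):
    """Assign each participant to one of the four groups of a 2x2 factorial trial."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    allocation = {}
    for pid in participant_ids:
        first = rng.choice(["A", "P"])   # first randomization: A or placebo
        second = rng.choice(["B", "P"])  # second randomization: B or placebo
        allocation[pid] = GROUP_LABELS[(first, second)]
    return allocation


allocation = factorial_assign(range(1, 21), seed=42)
print(Counter(allocation.values()))
```

The effect of A is then assessed by comparing everyone who received A (AB and AP) with everyone who did not (BP and PP), and likewise for B, which is where the saving in sample size comes from.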

Usually one of the hypotheses studied is more mature and plausible and the other has been less tested, and we must ensure that the evaluation of the second doesn’t affect the inclusion and exclusion criteria of the first. Furthermore, it’s not desirable for either intervention to have many troublesome effects or to be poorly tolerated, because poor compliance with one treatment will affect compliance with the other. In cases in which the two interventions do not seem to be independent, their effects could be studied separately (AP vs. PP and BP vs. PP), but we’ll lose the advantages of the design and a larger sample size will be required.

Other times we may be in a hurry to finish the study. Imagine a very bad disease that kills people by the dozen, for which we’re trying a new treatment. We’ll want to have it available as soon as possible (if it works, of course), so we’ll pause the trial and look at its results once the treatment has been tested in a certain number of participants, and if we can already show that the treatment is useful, we’ll end the study. This is the type of design that characterizes the sequential clinical trial. Remember that in the parallel clinical trial the right thing to do is to calculate the sample size in advance. In this design, with a more Bayesian mindset, we establish a statistic whose value determines an explicit stopping rule, so the sample size depends on the observations already made in the study. When this statistic reaches the preset value we are confident enough to reject the null hypothesis and end the study. The problem is that each stop and analysis increases the probability of rejecting a null hypothesis that is true (type 1 error), so it’s not recommended to perform many interim analyses. Moreover, the final analysis of the results is more complex because it has to take the interim analyses into account. This type of trial is very helpful with interventions that act very quickly, so it is often seen in dose titration studies of opioids, hypnotics, and poisons of that kind.
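To get a feel for how repeated looks inflate the type 1 error, here is a small simulation sketch. It is not a real sequential design: it assumes a normally distributed outcome with known variance, a true null hypothesis, equally spaced looks and a naive rule that tests at the unadjusted 5% level (|z| > 1.96) at every look. All names and parameter values are illustrative.

```python
import math
import random


def type1_error_rate(looks, n_total=100, n_sims=2000, z_crit=1.96, seed=0):
    """Share of simulated null trials that reject H0 at any of `looks` analyses."""
    if not 1 <= looks <= n_total:
        raise ValueError("looks must be between 1 and n_total")
    rng = random.Random(seed)
    # Cumulative sample sizes at which the data are analyzed (last one is n_total).
    look_sizes = [n_total * (k + 1) // looks for k in range(looks)]
    rejections = 0
    for _ in range(n_sims):
        total, n, rejected = 0.0, 0, False
        for target in look_sizes:
            while n < target:
                total += rng.gauss(0.0, 1.0)  # outcome drawn under the null
                n += 1
            z = total / math.sqrt(n)  # z statistic for a mean of 0 with SD 1
            if abs(z) > z_crit:
                rejected = True  # keep simulating so every run uses the same data
        rejections += rejected
    return rejections / n_sims


for looks in (1, 2, 5, 10):
    print(f"{looks} look(s): type 1 error about {type1_error_rate(looks):.3f}")
```

Because every call with the same seed sees the same simulated data, any schedule that includes the final analysis rejects at least as often as the single final test. Real sequential designs compensate by using stricter thresholds at each interim analysis.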

There are other occasions where individual randomization makes no sense. Suppose we have taught the physicians of a health center a new technique to better inform their patients and want to compare it with the old one. We cannot ask the same physician to inform some patients one way and other patients another way, since there would be a strong possibility of the two interventions contaminating each other. It would be more logical to teach the technique in one group of health centers, not teach it in another group, and compare the results. Here what we randomize are the health centers, whose doctors are trained or not. This is the cluster allocation design. The problem with this design is that we have little assurance that the participants within a cluster behave independently, so the sample size required can increase greatly if there is great variability among clusters and little within each one. In addition, we must perform an aggregate analysis of the results, because if we analyze them individually the confidence intervals will be falsely narrow and we can find false statistical significance. The usual practice is to calculate a weighted statistic for each cluster and make the final comparisons with it.
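How much the sample has to grow is usually summarized by the design effect, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation coefficient (the share of the total variability that lies between clusters). The text above doesn’t give this formula; the sketch below simply assumes it, with equal cluster sizes and made-up numbers.

```python
import math


def design_effect(cluster_size, icc):
    """Inflation factor for a cluster-randomized trial with equal cluster sizes."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("cluster_size must be >= 1 and icc between 0 and 1")
    return 1.0 + (cluster_size - 1) * icc


def cluster_sample_size(n_individual, cluster_size, icc):
    """Participants needed once the individually randomized size is inflated."""
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round first so floating-point noise does not push the ceiling up by one.
    return math.ceil(round(inflated, 9))


# Hypothetical figures: 200 participants would do with individual randomization,
# health centers contribute 20 patients each, and the ICC is 0.05.
print(design_effect(20, 0.05))
print(cluster_sample_size(200, 20, 0.05))
```

Note how even a small ICC nearly doubles the sample here, and how an ICC of zero (patients in the same center no more alike than patients anywhere else) brings us back to the individually randomized size.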

The last of the series we are going to deal with is the community trial, in which the intervention is applied to whole populations. As it’s performed on populations under real-life conditions, it has high external validity and often allows us to recommend cost-effective measures based on its results. The problem is that it is often difficult to establish a control group, it may be harder to determine the sample size needed and causal inference from its results is more complex. It is the typical design for evaluating public health measures such as water fluoridation, vaccinations, etc.

As you can see, the King has many sides. But it also has relatives of lower rank that are no less worthy: a whole family of quasi-experimental studies, made up of trials that are not randomized, not controlled, or neither. But that’s another story…