The conventional randomized clinical trial is an individualistic design: each participant is randomized to receive either the intervention or the placebo, the outcome variable is then measured in each of them, and the differences are compared. This individual randomization is complemented by masking, so that no one knows which group each participant belongs to and that knowledge cannot influence the results.

The problem is that sometimes the intervention cannot be masked, so participants know what each of them receives. Suppose we want to study the effect of certain dietary advice on blood pressure levels in a population. We can give the recommendations to each participant or withhold them, but every participant will know whether they received them, so masking is not possible.

In addition, two things can happen that invalidate the comparison of effects with and without the intervention. First, participants can share information with each other, so some of those in the control group would also learn the advice and could follow it. Second, it may be difficult for the researchers to treat participants from both groups objectively, and in some situations their recommendations could end up reaching the wrong participant. This is what is known as contamination between groups, and it is very common when we study public health interventions or health promotion programs.

But do not worry ahead of time, because to solve this problem we can fall back on the gregarious cousin in the randomized clinical trial family: the cluster randomized trial.

In these trials the unit of randomization is not the individual but a group of individuals. Going back to the previous example, we could randomize the patients of one health center to the intervention group and the patients of another center to the control group. This has the advantage of preventing contamination between groups, with the added advantage that participants within each group behave similarly.

For this design to work properly there must be enough groups for randomization to balance their baseline characteristics. It is also essential to keep a number of special considerations in mind during the design, analysis and reporting of cluster trials, since the lack of independence among the participants in each group has major statistical implications. The members of each group may share characteristics that differ from those of other groups (selection bias), and confounding variables may also be distributed differently within each group.

One problem with this type of design is that it has less power than the equivalent individually randomized clinical trial, so larger sample sizes are needed, by an amount given by what is called the cluster inflation factor. Furthermore, we must consider the number and size of the groups and the correlation that may exist between the results of patients within the same group, which is measured with the intracluster correlation coefficient.
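The intracluster correlation coefficient usually has to be estimated from pilot data or borrowed from earlier studies. As a rough sketch, and not necessarily the method any particular trial would use, the snippet below applies the classical one-way ANOVA estimator for clusters of equal size. The function name `anova_icc` and the pilot blood pressure values are made up for illustration.

```python
from statistics import fmean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    `clusters` is a list of lists, one inner list of outcomes per cluster.
    Assumes every cluster has the same size m (at least 2).
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters, all of the same size >= 2")

    cluster_means = [fmean(c) for c in clusters]
    grand_mean = fmean(cluster_means)  # valid because cluster sizes are equal

    # Between-cluster and within-cluster mean squares.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (k * (m - 1))

    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("no variation in the data, ICC is undefined")
    return (msb - msw) / denominator


# Hypothetical pilot data: systolic blood pressure in three health centers.
pilot = [[120, 124, 118], [135, 138, 133], [128, 126, 130]]
print(anova_icc(pilot))
```

Note that this estimator can return negative values when patients within a cluster differ more than the cluster means do. In practice such values are usually treated as zero.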

To calculate the sample size, we multiply the size that the standard trial would need by a design factor that takes into account the cluster size and the intracluster correlation coefficient. The formula is the following:

N (cluster trial) = Inflation factor x N (standard clinical trial)

Inflation factor = 1 + [(m – 1) x ICC], where m is the cluster size and ICC is the intracluster correlation coefficient.

Here’s an example. Suppose we are planning a trial and the standard design would need 400 participants to detect a certain effect size with the desired power and statistical significance. We estimate that the intracluster correlation coefficient is 0.15 and decide that we want clusters of 30 participants. The sample size required for the cluster randomized trial is

N (cluster trial) = (1 + [(30 – 1) x 0.15]) x 400 = 2140

Rounding up, we need 72 clusters of 30 participants, for a total sample of 2160. As can be seen, that is more than five times the sample size of the conventional trial.
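The same arithmetic can be written as a few lines of Python, which makes it easy to try other cluster sizes or correlation values. This is only a sketch of the formula above, and the function names `inflation_factor` and `cluster_trial_size` are our own.

```python
import math


def inflation_factor(m, icc):
    """Cluster inflation factor: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


def cluster_trial_size(n_standard, m, icc):
    """Return (inflated N, number of clusters, total participants recruited)."""
    n_cluster_trial = inflation_factor(m, icc) * n_standard
    # Round to 9 decimals first so floating-point noise cannot push ceil up by one.
    n_clusters = math.ceil(round(n_cluster_trial / m, 9))
    return n_cluster_trial, n_clusters, n_clusters * m


n, k, total = cluster_trial_size(n_standard=400, m=30, icc=0.15)
print(round(n), k, total)  # prints: 2140 72 2160
```

The 72 clusters are the total for the whole trial, so with two arms that would be 36 clusters in each.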

Another peculiarity of cluster trials is that the analysis must take into account the lack of independence among the patients in each group, whether we analyze individual results or summary measures at the cluster level. If we ignore the lack of independence among participants, we increase the probability of making a type I error and drawing the wrong conclusion. To give an idea, a p-value of 0.01 can become greater than 0.05 once we account for this effect.
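A small simulation can make this inflation of the type I error visible. The sketch below rests on several assumptions of our own: normally distributed outcomes with total variance 1, an intracluster correlation of 0.15, 20 clusters of 30 participants per arm, no true treatment effect, and a normal approximation (z test) instead of an exact t test. It compares a naive analysis that treats all individuals as independent with an analysis of cluster means.

```python
import random
import statistics
from math import sqrt


def z_stat(a, b):
    """Two-sample z statistic for the difference in means (unequal variances)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def simulate_null_trial(rng, clusters_per_arm=20, m=30, icc=0.15):
    """Simulate one trial with NO treatment effect; return (naive z, cluster-level z)."""
    sd_between = sqrt(icc)      # cluster-level random shift
    sd_within = sqrt(1 - icc)   # individual-level noise, total variance is 1
    arms = []
    for _ in range(2):
        clusters = []
        for _ in range(clusters_per_arm):
            shift = rng.gauss(0, sd_between)
            clusters.append([shift + rng.gauss(0, sd_within) for _ in range(m)])
        arms.append(clusters)

    # Naive analysis: pool every individual, ignoring the clusters.
    pooled = [[y for c in arm for y in c] for arm in arms]
    # Cluster-level analysis: one summary value (the mean) per cluster.
    means = [[statistics.fmean(c) for c in arm] for arm in arms]
    return z_stat(pooled[0], pooled[1]), z_stat(means[0], means[1])


def type_one_error_rates(n_sims=500, seed=1):
    """Share of null trials declared 'significant' at the 5% level by each analysis."""
    rng = random.Random(seed)
    critical = statistics.NormalDist().inv_cdf(0.975)
    naive_hits = cluster_hits = 0
    for _ in range(n_sims):
        z_naive, z_cluster = simulate_null_trial(rng)
        naive_hits += abs(z_naive) > critical
        cluster_hits += abs(z_cluster) > critical
    return naive_hits / n_sims, cluster_hits / n_sims


naive_rate, cluster_rate = type_one_error_rates()
print(f"naive: {naive_rate:.3f}  cluster-level: {cluster_rate:.3f}")
```

Under these assumptions the naive analysis should reject the null hypothesis far more often than the nominal 5%, while the cluster-level analysis should stay close to it (slightly above, because of the normal approximation).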

This means that we cannot use tests like Student’s t test on the individual data and have to resort to robust analysis of variance or to the more widely used random effects model, which not only takes the cluster effect into account but also allows us to estimate and assess the degree of contamination. It also accounts for heterogeneity due to unobserved factors and allows adjusting for covariates that are imbalanced between groups. One possibility is to run the analysis both with and without the clustering effect and check whether the significance values differ; if they do, this supports our having chosen the right kind of design for the study.

And these are the most important issues that we have to keep in mind when conducting a cluster trial. Their main advantage is that they avoid contamination between participants, as we saw at the beginning, so they are very useful for assessing strategies to improve health and educational programs. Their main drawback has already been mentioned: lower power, with the consequent need for much larger sample sizes.

Finally, all these issues concerning the sample size calculation and the statistical analysis that accounts for the effect of clustering should be clearly specified when the trial results are reported.

One last piece of advice. If you carry out a cluster trial or critically appraise one, do not forget to check that the authors have taken into account the peculiarities we have discussed. To do this you can use the CONSORT statement, a checklist of the characteristics that clinical trials must meet, which includes the specific characteristics of cluster trials. But that is another story…