Permuted blocks and stratified randomization.

This post looks at how permuted blocks, stratification, and minimization overcome the limitations of simple randomization. It describes how these techniques balance key prognostic factors and increase the statistical power of clinical trials, so that chance does not compromise the validity of the results.
Have you ever tried to organize a wedding seating chart without triggering World War III? It is a delicate, Machiavellian art and, honestly, one of the toughest stress tests known to the so-called human being.
If you find yourself in that position, don’t even think about leaving it to chance, because the infallible Murphy’s Law guarantees that your uncle, the one who tells 1980s political jokes and believes climate change is an invention of umbrella manufacturers, will end up sitting elbow-to-elbow with your boss, a stoic vegan activist with very little patience for nonsense.
If you opted for “total freedom” and simply flipped a coin to decide where each guest goes, chaos would be served. You would have unbalanced tables, with deadly boredom on one flank of the room (the “dead zone” where all the shy people ended up) and civil unrest on the other (where chance has concentrated the defenders of pineapple on pizza across from a commando of orthodox Neapolitans on the verge of a heart attack). Pure chance is very democratic, yes, but it has a terrible sense of rhythm and no notion of diplomacy.
To avoid these social disasters, we can resort to strategies that would make German engineering blush: we seat people in predefined “blocks” to ensure variety, or we separate them by strata (the ones from the gym, the ones from work, the distant cousins you only see at funerals) so that the conversation flows and no one ends up throwing centerpieces at each other.
Interestingly, in medical research, something very similar happens. Researchers realized long ago that tossing a coin to assign treatments to patients could generate groups as unbalanced as that wedding table where, by pure bad luck, you sat all your exes together.
So, in this post, we are going to try to discern how we can prevent chance from playing a trick on us in clinical trials. We are going to dive into the fascinating world of permuted blocks and stratified randomization. We will see how these techniques ensure that study groups are comparable and balanced, avoiding bias and ensuring that, at the end of the dinner… sorry, of the study, the results are as solid as they are digestible.
The tyranny of the coin toss
Let’s do things methodically and start at the beginning: simple randomization.
As the name suggests, this is the simplest method of randomization. It is the methodological equivalent of tossing a coin to decide which of the two groups in a clinical trial to assign the participant who has just arrived.
On paper, this should work. If we have two thousand wedding guests and seat them by pure chance, the laws of probability will tend to produce (though without a guarantee) a uniform distribution of guests with different characteristics. There will be as many “annoying in-laws” on the right as on the left. There will be as many “party friends” here as there. The law of large numbers is that generous when the sample size is enormous.
But we have already said that this is not guaranteed, and sometimes, chance can play tricks on us. After all, most clinical trials do not have such a high number of participants, but rather a reduced one.
This is where the coin becomes capricious. We already know that, if the coin is not rigged, each toss has a 50% probability of heads and a 50% probability of tails. But this does not mean that if we toss the coin, for example, 10 times, we are bound to obtain 5 results of each of the two possibilities.
It could be the case that we get, for instance, 10 heads. It is improbable, but not impossible. If you take the trouble to calculate it, the probability is (1/2)^10 = 1/1024, slightly less than 0.1%. Very low, but on average it will happen once in every 1024 times we perform the experiment.
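For anyone who wants to check the arithmetic, here is a minimal sketch in Python using only the standard library. The function name `prob_heads` is just an illustrative choice, and the coin is assumed to be fair.

```python
from math import comb

def prob_heads(n_tosses: int, n_heads: int) -> float:
    """Probability of exactly n_heads heads in n_tosses tosses of a fair coin."""
    return comb(n_tosses, n_heads) / 2 ** n_tosses

p_all_heads = prob_heads(10, 10)    # 1/1024
p_even_split = prob_heads(10, 5)    # 252/1024

print(f"10 heads out of 10: {p_all_heads:.4%}")   # 0.0977%
print(f"exactly 5 and 5:    {p_even_split:.2%}")  # 24.61%
```

Note the second figure: with 10 tosses, even the “perfect” 5-5 split only shows up about a quarter of the time, so some degree of numerical imbalance is the norm rather than the exception.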
The conclusion is that simple randomization does not guarantee the equal distribution of participant characteristics between the two groups, especially if the sample is not very large. And this, as we know, is essential to be able to claim that the differences we observe between the two groups are due to the intervention under study.
Imagine we conduct a clinical trial to test a new hypertension drug, we use simple randomization, and we have the bad luck that the groups turn out to be unbalanced in some important characteristic. For example, if the intervention group fills up with 25-year-old vegan marathoners and the control group fills up with sedentary gentlemen who love fried pork rinds, what do you think would happen? The drug would seem wonderful in the intervention group, but not because it works, but because those kids have arteries cleaner than a whistle, while there is no pill that can compensate for three decades of saturated fats.
And this is an obvious and somewhat absurd example, but consider that in real life, important characteristics, both known and unknown to the investigator, can remain unbalanced. But don’t worry. We can modify randomization to try to prevent this from happening to us. Let’s see how.
The clinical tetris: permuted blocks
The first possibility we are going to look at is performing randomization using permuted blocks. We are going to forget about the lone coin and play Tetris with the study participants instead.
Instead of assigning patients one by one with total randomness, we group them into small “packages” or blocks. Let’s say we decide to use blocks of 4 elements. The golden rule is: within each block of 4, there must always be 2 participants for one group and 2 for the other. The order within the block will be random, yes, but the running totals will be exactly balanced every time a block is completed.
Let’s see how this works in practice. Imagine we have two groups (A and B) and we decide to use blocks of four elements. The possible combinations of A and B in a block of four (where there are always two As and two Bs) would be: AABB, BBAA, ABAB, BABA, ABBA, and BAAB.
The investigator randomly chooses one of these blocks and fills the slots of the chosen block with the next four participants who walk through the door. The great advantage? At no point will there be a large imbalance. If the study or recruitment is cancelled halfway for any reason, there will always be, at most, a difference of 1 or 2 participants between the groups. It will never happen that you have 15 in one group and 5 in the other.

However, this method has a small drawback: predictability. If the investigator knows we are using blocks of 4, they can start counting cards like they are in a Las Vegas casino and know, for example, which group the last participant of each block will belong to.
Imagine this fourth patient of the block arrives. If the investigator knows that the first went to group A, the second to A, and the third to B, and also knows that the rule is “two As and two Bs per block,” it won’t take much effort to conclude that the fourth patient must go to group B no matter what. And just like that, you’ve lost that random feelin’… and it’s gone, gone, gone!
This might seem like a minor evil to you, but think for a moment about what would happen if the investigator believed that treatment B is better (or worse). They could decide to “wait a bit” before enrolling that patient they like so that they get (or don’t get) treatment B. Selection bias rears its ugly head.
That is why, to prevent clever investigators from guessing the sequence, methodologists (who are very paranoid people) usually use larger blocks or vary the block size randomly. Now a block of 4, now one of 6, now one of 4 again. This keeps the investigators confused and honest, which is exactly how they should be.
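As a rough sketch of that idea, the block size itself can be drawn at random before each block is shuffled. The function name `variable_block_sequence` and the candidate sizes (4 and 6, as in the example above) are assumptions made for illustration.

```python
import random

def variable_block_sequence(n_participants, block_sizes=(4, 6), rng=None):
    """Two-arm allocation list using permuted blocks of randomly varying size."""
    rng = rng or random.Random()
    sequence = []
    while len(sequence) < n_participants:
        size = rng.choice(block_sizes)                      # now a 4, now a 6...
        block = ["A"] * (size // 2) + ["B"] * (size // 2)   # half and half
        rng.shuffle(block)                                  # random order inside the block
        sequence.extend(block)
    return sequence[:n_participants]

print("".join(variable_block_sequence(30)))
```

Because the investigator no longer knows where one block ends and the next begins, counting cards stops being reliable. The price is a slightly looser guarantee: the maximum running difference is now half the largest block size (3 in this sketch).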
The logistic nightmare: stratification
Permuted blocks are all well and good for ensuring we have the same number of participants in each group. But what if we are worried not just about quantity, but about quality?
Let’s head back to our wedding banquet for a moment. We’ve managed to get 50 people on the groom’s side and 50 on the bride’s. A numerical success! But then we look closer and realize that, by pure chance (or a stroke of bad luck), we have put all the toddlers under 5 on the groom’s side and all the elderly with hearing aids on the bride’s side.
The result will be a food war and crying on one side, while on the other, not even the toast will be heard. The groups are balanced in number, but not in key characteristics like age, decibel levels, or bladder capacity.
In a clinical trial, this can be critical. Imagine we are testing an anti-wrinkle cream. It doesn’t matter if we have 50 patients in each group if it turns out the placebo group is full of 20-year-old girls and the treatment group is full of 80-year-old ladies who have been sunbathing without protection since 1970. Obviously, the placebo group will have better skin at the end, and we will erroneously conclude that our cream makes people age.
To avoid these situations, we have another slightly more sophisticated resource: stratified randomization.
In these cases, before randomizing, the researcher classifies each participant according to a characteristic or stratum considered important.
Imagine you want to conduct a trial for a new drug to treat that fearsome disease known as fildulastrosis. We know there is a factor that can act as a confounder and drastically influence the evolution of the disease: the consumption of reality TV shows. Naturally, we want this factor to be balanced between the two branches of the study, intervention and control.
What we do is define two strata based on television consumption habits, assign participants to their corresponding stratum, and thus establish two separate “access paths” to the study where luck is cast independently.

Of course, stratification can be combined with permuted blocks. For example, if a couch potato arrives, we assign them to their specific stratum and, once inside, we randomize them either simply (flipping a coin) or within the permuted blocks we have defined, as explained above.
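One way to picture this combination is a separate queue of permuted blocks for each stratum. The sketch below is a simplified illustration, not a production randomization system; the class name `StratifiedBlockRandomizer` and the stratum labels are invented for the example.

```python
import random
from collections import defaultdict

class StratifiedBlockRandomizer:
    """Keeps an independent stream of permuted blocks for every stratum."""

    def __init__(self, block_size=4, seed=None):
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum label -> slots still unused in that stratum's current block
        self.pending = defaultdict(list)

    def assign(self, stratum):
        slots = self.pending[stratum]
        if not slots:                       # current block used up: open a new one
            half = self.block_size // 2
            slots.extend(["A"] * half + ["B"] * half)
            self.rng.shuffle(slots)
        return slots.pop()

randomizer = StratifiedBlockRandomizer(block_size=4)
# Hypothetical arrivals, classified by reality TV consumption
arrivals = ["couch_potato", "non_viewer", "couch_potato", "couch_potato",
            "non_viewer", "couch_potato", "non_viewer", "non_viewer"]
for stratum in arrivals:
    print(stratum, "->", randomizer.assign(stratum))
```

Each stratum completes its own blocks, so after four couch potatoes and four non-viewers, both treatment groups contain exactly two participants of each kind.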
Stratification is like a statistical nirvana that increases the statistical power of a clinical trial. By ensuring the balance of key prognostic characteristics between treatment groups, the standard error of the effect estimate is reduced, thereby facilitating an unbiased and robust comparison.
But beware: stratifying is addictive, so its implementation requires moderation. It is important to limit the number of stratification variables, especially in studies with a small sample size. An excess of factors increases the total number of strata exponentially, since it is the product of the number of categories for each variable, which results in excessively complex and logistically unfeasible randomization procedures.
Furthermore, if there are too many strata, we might create some with an insufficient sample size to perform a balanced assignment. To guarantee the viability of the study, stratified randomization requires the exclusive identification of those key prognostic characteristics that are measurable at the time of randomization and are considered strongly associated with the primary outcome.
The on-the-fly fix: minimization
There is one more technique for the true gourmets of methodology that deserves an honourable mention: minimization. If blocks are engineering and stratification is obsessive classification, minimization is pure computerized jazz improvisation.
Imagine our clinical trial gets complicated. We no longer care about just one factor, like whether participants are male or female. Now we have three different participating hospitals, two different stages of the disease, and to top it off, we want to balance patients based on whether they have received prior treatment or not.
If we tried to create strata for all possible combinations of these variables, we would end up with more groups than there are inhabitants in Podunk. We would have a bunch of “little empty boxes” waiting for a patient who might never arrive (the mythical “male from the South Hospital, stage III, who has never been medicated”).
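To put a number on the “little empty boxes”, the strata can simply be counted as the product of the categories of each factor. In this sketch the hospital names other than South and the category labels are assumptions made for illustration; only the number of levels per factor comes from the example above.

```python
from itertools import product
from math import prod

factors = {
    "sex": ["male", "female"],
    "hospital": ["South", "North", "Central"],   # only "South" is named in the text
    "stage": ["II", "III"],
    "prior_treatment": ["yes", "no"],
}

# The number of strata is the product of the number of categories per factor
n_strata = prod(len(levels) for levels in factors.values())
strata = list(product(*factors.values()))

print(n_strata)    # 24
print(strata[0])   # ('male', 'South', 'II', 'yes')
```

Four modest factors already give 24 boxes. With 40 recruited patients, that is fewer than two patients per box on average, and several boxes would likely sit empty.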
This is where minimization enters the room with sunglasses and a “problem-solving” attitude. Unlike previous methods, which have lists ready before starting, minimization is an algorithm that “looks” at how the party is going in real-time and makes decisions on the fly. It considers the current balance of the key prognostic characteristics between treatment groups and, if an imbalance exists, assigns future patients as necessary to rebalance the groups.
A practical example will help us visualize this controlled chaos. Let’s say we are testing a new drug and have recruited 40 patients. At that moment, Billy Bob, patient number 41, walks through the door. Billy Bob is from the South Hospital, his disease is in stage II, and he has been treated before.
The computer performing the clinical trial randomization, instead of flipping a coin, does a quick review of the two treatment groups (A and B) as they are right now. It goes something like this: “If I put Billy Bob in Group A, things get ugly. In A, I already have an excess of people from the South Hospital and too many patients who have already received previous treatment. I could unbalance the scales even further. But I see that in B, we are short on people from the South and previously treated patients. Billy Bob fits there like the last piece of Tetris.”
The system calculates an “imbalance score” and realizes that assigning Billy Bob to Group B reduces the total differences between the groups. So, sold! Billy Bob goes to Group B to maintain the harmony of the study. And when patient 42 arrives, we’ll see what they’re named and which group it makes more sense to put them in.
Basically, minimization is not pure chance, but a small margin of randomness is usually left so that it is not totally predictable: for example, the patient goes to the group that minimizes the imbalance with 80% probability and to the other group with the remaining 20%. Even so, it is one of the most powerful tools to ensure that, at the end of the day, we are comparing groups that are truly similar in the characteristics that can influence the study results. While it is more complex than stratification, it is highly effective and can accommodate more factors than stratification can.
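For the curious, the logic of the last few paragraphs can be sketched as follows. This is a deliberately simplified illustration: it assumes two groups, equal weight for every factor, an imbalance score equal to the sum of the differences in counts for the new patient's characteristics, and an 80% probability of following the algorithm's preference. The class name `Minimizer` and all the counts in the usage example are made up.

```python
import random
from collections import defaultdict

class Minimizer:
    def __init__(self, arms=("A", "B"), p_best=0.8, seed=None):
        self.arms = arms
        self.p_best = p_best              # chance of following the preferred arm
        self.rng = random.Random(seed)
        # counts[arm][(factor, level)] = patients in that arm with that level
        self.counts = {arm: defaultdict(int) for arm in arms}

    def imbalance_if(self, candidate, patient):
        """Imbalance score if the patient were assigned to `candidate`."""
        total = 0
        for key in patient.items():       # key is a (factor, level) pair
            per_arm = [self.counts[arm][key] + (arm == candidate)
                       for arm in self.arms]
            total += max(per_arm) - min(per_arm)
        return total

    def assign(self, patient):
        scores = {arm: self.imbalance_if(arm, patient) for arm in self.arms}
        best = min(scores.values())
        preferred = [arm for arm in self.arms if scores[arm] == best]
        if len(preferred) == len(self.arms):      # tie: plain coin toss
            arm = self.rng.choice(self.arms)
        elif self.rng.random() < self.p_best:     # usually follow the preference
            arm = self.rng.choice(preferred)
        else:                                     # sometimes go against it
            arm = self.rng.choice([a for a in self.arms if a not in preferred])
        for key in patient.items():
            self.counts[arm][key] += 1
        return arm

trial = Minimizer()
# Made-up state of the trial before Billy Bob arrives
trial.counts["A"].update({("hospital", "South"): 12, ("stage", "II"): 10,
                          ("prior_treatment", "yes"): 11})
trial.counts["B"].update({("hospital", "South"): 8, ("stage", "II"): 10,
                          ("prior_treatment", "yes"): 7})

billy_bob = {"hospital": "South", "stage": "II", "prior_treatment": "yes"}
print(trial.imbalance_if("A", billy_bob))   # 11
print(trial.imbalance_if("B", billy_bob))   # 7
print(trial.assign(billy_bob))              # "B" about 80% of the time
```

With these invented counts, sending Billy Bob to Group B gives the lower score, so B is the preferred destination; the random element only decides whether that preference is followed this time.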
We’re leaving…
Having understood that, in the end, science, much like a good wedding, is largely about managing chaos, we’re going to call it a day.
We have seen how permuted blocks can prevent streaks of bad luck from leaving us with lopsided groups in small studies. They are our Lego blocks for building a solid foundation, provided we take care to ensure the color of the next block isn’t obvious. By ensuring that the desired allocation proportions are achieved exactly within each block, this technique helps maintain numerical balance even during interim analyses.
We have also shown how stratification acts as our Instagram filter, trying to make both groups look equally good (or bad) in the photos. It protects us from confounding variables, those annoying guests trying to steal the spotlight from the intervention whose effect we are trying to evaluate. By essentially producing a randomized trial within each stratum, stratification ensures that the treatment groups are balanced for the most important measurable prognostic characteristics.
So, as you can see, randomization is much more than just flipping a coin. Simple randomization doesn’t always fulfill its mission of balancing known and unknown confounders between the two groups, since even the most improbable distributions can show up in our studies. In fact, I’d say we should be suspicious if simple randomization produces “perfectly identical” groups; in those cases, it’s worth trying to calculate the actual probability of chance producing that specific distribution. But that’s another story…
