I was thinking about the effect size based on mean differences and how to know when that effect is really large and, because of the association of ideas, someone great has come to mind who, sadly, has left us recently. I am referring to Kirk Douglas, that hell of an actor that I will always remember for his roles as a Viking, as Van Gogh or as Spartacus, in the famous scene of the film in which all slaves, in the style of our Spanish’s Fuenteovejuna, stand up and proclaim together to be Spartacus so that Romans cannot do anything to the true one (or to get all equally whacked, much more typical of the *modus operandi* of the Romans of that time).

You won’t tell me the man wasn’t great. But how great if we compare it with others? How can we measure it? It is clear that not because of the number of Oscars, since that would only serve to measure the prolonged shortsightedness of the so-called academics of the cinema, which took a long time until they awarded him the honorary prize for his entire career. It is not easy to find a parameter that defines the greatness of a character like Issur Danielovitch Demsky, which was the ragman’s son’s name before becoming a legend.

We have it easier to quantify the effect size in our studies, although the truth is that researchers are usually more interested in telling us the statistical significance than in the size of the effect. It is so unusual to calculate it that even many statistical packages forget to have routines to obtain it. In this post, we are going to focus on how to measure the effect size based on differences between means.

Imagine that we want to conduct a trial to compare the effect of a new treatment against placebo and that we are going to measure the result with a quantitative variable X. What we will do is calculate the mean effect between participants in the experimental or intervention group and compare it with the mean of the participants in the control group. Thus, the effect size of the intervention with respect to the placebo will be represented by the magnitude of the difference between the mean in the experimental group and that of the control group:However, although it is the easiest to calculate, this value does not help us to get an idea of the effect size, since its magnitude will depend on several factors, such as the unit of measure of the variable. Let us think about how the differences change if one mean is twice the other as their values are 1 and 2 or 0.001 and 0.002. In order for this difference to be useful, it is necessary to standardize it, so a man named Gene Glass thought he could do it by dividing it by the standard deviation of the control group. He obtained the well-known Glass’ delta, which is calculated according to the following formula:Now, since what we want is to estimate the value of delta in the population, we will have to calculate the standard deviation using n-1 in the denominator instead of n, since we know that this quasi-variance is a better estimator of the population value of the deviation:But do not let yourselves be impressed by delta, it is not more than a Z score (those obtained by subtracting to the value its mean and dividing it by the standard deviation): each unit of the delta value is equivalent to one standard deviation, so it represents the standardized difference in the effect that occurs between the two groups due to the effect of the intervention. This value allows us to estimate the percentage of superiority of the effect by calculating the area under the curve of the standard normal distribution N(0,1) for a specific delta value (equivalent to the standard deviation). For example, we can calculate the area that corresponds to a delta value = 1.3. Nothing is simpler than using a table of values of the standard normal distribution or, even better, the pnorm() function of R, which returns the value 0.90. This means that the effect in the intervention group exceeds the effect in the control group by 90%.

The problem with Glass’ delta is that the difference in means depends on the variability between the two groups, which makes it sensitive to these variance differences. If the variances of the two groups are very different, the delta value may be biased. That is why one Larry Vernon Hedges wanted to contribute with his own letter to this particular alphabet and decided to do the calculation of Glass in a similar way, but using a unified variance that does not assume their equality, according to the following formula:If we substitute the variance of the control group of the Glass’ delta formula with this unified variance we will obtain the so-called Hedges’ g. The advantage of using this unified standard deviation is that it takes into account the variances and sizes of the two groups, so g has less risk of bias than delta when we cannot assume equal variances between the two groups.

However, both delta and g have a positive bias, which means that they tend to overestimate the effect size. To avoid this, Hedges modified the calculation of his parameter in order to obtain an adjusted g, according to the following formula:where df are the degrees of freedom, which are calculated as n_{e} + n_{c}.

This correction is more needed with small samples (few degrees of freedom). It is logical, if we look at the formula, the more degrees of freedom, the less necessary it will be to correct the bias.

So far, we have tried to solve the problem of calculating an estimator of the effect size that is not biased by the lack of equal variances. The point is that, in the rigid and controlled world of clinical trials, it is usual that we can assume the equality of variances between the groups of the two branches of the study. We might think, then, that if this is true, it would not be necessary to resort to the trick of n-1.

Well, Jacob Cohen thought the same, so he devised his own parameter, Cohen’s d. This Cohen’s d is similar to Hedges’ g, but still more sensitive to inequality of variances, so we will only use it when we can assume the equality of variances between the two groups. Its calculation is identical to that of the Hedges’ g, but using n instead of n-1 to obtain the unified variance.

As a rough-and-ready rule, we can say that the effect size is small for d = 0.2, medium for d = 0.5, large for d = 0.8 and very large for d = 1.20. In addition, we can establish a relationship between d and the Pearson’s correlation coefficient (r), which is also a widely used measure to estimate the effect size.

The correlation coefficient measures the relationship between an independent binary variable (intervention or control) and a numerical dependent variable (our X). The great advantage of this measure is that it is easier to interpret than the parameters we have seen so far, which all function as standardized Z scores. We already know that r can range from -1 to 1 and the meaning of these values.

Thus, if you want to calculate r given d, you only have to apply the following formula:where p and q are the proportions of subjects in the experimental and control groups (p = n_{e} / n and q = n_{c} / n). In general, the larger the effect size, the greater r and vice versa (although it must be taken into account that r is also smaller as the difference between p and q increases). However, the factor that most determines the value of r is the value of d.

And with this we will end for today. Do not believe that we have discussed all the measures of this family. There are about a hundred parameters to estimate the effect size, such as the determination coefficient, eta-square, chi-square, etc., even others that Cohen himself invented (not very happy with only d), such as f-square or Cohen’s q. But that is another story…