Degrees of freedom
Freedom is one of those concepts that everyone understands easily but that is extremely difficult to define. If you don’t believe me, try to state a definition of freedom and you will see that it is not so easy. Right away, you’ll run into other people’s freedom when trying to define yours, or you’ll be wondering what kind of freedom you’re trying to define.
With degrees of freedom, however, it is exactly the opposite. The term is far easier to define, but many have trouble understanding the exact meaning of this seemingly abstract concept.
The degrees of freedom are the number of observations in a sample that can take any possible value (that are “free” to vary) once a certain parameter estimate has been previously and independently calculated in the sample or in its population of origin. Do you see now why I say it is easy to define but not so easy to understand? Let’s look at an example to make it a little clearer.
In a stroke of delusional imagination, let’s assume that we are school teachers. The school principal tells us there’s a competition among neighboring schools and we have to select five students to represent ours. The only condition we have to fulfill is that the average grade of the five students must be exactly seven points. Let’s also suppose that, as it happens, our eldest son, who has a grade of 8, is in the class.
So, acting impartially, we pick him to represent his peers. But we still need to pick four more, so why not be consistent with our sense of justice and choose his four friends? His friend Philip has a 9, John a 6, Louis a 5 (he passes by the narrowest of margins) and Evaristo a 10 (the class nerd). What’s the problem? The average of the five is 7.6 and it should be exactly 7. What can we do?
Let’s try to remove Louis; after all, he’s the one with the lowest grade. The problem is that we would have to replace him with a student who has a score of 2 to come up with an average of 7, and we can’t select a student who has failed his tests. Then let’s try to remove nerd-Evaristo, in which case we’ll need to look for a student with a score of 7. If you think about it, we can make any combination we like, but we are only ever free to choose four of the students, since the fifth is bound by the average value we have set beforehand. This means, no more and no less, that we have four degrees of freedom.
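The arithmetic of the two substitutions above can be checked with a short Python sketch. The function name `forced_fifth_score` and its parameters are made up for this illustration; the only thing it assumes is that the team average is fixed in advance, as in the story.

```python
def forced_fifth_score(free_scores, target_mean, team_size=5):
    """Return the score the last student must have to hit the target mean.

    Hypothetical helper for illustration: team_size - 1 scores are chosen
    freely, and the remaining one is forced by the fixed average.
    """
    if len(free_scores) != team_size - 1:
        raise ValueError("exactly team_size - 1 scores can be chosen freely")
    # The total must equal target_mean * team_size, so the last score
    # is whatever is left over after adding up the free ones.
    return target_mean * team_size - sum(free_scores)


# Removing Louis (5): the son, Philip, John and Evaristo stay.
print(forced_fifth_score([8, 9, 6, 10], 7))  # 2

# Removing Evaristo (10): the son, Philip, John and Louis stay.
print(forced_fifth_score([8, 9, 6, 5], 7))   # 7
```

Whatever four scores we pass in, the function returns a single value for the fifth: four scores are free and one is not.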
When we make a statistical inference about a population, if we want the results to be reliable, each estimate should be made independently. For instance, if we calculate the mean and the standard deviation we should do it independently, but this is not usually the case, since we need an estimate of the mean to calculate the standard deviation. This is why not all the observations can be considered free and independent of the mean: at least one of them is determined by the value previously obtained for the mean.
So you can see that the number of degrees of freedom tells us how many independent observations are involved in the estimation of a population parameter.
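The dependence between the mean and the standard deviation described above can be seen numerically. The following is a minimal sketch that reuses the five grades from the example (an arbitrary choice for illustration): once the sample mean is fixed, the deviations from it must add up to zero, so the last deviation can be recovered from the other four, and the sample variance is divided by n-1 rather than n.

```python
from statistics import mean

scores = [8, 9, 6, 5, 10]   # the five grades from the example
n = len(scores)
m = mean(scores)            # sample mean, estimated first

# Deviations of each observation from the estimated mean.
deviations = [x - m for x in scores]

# Because the deviations sum to zero, the last one is not free:
# it is fully determined by the first n - 1.
last_from_others = -sum(deviations[:-1])

# Sample variance and standard deviation, dividing by the n - 1
# degrees of freedom instead of n.
sample_variance = sum(d * d for d in deviations) / (n - 1)
sample_sd = sample_variance ** 0.5

print(round(sum(deviations), 10))
print(round(last_from_others, 10), round(deviations[-1], 10))
print(round(sample_variance, 4), round(sample_sd, 4))
```

The second printed line shows the same value twice: the deviation computed directly and the one reconstructed from the other four observations.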
This is important because estimators follow specific frequency distributions whose shape depends on the number of degrees of freedom associated with the estimate. The greater the number of degrees of freedom, the narrower the frequency distribution and the higher the power of the study to make the estimate. Both power and degrees of freedom are positively related to the sample size: the larger the sample, the greater the number of degrees of freedom and hence the greater the power.
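To make the narrowing concrete, here is a small sketch that takes Student’s t distribution as one example of such a distribution (the text above does not single out any particular one, so this choice is an assumption). For more than two degrees of freedom its variance is df / (df - 2), which shrinks toward 1 as the degrees of freedom grow. The helper name `t_variance` is made up for this illustration.

```python
def t_variance(df):
    """Variance of Student's t distribution with df degrees of freedom.

    Uses the standard result df / (df - 2), which only holds for df > 2.
    """
    if df <= 2:
        raise ValueError("the variance of Student's t is only finite for df > 2")
    return df / (df - 2)


# The spread of the distribution decreases as degrees of freedom increase.
for df in (3, 5, 10, 30, 100):
    print(df, round(t_variance(df), 3))
```

With only a few degrees of freedom the distribution is wide, so estimates are imprecise; as the degrees of freedom increase the variance approaches 1, that of the standard normal distribution.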
Calculating the number of degrees of freedom of a test is usually straightforward, but the formula depends on the test in question. The simplest case is a single sample whose mean has been estimated: as we have already seen, the degrees of freedom equal n-1, where n is the sample size. Similarly, when we are dealing with two samples and two means, the number of degrees of freedom equals n1+n2-2. In general, when several parameters are estimated, the degrees of freedom are calculated as n-p-1, where p is the number of parameters estimated in addition to the overall mean (in a regression model, for example, p is the number of predictors and the extra 1 accounts for the intercept). This is useful when we do an analysis of variance to compare two or more means.
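The three rules just mentioned can be written as one-line Python functions. This is only a sketch: the function names are made up for illustration, and the general rule is coded exactly as n-p-1, as stated above, with p passed in by the caller.

```python
def df_one_sample(n):
    """Degrees of freedom for one sample of size n after estimating its mean."""
    return n - 1


def df_two_samples(n1, n2):
    """Degrees of freedom for two samples, one mean estimated in each."""
    return n1 + n2 - 2


def df_general(n, p):
    """Degrees of freedom following the n - p - 1 rule from the text."""
    return n - p - 1


print(df_one_sample(5))        # 4, the five students of the example
print(df_two_samples(12, 15))  # 25
print(df_general(30, 3))       # 26
```

The sample sizes in the last two calls are arbitrary numbers chosen only to show the arithmetic.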
And so we could go on giving examples of the calculation for each particular statistic or test we might want to perform. But that’s another story…