When describing and dealing with random variables, we must consider whether they are independent data or paired data.
We doctors spend all day measuring things. It’s what we do best. If we were as good at healing our patients as we are at measuring things in them, half of us would be out of a job for lack of patients. There’s only one thing we like more than measuring: forbidding. We love to prohibit whatever people enjoy, with all sorts of excuses. And now that the end-of-year festivities are approaching, our happy hour arrives and we get to forbid willy-nilly: alcohol, chocolates, parties, and this and that.
But until that time comes, we will have to settle for measuring. We usually measure variables, which are data that, as their name suggests, vary from person to person. Once we have measured some variables in a group of people, we have to start working with the data in order to draw our conclusions.
The first thing we’ll do is describe the data with measures of central tendency and dispersion. Then we’ll apply different statistical tests to compare variables with one another. It is then that the concept of independence plays a key role, since statistical procedures can vary greatly depending on whether we are working with dependent or independent variables and, if we do not keep this in mind, we can make serious errors in any kind of statistical inference.
Stated very simply, two variables are independent when knowing the value of one gives no information about the value of the other. On the other hand, they are dependent when the value of one gives us a clue about what the other is likely to be.
Let’s imagine two dependent variables: weight and body mass index. If we know that an individual weighs 18 kilos, we can imagine that his body mass index will be very low (unless, of course, he is one of the dwarfs from the fairy tale). Conversely, if someone has an index of 60, you can look for someone to repair the scale after having weighed him.
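If you like to see these things in numbers, here is a minimal sketch of the weight and body mass index example. Everything in it is an assumption made for illustration: the simulated heights and weights, their means and spreads, the `unrelated` variable and the small `pearson` helper. Keep in mind that correlation is only a rough stand-in for dependence, since two variables can be dependent and still show a correlation close to zero.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch gives the same numbers on every run


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


n = 500
# Made-up adults: heights in metres and weights in kilograms (assumed parameters).
heights = [random.gauss(1.70, 0.08) for _ in range(n)]
weights = [random.gauss(70, 12) for _ in range(n)]

# BMI is calculated from weight, so it depends on weight by construction.
bmis = [w / h ** 2 for w, h in zip(weights, heights)]

# A variable generated without looking at weight at all.
unrelated = [random.gauss(0, 1) for _ in range(n)]

r_weight_bmi = pearson(weights, bmis)
r_weight_unrelated = pearson(weights, unrelated)

print(f"weight vs BMI:       r = {r_weight_bmi:.2f}")
print(f"weight vs unrelated: r = {r_weight_unrelated:.2f}")
```

With these assumptions, the first coefficient should come out high and the second close to zero: knowing the weight tells us a lot about the body mass index and nothing about the unrelated variable.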
This example is very clear, but it is not always so easy to tell whether two values are dependent or independent. Let’s suppose we measure the height of the students in a school. The height of any given boy won’t tell us anything about the height of any given girl from the school, unless they are siblings or something similar. We can compare heights by sex, treating them as independent values.
Now let’s suppose we do a longitudinal study of growth with the same students. The height of any given student at one point in time will indicate, more or less, what his or her successive values will be, so we cannot consider any two measurements from the same student as independent. They are paired data.
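To get a feeling for how much this matters, the following sketch analyses the same simulated growth data in two ways: ignoring the pairing and respecting it. It is only an illustration under assumed numbers (30 students, a first height around 140 cm, growth of about 5 cm), and it computes the t statistics by hand with the standard library rather than with any particular statistics package.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 30
# Made-up heights (cm) of the same students at two visits one year apart.
first = [random.gauss(140, 8) for _ in range(n)]
second = [h + random.gauss(5, 1) for h in first]

mean_diff = statistics.mean(second) - statistics.mean(first)

# Analysis that (wrongly) treats the two visits as two independent groups.
se_independent = math.sqrt(
    statistics.variance(first) / n + statistics.variance(second) / n
)
t_independent = mean_diff / se_independent

# Paired analysis: work with the change observed in each student.
diffs = [after - before for before, after in zip(first, second)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = statistics.mean(diffs) / se_paired

print(f"mean growth: {mean_diff:.2f} cm")
print(f"independent analysis: SE = {se_independent:.2f}, t = {t_independent:.2f}")
print(f"paired analysis:      SE = {se_paired:.2f}, t = {t_paired:.2f}")
```

The mean growth is the same in both analyses. What changes is the standard error: the paired analysis removes the variability between students, so with data like these it is far more precise than the one that pretends the two visits are unrelated groups.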
Finally, let’s think about a more complex example. Suppose we measure the height of a group of mothers and their children. At first glance, the average heights of mothers and children might be considered independent but, what if the shorter mothers have more children than the taller ones? The children’s average height would probably differ from the one we would obtain if all the mothers had the same number of children. We have to consider them as paired data.
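A tiny made-up example shows the effect. The four families below, with their heights and number of children, are entirely hypothetical and chosen only so that the shorter mothers have more children than the taller ones.

```python
import statistics

# Hypothetical families: (mother's height in cm, heights of her children in cm).
families = [
    (152, [158, 160, 157, 161]),
    (155, [160, 162, 159]),
    (170, [175]),
    (173, [178]),
]

# Average over all the children, ignoring which mother each one belongs to.
all_children = [height for _, children in families for height in children]
mean_per_child = statistics.mean(all_children)

# Average in which every family counts once, whatever its number of children.
family_means = [statistics.mean(children) for _, children in families]
mean_per_family = statistics.mean(family_means)

print(f"mean counting every child:       {mean_per_child:.1f} cm")
print(f"mean counting every family once: {mean_per_family:.1f} cm")
```

Because the shorter mothers contribute more children, the first average is pulled down compared with the second. The children’s heights cannot be summarized without keeping in mind which mother they belong to.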
Another not so obvious example of dependence is that of cluster studies. Imagine we want to study a diagnostic test and we apply it in some hospitals and not in others, to prevent contamination within the same center. We should take this relationship between center and technique into account when drawing conclusions from the results. This is another example of dependent data.
To end this post, I just want to warn you not to confuse the concept of independence we have explained with the concepts of dependent and independent variables in regression models. In that setting, the term dependent refers to the outcome variable, while independent refers to the explanatory variable. But that’s another story…