Mean or median?
Mean and median are two measures of central tendency. This post discusses the situations in which it is preferable to use one or the other to describe data.
Meat or fish? This is the difficult dilemma I face every time I go for lunch at a good restaurant. Honestly, I prefer meat, but since science textbooks say I’m an omnivorous animal and I don’t want to contradict them, I try to eat all kinds of food, including fish.
Both meat and fish have their pros and cons. Meat is easier to eat. On the other hand, I find it difficult to get good fish anywhere but at a good restaurant, so I find it hard to pass up the opportunity. But meat tastes so good. A hard decision…
It is much easier to decide between mean and median, no doubt about it.
Mean or median?
As you all know, the mean (we’re talking about the arithmetic mean) and the median are measures of central tendency or centralization. They tell us what the central value of a distribution is.
The simplest way to calculate the arithmetic mean is to add up all the values of the distribution and divide the result by the number of elements in the distribution, our beloved n.
To get the median we have to sort the elements of the distribution from lowest to highest and locate the central element. If there is an odd number of elements, the median is the value of the central one.
For instance, if we have a distribution of 11 elements sorted from lowest to highest, the value of the element in the sixth position will be the median of the distribution. If the number of elements is even, the median will be the average of the two central values. For example, if we have 10 elements, the median will be the mean of the fifth and sixth ones. There are formulas and other ways to get the median but, as always, the best way is to use a computer program that will do it effortlessly.
It’s usually easier to decide between mean and median than between meat and fish, as there are some general rules that can be applied to each case.
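As a minimal sketch of the two calculations just described, the following Python snippet computes both measures by hand and checks them against the standard library’s `statistics` module. The helper names `arithmetic_mean` and `median_by_sorting`, as well as the sample values, are made up for illustration.

```python
import statistics


def arithmetic_mean(values):
    """Add up all the values and divide by the number of elements, n."""
    return sum(values) / len(values)


def median_by_sorting(values):
    """Sort from lowest to highest and pick the central element(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the central element (e.g. the 6th of 11).
        return ordered[middle]
    # Even n: the average of the two central elements (e.g. 5th and 6th of 10).
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data: one sample with 11 values (odd), one with 10 (even).
odd_sample = [7, 3, 9, 1, 5, 11, 2, 8, 4, 10, 6]
even_sample = odd_sample[:10]

for sample in (odd_sample, even_sample):
    print(arithmetic_mean(sample), median_by_sorting(sample))
    # The "computer program" route gives the same answers.
    assert arithmetic_mean(sample) == statistics.mean(sample)
    assert median_by_sorting(sample) == statistics.median(sample)
```

In practice, `statistics.mean` and `statistics.median` alone are enough; the hand-written versions are there only to mirror the description.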
When the median is the better choice
First, when the data doesn’t fit a normal distribution it’s more appropriate to use the median. This is because the median is much more robust, which means that it is less affected by the presence of skewness or outliers in the distribution.
The second rule is related to the first. When there are very extreme values in the distribution, the median will describe the central point of the distribution better than the mean, which has the drawback of being pulled towards the outliers: the more extreme the outliers, the larger the deviation.
Finally, some people think that using the median makes more sense for certain kinds of variables. For example, if we are talking about survival, the median tells us not only about the typical survival time but also how long half of the sample survives. That is more informative than the arithmetic mean.
Anyway, whichever you choose, both measures are still useful. To understand this, we will see a couple of examples that are as good as any, given that I’ve just invented them.
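To illustrate the rule about extreme values, here is a small sketch in Python. The numbers are hypothetical, invented only for this illustration, and `summarize` is a made-up helper: it adds a single, increasingly extreme value to the same five data points and reports both measures.

```python
import statistics

# Hypothetical values, invented only for this illustration.
base = [10, 11, 12, 13, 14]


def summarize(outlier):
    """Return (mean, median) of the base data plus one extra value."""
    data = base + [outlier]
    return statistics.mean(data), statistics.median(data)


for outlier in (15, 50, 500, 5000):
    mean, median = summarize(outlier)
    print(f"extra value={outlier:>4}  mean={mean:8.2f}  median={median}")
```

The mean climbs with every step, whereas the median does not move, because only the order of the central elements matters to it.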
Some stupid examples
Let’s imagine a school with five teachers. The Science teacher is paid 1200 euros per month, the Math teacher 1500, the Literature teacher 800 and the History teacher 1100. It turns out that the principal is a football fan, so he hires Zinedine Zidane as a gym teacher. The problem is that Zizou doesn’t work for 1000 euros a month, so the principal gives him a salary of 20000 euros per month.
In this case the mean salary is 4920 euros per month and the median is 1200 euros. Which do you think is the better measure of central tendency here? It seems clear that the median gives a better idea of what teachers typically earn at this school. The mean rises a lot because it is dragged towards the extreme value of 20000 euros per month.
Many of you might even be thinking that the mean is of little use in this case. But that’s because you’re looking at it from the point of view of someone applying for a teaching job. If you were applying to be the manager of the school and had to prepare the monthly budget, which of the two measures would be more helpful? No doubt the mean, which tells you how much money you need to set aside to pay the teachers, provided you know the number of teachers in the school, of course.
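The school example can be checked with a few lines of Python. This is just a sketch using the salaries given above; the dictionary keys and variable names are my own choice.

```python
import statistics

# Monthly salaries in euros, taken from the example above.
salaries = {
    "Science": 1200,
    "Math": 1500,
    "Literature": 800,
    "History": 1100,
    "Gym": 20000,
}

values = list(salaries.values())
mean_salary = statistics.mean(values)      # 4920
median_salary = statistics.median(values)  # 1200

# The mean times the number of teachers recovers the total payroll,
# which is the figure someone preparing the budget needs.
monthly_budget = mean_salary * len(values)  # 24600

print(mean_salary, median_salary, monthly_budget)
```

The median cannot be used this way: 1200 euros times five teachers would leave the budget 18600 euros short every month.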
Here’s another example. Suppose I take 20 fat people and allocate them to two groups to test two diets. With a stretch of the imagination, we’ll call them diet A and diet B.
Three months later, those on diet A have lost 3.8 kg on average, whereas those on diet B have lost a mean of 2.7 kg. Which of the two diets is more effective?
For those smart people who said diet A, here is a little more information. These are the differences between final and initial weight of the patients on diet A: 2, 4, 0, 0, -1, -1, -2, -2, -3 and -35. And these are the same values for the subjects on diet B: -1, -1, -2, -2, -3, -3, -3, -3, -4 and -5. Do you still think that diet A is more effective?
I’m sure that the more vigilant among you have already spotted the trap in this example. In group A there’s an outlier who lost 35 kg with that diet, which drags the mean towards that -35. So let’s calculate the median: -1 kg for diet A and -3 kg for diet B.
It seems that diet B is more effective and that the median gives, in this case, better information on the central tendency of the distribution. Be aware that in this example it is easy to see by looking at the raw data, but if you had 1000 participants instead of 10 it wouldn’t be so easy. You would have to detect the presence of outliers and use a more robust measure of central tendency than the mean, such as the median.
Surely someone would remove the outlier and use the mean of the remaining data, but this is not advisable, because outliers can also provide information on certain aspects. For example, who is to say that there isn’t a special metabolic situation in which diet A is much more effective than diet B, even though B is the more effective one in the other cases?
And let’s leave it here for today. I’ll just say that sometimes we can transform the data to get a normal distribution or to compensate for the effect of outliers. Also, there are other robust measures of central tendency besides the median, such as the geometric mean or the trimmed mean. But that’s another story…
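When there are too many participants to inspect by eye, a short script can do the looking for us. The sketch below recomputes the mean and median directly from the weight changes listed above and then flags possible outliers. The flagging rule (anything more than 1.5 times the interquartile range beyond the quartiles) is a common convention that I’m assuming here, not the only way to do it, and `flag_outliers` is a made-up helper name.

```python
import statistics

# Weight change (final minus initial, in kg), copied from the example above.
diet_a = [2, 4, 0, 0, -1, -1, -2, -2, -3, -35]
diet_b = [-1, -1, -2, -2, -3, -3, -3, -3, -4, -5]


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    Assumption: the 1.5 * IQR rule is one common convention among several.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


for name, changes in (("A", diet_a), ("B", diet_b)):
    print(
        f"diet {name}: mean={statistics.mean(changes):.1f} kg, "
        f"median={statistics.median(changes):.1f} kg, "
        f"possible outliers={flag_outliers(changes)}"
    )
```

With these data the only value flagged is the participant on diet A who lost 35 kg; nobody on diet B is flagged.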