Science without sense…double nonsense

Pills on evidence-based medicine

Posts tagged Ecological fallacy

King Kong versus Godzilla

What a mess these two make when they are left loose and come together! In this story, almost as old as I am (please, do not run off to check what year the movie was made), poor King Kong, who must have traveled more than Tarzan, leaves his Skull Island to defend a village from an evil giant octopus and drinks a potion that leaves him sound asleep. Some Japanese gentlemen then seize the opportunity to take him to their country. Having visited Japan myself, I can imagine the effect it had on the poor ape when he woke up, so he had no choice but to escape, with the misfortune of running into Godzilla, who had also escaped from the iceberg where he had been frozen. The two get tangled up and the fight begins, stones over here, atomic rays over there, until the thing gets out of control and King Kong ends up attacking Tokyo, I do not remember exactly why. I swear I have not taken any hallucinogens; the film is like that, and I will not reveal more so as not to spoil the ending in the unlikely event that you want to see it after what I have told you. What I do not know is what the screenwriters had taken before coming up with this story.

At this point you will be thinking about how today’s post may be related to this story. Well, the truth is that it has nothing to do with what we are going to talk about, but I could not think of a better way to start. Well, it may actually be related, because today we are going to talk about a family of monsters within epidemiological studies: the ecological studies. It’s funny that when you read something about ecological studies, it always starts by saying that they are simple. Well, I do not think so. The truth is that they have a lot to get our teeth into and we are going to try to explain them in a simple way. I thank my friend Eduardo (to whom I dedicate this post) for the effort he made to describe them intelligibly. Thanks to him I could understand them. Well… a little bit.

Ecological studies are observational studies with the peculiarity that the units of study are not individual subjects but groups of subjects (clusters), so the level of inference of their estimates is also aggregate. They tend to be cheap and quick to perform (hence, I suppose, their supposed simplicity), since they usually draw on data already available from secondary sources, and they are very useful when exposure cannot be measured at the individual level or when the effect can only be measured at the population level (such as the results of a vaccination campaign, for example).

The problem comes when we want to make inferences at the individual level based on their results, since they are subject to a series of biases that we will discuss later on. In addition, since they are usually descriptive studies based on historical data, it can be difficult to establish the temporal sequence between the exposure and the effect studied.

We will look at their specific characteristics in relation to three aspects of their methodology: types of variables and analyses, types of studies, and biases.

Ecological variables are classified into aggregate variables and environmental variables (also called global variables). Aggregate variables summarize individual observations. They are usually means or proportions, such as the mean age at which the first King Kong movie is seen or the rate of geeks per 1,000 moviegoers, to name two absurd examples.

On the other hand, environmental measures are characteristics of a specific place. They can have a parallel at the individual level (for example, the levels of environmental pollution, related to the crap each of us takes in) or be attributes of groups with no equivalent at the individual level (such as water quality, to name just one).

As for the analysis, it can be done at the aggregate level, using data from groups of participants, or at the individual level, but it is better not to mix the two. Moreover, if data of both types are collected, it is more convenient to bring them to a single level. The simplest option is to aggregate the individual data, although it can also be done the other way around, or the analysis can even be done at both levels with hierarchical multilevel techniques, within reach of only a few privileged minds.
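To make the "aggregate the individual data" option a little more tangible, here is a minimal sketch in Python. The records, the group labels and the `aggregate` helper are all made up for illustration; the only point is that each group ends up summarized by proportions, which is the kind of aggregate variable described above.

```python
from collections import defaultdict

# Hypothetical individual records: (group, exposed, has_effect).
records = [
    ("A", True, False),
    ("A", False, True),
    ("A", False, False),
    ("B", True, True),
    ("B", True, False),
    ("B", False, False),
]

def aggregate(rows):
    """Collapse individual records into one summary row per group."""
    counts = defaultdict(lambda: {"n": 0, "exposed": 0, "cases": 0})
    for group, exposed, has_effect in rows:
        row = counts[group]
        row["n"] += 1
        row["exposed"] += int(exposed)
        row["cases"] += int(has_effect)
    # Turn raw counts into proportions, the usual aggregate variables.
    return {
        group: {
            "n": row["n"],
            "prop_exposed": row["exposed"] / row["n"],
            "prop_cases": row["cases"] / row["n"],
        }
        for group, row in counts.items()
    }

summary = aggregate(records)
for group, row in sorted(summary.items()):
    print(group, row)
```

Once the data are collapsed like this, who exactly was exposed and who had the effect inside each group is no longer visible, which is precisely where the trouble with individual inference starts.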

Obviously, the level of inference we want to apply will depend on our objective. If we want to study the effects of a risk factor at the individual level, the inference will be individual. An example would be to study the relationship between the number of hours of television watched and the incidence of brain cancer. On the other hand, and following a very pediatric example, if we want to know the effectiveness of a vaccine, the inferences will be made in aggregate form from the vaccination coverage data of the population. And to complicate things a little more, we can measure an exposure factor in both ways, individual and grouped: for example, the density of Mexican restaurants in a population and the frequency of antacid intake. In this case we would make a contextual inference.

Regarding the type of ecological studies, we can classify them according to the exposure method and the grouping method.

According to the exposure method, things are relatively simple and we find two types of studies. If the exposure variable is not measured, or is only partially measured, we speak of exploratory studies. Otherwise, we are dealing with an analytical study.

According to the grouping method, we can consider three types: multiple-group (when several areas are selected), time-trend (measurements are made over time) and mixed (a combination of both).

The complexity begins when the two dimensions (exposure and grouping) are combined, since we can then end up with a series of more complex designs. Thus, multiple-group studies can be exploratory (the effect is measured, but not the exposure factor) or analytical (the most frequent; here both are measured). Time-trend studies, not to be outdone, can also be exploratory or analytical, just like the previous ones, but following a trend over time. Finally, there are mixed studies, which compare the time trends of several geographical areas. Simple, isn’t it?

Well, this is nothing compared to the complexity of the statistical techniques used in these studies. Until recently the analyses were very simple and based on measures of association or linear correlation, but in recent times we have seen the development of numerous techniques based on regression models and more exotic things such as log-linear multiplicative models or Poisson regression. The merit of all these techniques is that, starting from the grouped measures, they allow us to estimate how many exposed and unexposed subjects have the effect, which in turn allows the calculation of rates, attributable fractions, etc. Do not fear, we will not go into detail, but there is literature available for those who want to rack their brains.
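For the curious, and only to give a flavor of the idea, here is a minimal sketch of the simplest case: a linear ecological regression of the group rate on the proportion exposed. The data are invented and the `fit_line` helper is hypothetical. It also assumes, rather optimistically, that the linear model holds, with no confounding or effect modification across groups. Under that assumption, the predicted rate at a proportion exposed of 0 estimates the rate in the unexposed, and the prediction at 1 estimates the rate in the exposed.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical group-level data: one value per area.
prop_exposed = [0.1, 0.2, 0.3, 0.4, 0.5]        # proportion exposed
rate = [0.012, 0.014, 0.016, 0.018, 0.020]      # cases per person-year

b0, b1 = fit_line(prop_exposed, rate)

rate_unexposed = b0            # predicted rate when nobody is exposed
rate_exposed = b0 + b1         # predicted rate when everybody is exposed
rate_difference = rate_exposed - rate_unexposed
rate_ratio = rate_exposed / rate_unexposed
attributable_fraction_exposed = (rate_ratio - 1) / rate_ratio

print(f"rate ratio: {rate_ratio:.2f}")
print(f"attributable fraction in the exposed: {attributable_fraction_exposed:.2f}")
```

Note that the estimates come from extrapolating the line to groups in which nobody or everybody is exposed, which may lie far from anything actually observed. That is one more reason to read them with caution.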

To finish with the methodological aspects of ecological studies, we will list some of their most characteristic biases, favored by the use of aggregate units of analysis.

The most famous of all is ecological bias, also known as the ecological fallacy. It occurs when the grouped measure does not reflect the biological effect at the individual level, so that the individual inference drawn from it is wrong. This bias became famous with the New England Journal of Medicine study that concluded there was a relationship between chocolate consumption and Nobel prizes. However funny that example may be, the ecological fallacy is the main limitation of this type of study.

Another bias with some peculiarities in this type of study is confounding bias. In studies dealing with individual units, confounding occurs when a third variable is related to both the exposure and the effect without being part of the causal chain between the two. This ménage à trois is a bit more complex in ecological studies. A risk factor can act as a confounder at the ecological level but not at the individual level and, vice versa, confounding factors at the individual level may not produce confounding at the aggregate level. In any case, as in the rest of the studies, we must try to control the confounding factors, for which there are two fundamental approaches.

The first is to include the possible confounding variables in the mathematical model as covariates and perform a multivariate analysis, which makes it more complicated to study the effect. The second is to adjust or standardize the rates of the effect by the confounding variables and fit the regression model with the adjusted rates. For this to work, all the variables introduced in the model must be adjusted for the same confounding variable, and the covariances of the variables must be known, which is not always the case. In any case, and not to discourage you, we often cannot be sure that the confounding factors have been adequately controlled, even with the most recent and sophisticated multilevel analysis techniques, since the origin may lie in unknown characteristics of how the data are distributed among groups.

Other gruesome aspects of ecological studies are temporal ambiguity bias (as we have already mentioned, it is often difficult to ensure that exposure precedes the effect) and collinearity (the difficulty of assessing the effects of two or more exposures that can occur simultaneously). In addition, although it is not specific to them, ecological studies are very susceptible to information bias.
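As a rough illustration of the second approach, here is a minimal sketch of direct standardization by a single confounder (age, in two strata). The regions, the counts and the standard population are invented for the example, and it covers only the standardization step, not the regression that would follow.

```python
# Hypothetical standard population used as common weights.
standard_pop = {"young": 6000, "old": 4000}

# Hypothetical cases and population per age stratum in two regions.
regions = {
    "North": {"young": {"cases": 10, "pop": 8000},
              "old": {"cases": 40, "pop": 2000}},
    "South": {"young": {"cases": 5, "pop": 2000},
              "old": {"cases": 120, "pop": 8000}},
}

def crude_rate(strata):
    """Overall rate, ignoring the age structure of the region."""
    cases = sum(s["cases"] for s in strata.values())
    pop = sum(s["pop"] for s in strata.values())
    return cases / pop

def standardized_rate(strata, standard):
    """Directly standardized rate: stratum rates weighted by the standard."""
    total = sum(standard.values())
    return sum(
        (strata[age]["cases"] / strata[age]["pop"]) * standard[age] / total
        for age in standard
    )

for name, strata in regions.items():
    print(
        f"{name}: crude {crude_rate(strata):.5f}, "
        f"standardized {standardized_rate(strata, standard_pop):.5f}"
    )
```

With these made-up numbers the crude rate is higher in the South, but the standardized rate is higher in the North: the South simply has an older population. It is the adjusted rates, not the crude ones, that would go into the regression model.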

You can see that I was right at the beginning when I told you that ecological studies seem to me many things, but not simple. In any case, it is worth understanding what their methodology is based on because, with the development of new analysis techniques, they have gained in prestige and power, and it is more than likely that we will meet them more and more often.

But do not despair, the important thing for us, consumers of medical literature, is to understand how they work so that we can make a critical appraisal of the articles when we deal with them. Although, as far as I know, there are no checklists as structured as CASP has for other designs, the critical appraisal will be done following the usual general scheme according to our three pillars: validity, relevance and applicability.

The study of VALIDITY will be done in a similar way to other types of cross-sectional observational studies. The first thing will be to check that there is a clear definition of the population and the exposure or effect under study. The units of analysis and their level of aggregation will have to be clearly specified, as well as the methods of measuring the effect and exposure, the latter, as we already know, only in analytical studies.

The sample of the study should be representative, for which we will have to review the selection procedures, the inclusion and exclusion criteria and its size. These data will also influence the external validity of the results.

As in any observational study, the measurement of exposure and effect should be done blindly and independently, using valid instruments. The authors must present the data completely, taking into account whether there are losses or out-of-range values. Finally, there must be a correct analysis of the results, with control of the typical biases of these studies: ecological, information, confounding, temporal ambiguity and collinearity.

In the RELEVANCE section we can begin with a quantitative assessment, summarizing the most important result and reviewing the magnitude of the effect. We must look for, or calculate ourselves if possible, the most appropriate impact measures: differences in incidence rates, attributable fraction in the exposed, etc. If the authors do not offer these data, but do provide the regression model, it is possible to calculate the impact measures from the coefficients of the independent variables of the model. I am not going to list the formulas here, so as not to make this post even more unfriendly, but now you know they exist in case you need them one day.

Then we will make a qualitative assessment of the results, trying to assess the clinical interest of the main outcome measure, the interest of the effect size and the impact it may have on the patient, the system or society.

We will finish this section with a comparative assessment (looking for similar studies and comparing the main outcome measure and other alternative measures) and an assessment of the relationship between benefits, risks and costs, as we would do with any other type of study.

Finally, we will consider the APPLICABILITY of the results in clinical practice, taking into account aspects such as adverse effects, economic cost, etc. We already know that the fact that the study is well done does not mean that we have to apply it obligatorily in our environment.

And here we are going to leave it for today. When you read or do an ecological study, be careful not to fall into the temptation of drawing causal conclusions. Regardless of the pitfalls that the ecological fallacy may hold for you, ecological studies are observational, so they can be used to generate causal hypotheses, but not to confirm them.

And now we’re leaving. I did not tell you who won the fight between King Kong and Godzilla so as not to spoil it, but surely the smartest of you have already guessed. After all, and to his misfortune, only one of the two later traveled to New York. But that is another story…

Chocolate’s fallacy

White, dark, filled, in ounces, in a cup, powdered, in ice cream, with hazelnuts, with almonds, with fruits, milky, pure, fondant, bitter, in pies, in candy, in hot or cold drinks, etc., etc., etc. I like them all.

So you can easily imagine my joy when my RSS reader showed me the title of the New England article saying that there was a relationship between chocolate consumption and Nobel prizes. I could see myself eating chocolate galore with a copy of the paper in my pocket, ready to shut the mouths of all those who would come to spoil the party by saying that I was going over the top with calories, fat, sugar, or whatever. At the end of the day, what could be more important than working toward a Nobel Prize?

It’s at this point that you can also easily imagine my frustration when reading the work and seeing that the title was fishy. It turns out that it was an ecological study.

In the epidemiological studies that we’re most used to reading, the units of analysis are usually individuals. However, in ecological studies these units are aggregates of individuals.

A summary measure of the frequency of exposure and of the effect among the individuals in each aggregate is calculated, and the analysis then shows whether there’s an association between exposure and effect across the different units.

There are two types of ecological studies. At one end are those that study frequency measures, such as incidence, mortality, etc., looking for geographical patterns that may be related to social, economic, genetic factors or whatever. At the other are those that study variations in frequency over time, in order to detect temporal trends and try to explain their cause.

These studies are usually simple and quick to perform, and they are often built on data already available in records or yearbooks, so they are usually not too expensive either. The problem with ecological studies is that an association among the units of analysis does not necessarily mean that it also exists at the level of individuals. If we take this association for granted at the individual level, we run the risk of committing a sin known by the beautiful name of ecological fallacy. You can keep comparing every variable you can think of with the frequency of a particular disease until you find a significant association, and then find it impossible to come up with a plausible mechanism to explain it. In our example, it could even be the case that, at the individual level, the more chocolate you eat the duller your senses become, moving you further away from the desired Nobel Prize.

And for those who do not believe me, let’s look at a totally absurd, made-up example. Suppose we want to know whether there is a relationship between watching television for more than four hours a day and being a strict vegetarian. It turns out that we have data from three surveys in three cities, which we will call A, B and C so as not to get into any more trouble.

If we calculate the prevalence of vegetarianism and of TV addiction, we’ll see that it’s 0.4 in A, 0.5 in B and 0.6 in C. It’s pretty clear: in cities with more people addicted to the boob tube there are more strict vegetarians, which may indicate that watching television is even more dangerous than previously thought.

But these are aggregate results. What happens at the individual level? We see that the odds ratios are 0.33 in A and C and 0.44 in B. So, surprisingly, even though cities with more couch potatoes have more vegetarians, people carrying the couch potato stigma have odds of being a strict vegetarian that are only 33-44% of those of everyone else, that is, 56-67% lower. So we see how important it is that the results of an ecological study are subsequently investigated with other, analytical study designs in order to explain them properly.
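For those who like to check the arithmetic, here is a minimal sketch in Python. The counts below are an assumption: they are simply one set of 2x2 tables (100 respondents per city) that reproduces the prevalences and odds ratios quoted above, not necessarily the original survey data. The helper names are made up too.

```python
# Hypothetical counts per city, chosen only to match the quoted figures.
# Each tuple is (tv_and_veg, tv_only, veg_only, neither).
cities = {
    "A": (10, 30, 30, 30),
    "B": (20, 30, 30, 20),
    "C": (30, 30, 30, 10),
}

def prevalences(table):
    """Return (prevalence of TV addiction, prevalence of vegetarianism)."""
    both, tv_only, veg_only, neither = table
    total = both + tv_only + veg_only + neither
    return (both + tv_only) / total, (both + veg_only) / total

def odds_ratio(table):
    """Odds of being vegetarian among TV addicts versus everyone else."""
    both, tv_only, veg_only, neither = table
    return (both * neither) / (tv_only * veg_only)

for city, table in cities.items():
    tv, veg = prevalences(table)
    print(f"{city}: TV {tv:.2f}, vegetarian {veg:.2f}, OR {odds_ratio(table):.2f}")
```

Across cities both prevalences rise together (0.4, 0.5, 0.6), yet inside every city the odds ratio is below 1. The aggregate figures only reflect the margins of each table, while the association at the individual level depends on how people are distributed inside it.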

Just two more things before ending this post. First, may vegetarians forgive me, even the strict ones, and, why not, may those who watch too much TV forgive me too. Second, we have seen that the chocolate fallacy is actually an ecological fallacy. But even when the data come from individual units, we must always remember that neither correlation nor association is synonymous with causality. But that’s another story…