Which family do you belong to?

Classification of epidemiological studies

As we already know from previous posts, the systematics of evidence-based medicine begins with a knowledge gap that moves us to ask a structured clinical question. Once we have formulated the question, we will use its components to perform a bibliographic search and obtain the best available evidence to resolve our doubt.

And here comes, perhaps, the most feared task of evidence-based medicine: the critical appraisal of the evidence found. Actually, it is not such a big deal since, with a little practice, critical reading consists only of systematically applying a series of questions to the article we are analyzing. The problem sometimes lies in knowing which questions to ask, since the system differs according to the design of the study we are evaluating.

What is an epidemiological design?

Here, by design we understand the set of procedures, methods and techniques used with the study participants, during data collection and during the analysis and interpretation of results, to obtain the conclusions of the study. And there is a myriad of possible study designs, especially in recent times, when epidemiologists have been led to design mixed observational studies. In addition, the terminology can sometimes be confusing and use terms that do not make clear what design we have in front of us. It's like arriving at the wedding of someone from a large family and meeting a cousin we cannot place. Even if we look for physical similarities, we will most likely end up asking him: and you, which family do you belong to? Only then will we know if he belongs to the groom's side or to the bride's.

What we are going to do in this post is something similar. We will try to establish a series of criteria for classifying studies to finally establish a series of questions whose answers allow us to identify which family they belong to.

Structured clinical question

To begin with, the type of clinical question that the work tries to answer can give us some guidance. If the question is of a diagnostic nature, we will most likely be faced with what is called a diagnostic test study, which is usually a design in which a series of participants are subjected, in a systematic and independent way, to the test under study and to the reference standard (the gold standard). It is a type of design especially suited to this kind of question, but do not take this as an absolute rule: sometimes we can see diagnostic questions addressed with other types of studies.

If the question is about treatment, it is most likely that we are facing a clinical trial or, sometimes, a systematic review of clinical trials. However, there are not always trials on everything we look for and we may have to settle for an observational study, such as a case-control or a cohort study.

For questions of prognosis and etiology/harm we may find ourselves reading a clinical trial, but more often it is not possible to carry out trials and we only have observational studies.

Characteristics of epidemiological studies

Once we have analyzed this aspect, we may still have doubts about the type of design we are facing. It will then be time to turn to our questions about six criteria related to the methodological design: general objective of the clinical question, direction of the study, type of sampling of the participants, temporality of the events, assignment of the study factors and units of study used. Let's see in detail what each one of these six criteria means, which you see summarized in the table that I attach.

According to the objective, studies can be descriptive or analytical. A descriptive study is one that, as the name suggests, only has the descriptive purpose of telling how things are, without intending to establish causal relationships between the risk factor or exposure and the effect studied (a certain disease or health event, in most cases). These studies answer fairly simple questions such as how many?, where? or in whom?, so they are usually simple and serve to generate hypotheses that will later need more complex studies to be tested.

By contrast, analytical studies do try to establish such relationships, answering questions like why?, how to treat it? or how to prevent it? Logically, to establish such relationships they will need a group to compare with (the control group). This is a useful clue to distinguish analytical from descriptive studies when in doubt: the presence of a comparison group is typical of analytical studies.

The directionality of the study refers to the order in which the exposure and the effect of such exposure are investigated. The study will have anterograde directionality when the exposure is studied before the effect, and retrograde directionality when the opposite is done. For example, if we want to investigate the effect of smoking on coronary mortality, we can take a set of smokers and see how many die of coronary disease (anterograde) or, conversely, take a set of deaths from coronary heart disease and see how many of them smoked (retrograde). Logically, only studies with anterograde directionality can ensure that the exposure precedes the effect in time (I'm not saying that one is the cause of the other). Finally, note that sometimes we can find studies in which exposure and effect are studied at the same time, in which case we speak of simultaneous directionality.

The type of sampling has to do with how the study participants are selected. They can be chosen because they are subject to the exposure factor of interest, because they have presented the effect, because of a combination of the two, or even according to criteria other than exposure and effect.

Our fourth criterion is temporality, which refers to the relationship in time between the researcher and the exposure factor or the effect studied. A study will have historical temporality when effect and exposure have already occurred when the study begins. On the other hand, when these events take place during the study, it will have concurrent temporality. Sometimes the exposure can be historical and the effect concurrent, in which case we speak of mixed temporality.

Clarifying a pair of terms

Here I would like to make a point about two terms used by many authors that will be more familiar to you: prospective and retrospective. Prospective studies would be those in which exposure and effect have not yet occurred at the beginning of the study, while those in which the events have already occurred at the time of the study would be retrospective. To complicate things further, when both situations are combined we would talk about ambispective studies. The problem with these terms is that they are sometimes used indistinctly to express directionality or temporality, which are different concepts. In addition, they are usually associated with specific designs: prospective with cohort studies and retrospective with case-control studies. It may be better to use the specific criteria of directionality and temporality, which express these aspects of the design more precisely.

Two other terms related to temporality are cross-sectional and longitudinal studies. Cross-sectional studies are those that provide us with a snapshot of how things are at a given moment, so they do not allow us to establish temporal or causal relationships. They tend to be prevalence studies and are always descriptive in nature.

On the other hand, in longitudinal studies variables are measured over a period of time, so they do allow temporal relationships to be established, but the researcher does not control how the exposure is assigned to participants. These may have an anterograde directionality (as in cohort studies) or a retrograde one (as in case-control studies).

The penultimate of the six criteria we are going to take into account is the assignment of the study factors. In this sense, a study will be observational when the researchers are mere observers who do not act on the assignment of the exposure factors. In these cases, the relationship between exposure and effect may be affected by other factors, known as confounding factors, so they do not allow conclusions about causality to be drawn. On the other hand, when the researcher assigns the exposure in a controlled manner according to a previously established protocol, we will talk about experimental or intervention studies. Experimental studies with randomization are the only ones that allow cause-effect relationships to be established and are, by definition, analytical studies.

The last of the criteria refers to the study units. Studies can be carried out on individual participants or on population groups. The latter are ecological studies and community trials, which have specific design characteristics.

In the attached figure you can see a scheme of how to classify the different epidemiological designs according to these criteria. When you have doubts about which design corresponds to the work you are evaluating, follow this scheme. The first step is to decide whether the study is observational or experimental. This is usually simple, so we move on to the next point. A descriptive observational study (without a comparison group) will correspond to a case series or a cross-sectional study.

If the observational study is analytical, we will look at the type of sampling, which may be by disease or study effect (case-control study) or by exposure to the risk or protective factor (cohort study).

Finally, if the study is experimental, we will check whether the exposure or intervention has been assigned randomly and with a comparison group. If so, we are looking at a randomized controlled clinical trial. If not, it is probably an uncontrolled trial or another type of quasi-experimental design.
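As a toy illustration (not any standard algorithm, and with labels of my own choosing), the questions in this scheme can be sketched as a small decision function:

```python
def classify_design(experimental, randomized=False, comparison_group=False,
                    sampling_by_effect=False):
    """Toy classifier following the questions in the scheme above;
    the returned labels are my own shorthand, not standard terminology."""
    if experimental:
        # Random allocation plus a control group: the king of kings.
        if randomized and comparison_group:
            return "randomized controlled clinical trial"
        return "uncontrolled trial or other quasi-experimental design"
    # Observational branch: a comparison group marks an analytical study.
    if not comparison_group:
        return "case series or cross-sectional (descriptive) study"
    # Analytical observational: look at the type of sampling.
    if sampling_by_effect:
        return "case-control study"  # sampled by disease or effect
    return "cohort study"            # sampled by exposure

# Observational, with a comparison group, sampled by exposure:
print(classify_design(experimental=False, comparison_group=True))
# cohort study
```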

We’re leaving…

And here we will stop for today. We have seen how to identify the most common types of methodological designs. But there are many more. Some with a very specific purpose and their own design, such as economic studies. And others that combine characteristics of basic designs, such as case-cohort studies or nested studies. But that is another story…

You can’t make a silk purse…

Propensity score

… of a sow's ear. No, you can't. However hard you try, it will remain a sow's ear. And this is because one's characteristics or defects cannot be removed simply by making external improvements. Although, yes, it will look much more elegant.

In the world of epidemiological studies there's a type of design that doesn't need to pretend to be a silk purse. Of course, I'm talking about the king of kings, the randomized clinical trial, or RCT for short.

Benefits of randomization

The RCT's silk purse is randomization, which is nothing more than the unpredictable allocation of every trial participant to one of the alternative interventions, leaving the decision to chance so that we cannot know which group each participant will be assigned to. Thus we ensure that the characteristics of the participants that could act as confounders or effect modifiers are equally distributed between the two intervention groups, so that if there are differences between the groups under study we can say they are due to the studied intervention, the only difference between the two groups.

On the other hand, observational studies lack randomization, so we can never be sure that the observed differences are not due to confounding variables, some of which may even be unknown to the researcher. Thus, with cohort and case-control studies we cannot assert causality in the same way as with the results of an RCT.

Multiple strategies have been devised to mitigate this limitation of observational studies, such as stratification or logistic regression analysis, which allow us to estimate the effect of each variable on the outcome of the intervention in each group. We are going to talk now about one of these methods: the propensity score.

Let's see if we can understand it with an example. Suppose we want to compare the duration of hospital admission of children with fildulastrosis according to the treatment they receive. We keep assuming that this terrible disease can be treated with pills or suppositories, the preference of each doctor being the criterion for choosing one or the other at the time of admission. We perform a retrospective study of the two cohorts and find that those who receive suppositories are admitted five days longer on average than those receiving oral treatment. Can we conclude that resolution is faster with pills than with suppositories? If we do so, we run the risk of being wrong, because there may be other factors we are not taking into account besides the treatment received.

In the case of a clinical trial, each participant has the same chance of receiving any of the treatments, so we can interpret the results directly. However, this is a cohort study, observational, and the probability of receiving pills or suppositories may depend on other factors. For example, one doctor may order suppositories for younger children, who are worse at swallowing pills, while another may not take this factor into account and give pills to everyone, because he prefers them. If age has anything to do with the length of admission, we'll be mixing the effect of treatment with the effect of the child's age, comparing the suppositories of some participants (younger children) with the pills of others (older children). And now think about one thing: if the probability of receiving either treatment varies for each participant, how are we to compare them without considering this probability? We have to compare those with a similar chance of receiving each treatment.

Propensity score

Well, here is where the propensity score (PS) comes into play, estimating the probability of each participant receiving a given treatment based on their characteristics.

The PS is calculated using a logistic regression model with the intervention as the outcome and the covariates as predictors. Thus we obtain an equation with each of the variables that we have included in the model because we think they can act as confounding factors. For example, the probability of receiving treatment A would be given by:

ln[P(A) / (1 − P(A))] = β0 + β1a + β2b + β3c + … + βnn,

where P(A) is the probability of receiving A (what the model actually provides is the natural logarithm of its odds), the betas are the coefficients and a, b, c, …, n represent the model variables.

If we substitute the letters "a" to "n" with the characteristics of each participant, we obtain a score: that is the PS. We can then compare with each other those participants of the two treatment arms who have similar scores.
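As a minimal sketch of that substitution, assuming a model with two made-up covariates and invented coefficients (none of this comes from a real fitted model):

```python
import math

# Invented coefficients for illustration only (not from a real model):
# an intercept plus two covariates, age in years and a severity score.
beta = {"intercept": -1.2, "age": -0.35, "severity": 0.8}

def propensity_score(age, severity):
    """The linear predictor is the log-odds of receiving the treatment;
    the logistic function turns it into a probability, the PS."""
    log_odds = (beta["intercept"]
                + beta["age"] * age
                + beta["severity"] * severity)
    return 1 / (1 + math.exp(-log_odds))

# A young, severe child vs an older, milder one: with these invented
# coefficients, the first is far more likely to receive the treatment.
ps_young = propensity_score(age=2, severity=3)   # ~0.62
ps_old = propensity_score(age=10, severity=1)    # ~0.02
```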

These comparisons can be done in several ways, matching and stratification being the simplest.

By stratification, the participants are divided into groups within ranges of scores and the groups are compared with each other to determine the effect of the intervention. By matching, each participant of one group is compared with another who has an equal or, failing that, similar score (what is known as the nearest neighbor). In the figure you can see an example of matching with the nearest neighbor for some of the participants in our fictitious study.
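Nearest-neighbor matching can be sketched like this, with invented propensity scores (matching without replacement, so the result depends on the order of the treated list):

```python
def nearest_neighbor_match(treated, controls):
    """Pair each treated score with the closest remaining control score
    (matching without replacement: each control is used at most once)."""
    pairs = []
    available = list(controls)
    for t in treated:
        nearest = min(available, key=lambda c: abs(c - t))
        pairs.append((t, nearest))
        available.remove(nearest)
    return pairs

# Invented scores for three treated and four control participants.
treated_ps = [0.61, 0.35, 0.72]
control_ps = [0.30, 0.58, 0.75, 0.10]
print(nearest_neighbor_match(treated_ps, control_ps))
# [(0.61, 0.58), (0.35, 0.3), (0.72, 0.75)]
```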

And this is what a PS is: a ploy to compare participants trying to avoid the effect of confounding variables and to resemble the randomization of an RCT, turning the study into almost a quasi-experimental one. But, as we said, you can't make a silk purse out of a sow's ear. However many variables we include in the regression model to calculate the PS, we can never be sure of having included them all, as there may be confounding variables we are unaware of. So it is always advisable to check the results of an observational study against the corresponding RCT.

We’re leaving…

And here we are done for today, although propensity scores go much further. For example, we have talked only about matching and stratification, although there are more methods, more complex and less used in medicine, such as covariate adjustment or weighting by the inverse of the probability of receiving the intervention. But that is another story…

That’s not what it seems to be

I hope, for your own good, that you have never found yourself in a situation in which you had to pronounce this sentence. And I hope, also for your own good, that if you have had to pronounce it, the sentence didn't begin with the word "darling". Did it? Let's leave that to everyone's conscience.

What is true is that we have to ask ourselves this question in a much less scabrous situation: when assessing the results of a cross-sectional study. It goes without saying, of course, that in these cases there’s no use for the word “darling”.

Cross-sectional descriptive studies are a type of observational study in which we draw a sample from the population we want to study and then measure the frequency of the disease or effect of interest in the individuals of that sample. When we measure more than one variable, these studies are called cross-sectional association studies and allow us to determine whether there's any kind of association among the variables.

But these studies have two characteristics that we must always keep in mind. First, they are prevalence studies that measure frequency at a given time, so the result may vary depending on when the variable is measured. Second, since the measurements are performed simultaneously, it is difficult to establish a cause-effect relationship, something we all love to do. But it is something we should avoid, because with this type of study things are not always what they seem to be. Or rather, things can be many more things than they seem.

What are we talking about? Let's consider an example. I'm a little bored of going to the gym because I'm getting more and more tired and my physical condition… well, let's just say that I get tired. So I want to study whether the effort can reward me with better control of my body weight. Thus, I run a survey and get data from 1477 individuals of approximately my age, asking them whether they go to the gym (yes or no) and whether they have a body mass index greater than 25 (yes or no). If you look closely at the results depicted in the table, you'll notice that the prevalence of overweight-obesity among those who go to the gym (50/751, about 7%) is higher than among those who don't (21/726, about 3%). Oh my goodness!, I think: not only do I get tired, but by going to the gym I have more than twice the chance of being fat. Conclusion: I'll quit the gym tomorrow.
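With the numbers from the table, the two prevalences can be computed directly (the counts of non-overweight participants are simply the complements of the figures above):

```python
# Counts from the fictitious survey: overweight-obesity (BMI > 25)
# among those who go to the gym (751) and those who don't (726).
overweight_gym, total_gym = 50, 751
overweight_no_gym, total_no_gym = 21, 726

prev_gym = overweight_gym / total_gym            # ~6.7%
prev_no_gym = overweight_no_gym / total_no_gym   # ~2.9%
pr = prev_gym / prev_no_gym                      # prevalence ratio, ~2.3

print(f"gym: {prev_gym:.1%}, no gym: {prev_no_gym:.1%}, ratio: {pr:.2f}")
```

The ratio of the two prevalences is about 2.3: going to the gym appears, at face value, to more than double the prevalence of overweight.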

Do you see how easy it is to reach an absurd (rather stupid, in this case) conclusion? But the data are there, so we have to find an explanation for why they suggest something that goes against our common sense. And there are several possible explanations for these results.

The first is that going to the gym actually makes you fatter. It seems unlikely, but you never know… Imagine that working out motivates athletes to eat like wild beasts during the six hours after a session.

The second is that obese people who go to the gym live longer than those who don't. Suppose that exercise prevents death from cardiovascular disease in obese patients. That would explain why there are (proportionally) more obese people in the gym than outside it: obese people who go to the gym die less than those who don't. At the end of the day we are dealing with a prevalence study, so we see the final result at the time of measurement.

The third possibility is that the disease influences the frequency of exposure, which is known as reverse causality. In our example, there could be more obese people in the gym precisely because that is the treatment recommendation they receive: join a gym. This does not sound as far-fetched as the first explanation.

But we still have more possible explanations. So far we have tried to explain an association between the two variables that we have assumed to be real. But what if the association is not real? How can we get a false association between the two variables? Again, we have three possible explanations.

First, our old friend: chance. Some of you will tell me that we can calculate statistical significance or confidence intervals, but so what? Even statistical significance only means that we can rule out the effect of chance with some degree of uncertainty. Even with p < 0.05, there's always a risk of committing a type I error and erroneously ruling out the effect of chance. We can measure chance, but never get rid of it.
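As an illustration of what a significance test does (and does not do), here is Pearson's chi-square computed by hand on the fictitious table; the critical value 3.84 corresponds to one degree of freedom at α = 0.05:

```python
# Pearson's chi-square, computed by hand, on the fictitious 2x2 table.
# Rows: gym yes / gym no; columns: overweight yes / overweight no.
observed = [[50, 701],
            [21, 705]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, the critical value at alpha = 0.05 is 3.84;
# chi2 exceeds it, so p < 0.05 -- yet a type I error remains possible.
print(round(chi2, 2))
```

The association is "statistically significant", but that only quantifies the role of chance; it says nothing about the absurd causal reading of the result.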

The second is that we have introduced some kind of bias that invalidates our results. Sometimes the characteristics of the disease can lead to a different probability of choosing exposed and unexposed subjects, resulting in a selection bias. Imagine that instead of a survey (by telephone, for example) we had used medical records. It may happen that obese people who go to the gym take more responsibility for their health and visit the doctor more often than those who don't go to the gym. In that situation, we would be more likely to include obese gym-goers in the study, overestimating the true proportion. Sometimes the study factor may be somewhat stigmatizing from a social point of view, so diseased people will be less willing to participate in the study (and acknowledge their disease) than healthy people. In this case, we would underestimate the frequency of disease.

In our example, it may be that obese people who do not go to the gym answer the survey lying about their true weight, so they will be wrongly classified. This misclassification bias can occur randomly in both the exposed and unexposed groups, thereby favoring the null hypothesis of no association, so the association, if it exists, will be underestimated. The problem arises when the error is systematic in one of the two groups, as this can either underestimate or overestimate the association between exposure and disease.

And finally, the third possibility is that there is a confounding variable that is distributed differently between exposed and unexposed. It occurs to me that those who go to the gym may be younger than those who don't, and that younger obese people may be more likely to go to the gym. If we stratify the results by the confounding variable, age, we can determine its influence on the association.
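A sketch of such stratification, with purely invented stratum counts (they appear nowhere in the post; they are chosen only so the totals add up to the crude table):

```python
# Stratified analysis with purely invented numbers: within each age
# stratum, (overweight, total) counts for gym-goers and non-goers.
# The stratum counts are chosen so they sum to the crude 2x2 table.
strata = {
    "young": {"gym": (40, 500), "no_gym": (5, 100)},
    "older": {"gym": (10, 251), "no_gym": (16, 626)},
}

prs = {}  # stratum-specific prevalence ratios
for name, s in strata.items():
    prev_gym = s["gym"][0] / s["gym"][1]
    prev_no_gym = s["no_gym"][0] / s["no_gym"][1]
    prs[name] = prev_gym / prev_no_gym

print(prs)  # both ratios are well below the crude ratio of ~2.3
```

With these invented numbers, both stratum-specific prevalence ratios (about 1.6) fall clearly below the crude ratio of about 2.3, which is exactly what confounding by age would look like.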

To finish, I just want to apologize to all the obese people in the world for using them in the example but, for once, I wanted to leave smokers alone.

As you can see, things are not always what they seem at first glance, so results should be interpreted with common sense and in the light of existing knowledge, without falling into the trap of establishing causal relationships from associations detected in observational studies. To establish cause and effect we always need experimental studies, the paradigm of which is the clinical trial. But that's another story…