Critical appraisal of etiology/harm studies
It seems like only yesterday that I began my adventures at the hospital and had my first contacts with The Patient. I didn't know much about diseases back then, but I knew without thinking about it the three questions with which any good clinical history began: what is bothering you? how long has it been going on? and to what do you attribute it?
The fact is that the need to know the why of things is inherent to human nature and, of course, of great importance in medicine. Everyone is mad about establishing cause-and-effect relationships; sometimes one does it rather loosely and concludes that the culprit of one's summer cold is the guy at the supermarket who set the air conditioning to full power. This is why studies on etiology must be conducted and assessed with scientific rigour, all the more so because when we talk about etiology we also mean harm, including that derived from our own actions (what educated people call iatrogenic).
Since etiology and harm are two sides of the same coin, studies on either have similar designs. The clinical trial is the ideal choice and we can use it, for example, to know if a treatment is the cause of the patient's recovery. But when we study risk factors or harmful exposures, the ethical principle of nonmaleficence prevents us from randomizing exposures, so we have to resort to observational studies such as cohort studies or case-control studies, although the level of evidence they provide is lower than that of experimental studies.
To critically appraise a paper on etiology / harm, we’ll resort to our well-known pillars: validity, relevance and applicability.
First, we'll focus on the VALIDITY or scientific rigour of the work, which should answer the question of whether the factor or intervention studied was the cause of the adverse effect or disease observed.
As always, we'll assess a series of primary validity criteria. If these are not fulfilled, we'll leave the paper and devote ourselves to something more profitable. The first is to determine whether the groups compared were similar with regard to important factors other than the exposure studied. Randomization in clinical trials helps ensure that the groups are homogeneous, but we cannot count on it in observational studies. The homogeneity of the two cohorts is essential and the study is not valid without it.
The authors can always argue that they have stratified by the differences between the two groups or that they have performed a multivariate analysis to control for the effect of known confounders, but what about the unknown ones? The same applies to case-control studies, which are much more sensitive to bias and confounding.
Have exposure and effect been assessed in the same way in all groups? In clinical trials and cohort studies we have to check that the effect had the same likelihood of appearing and of being detected in the two groups. In case-control studies, moreover, it is very important to assess previous exposure properly, so we must investigate whether there is potential bias in data collection, such as recall bias (sick people tend to remember past symptoms and exposures better than healthy people). Finally, we must consider whether follow-up has been long enough and complete. Losses during the study, common in observational designs, can bias the results.
If we have answered yes to all three questions, we'll turn to the secondary validity criteria. The study's results have to be evaluated to determine whether the association between exposure and effect provides reasonable evidence of causality. One useful tool is Hill's criteria, named after the gentleman who suggested using a series of items to try to distinguish the causal or non-causal nature of an association.
These criteria are:
a) strength of association, represented by the risk ratio between exposure and effect, which we'll consider shortly.
b) consistency, which is reproducibility in different populations or situations.
c) specificity, which means that a cause produces a single effect and not multiple ones.
d) temporality: it’s essential that cause precedes the effect.
e) biological gradient: the more intense the cause, the more intense the effect.
f) plausibility: the relationship has to be logical according to our biological knowledge.
g) coherence: the relationship should not be in conflict with other knowledge about the disease or effect.
h) experimental evidence, often difficult to obtain in humans for ethical reasons.
i) analogy to other known situations.
Although these criteria are quite vintage and some of them may be irrelevant (experimental evidence or analogy), they can serve as guidance. The criterion of temporality would be a necessary one and would be well complemented by biological gradient, plausibility and coherence.
Another important aspect is to consider whether, apart from the intervention under study, both groups were treated similarly. Studies without double blinding are the ones most at risk of bias due to co-interventions, especially if these are treatments with a much greater effect than the exposure under study.
Regarding the RELEVANCE of the results, we must consider the magnitude and precision of the association between exposure and effect.
What was the strength of the association? The most common measure of association is the risk ratio (RR), which can be used in trials and cohort studies. However, in case-control studies we don't know the incidence of the effect (the effect has already occurred when the study is conducted), so we use the odds ratio (OR). As we know, the interpretation of the two parameters is similar.
Even the values of the two are similar when the frequency of the effect is very low. However, the greater the magnitude or frequency of the effect, the more RR and OR differ, with the peculiarity that the OR is always further from 1 than the RR: it is larger than the RR when the association is greater than 1 and smaller than the RR when it is less than 1, so in both directions it tends to exaggerate the strength of the association. In any case, these vagaries of the OR will only exceptionally modify the qualitative interpretation of the results.
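To see how RR and OR drift apart as the effect becomes more frequent, here is a minimal sketch in Python. The function names and the three 2x2 tables are made up for illustration; they do not come from any real study.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table.

    a, b: exposed with / without the effect
    c, d: unexposed with / without the effect
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Odds ratio (cross-product ratio) from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only: (a, b, c, d)
rare = (20, 980, 10, 990)          # effect in 2% of exposed vs 1% of unexposed
common = (400, 600, 200, 800)      # effect in 40% of exposed vs 20% of unexposed
protective = (200, 800, 400, 600)  # effect in 20% of exposed vs 40% of unexposed

for label, table in [("rare", rare), ("common", common), ("protective", protective)]:
    print(f"{label:>10}: RR = {risk_ratio(*table):.3f}, OR = {odds_ratio(*table):.3f}")
```

In the first two tables the RR is 2, but the OR goes from about 2.02 when the effect is rare to about 2.67 when it is frequent. In the third table the RR is 0.5 and the OR is 0.375, again further from 1.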
It has to be kept in mind that an association is statistically significant for any value of OR or RR whose confidence interval does not include one, but with observational studies we have to be a little more demanding. Thus, in a cohort study we'd like to see an RR greater than or equal to three, and in a case-control study an OR greater than or equal to four.
Another useful parameter (in trials and cohort studies) is the risk difference or incidence difference, which is a fancy name for our well-known absolute risk reduction (ARR, or absolute risk increase when the exposure is harmful). It allows us to calculate the NNT (or NNH, number needed to harm), the parameter that best quantifies the clinical significance of the association. Also, similar to the relative risk reduction (RRR), we have the attributable fraction in the exposed, which is the percentage of the risk observed in the exposed that is due to the exposure.
And what is the precision of the results? As we know, we'll use our beloved confidence intervals, which tell us how precisely the parameter has been estimated for the population. It is always useful to have all these parameters, so they should either be reported in the study or be calculable from the data provided by the authors.
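The arithmetic behind these parameters fits in a few lines. The sketch below is only an illustration under assumptions: the function name `harm_measures` and the cohort counts are invented, and the NNH is rounded up to a whole patient, which is one common convention rather than a rule. Exact fractions are used so the rounding is not thrown off by floating-point noise.

```python
import math
from fractions import Fraction


def harm_measures(a, b, c, d):
    """Risk difference, NNH and attributable fraction in the exposed.

    a, b: exposed with / without the effect
    c, d: unexposed with / without the effect
    Assumes the exposure is harmful (risk is higher in the exposed).
    """
    risk_exposed = Fraction(a, a + b)
    risk_unexposed = Fraction(c, c + d)
    risk_difference = risk_exposed - risk_unexposed
    if risk_difference <= 0:
        raise ValueError("risk in the exposed must exceed risk in the unexposed")
    return {
        "risk_difference": float(risk_difference),
        # NNH = 1 / risk difference, rounded up (a convention assumed here)
        "nnh": math.ceil(1 / risk_difference),
        # Share of the risk in the exposed that is attributable to the exposure
        "attributable_fraction_exposed": float(risk_difference / risk_exposed),
    }


# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop the effect
result = harm_measures(30, 970, 10, 990)
print(result)
```

With these invented counts the risk difference is 0.02, so one extra case is expected for every 50 people exposed, and two thirds of the risk in the exposed would be attributable to the exposure.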
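When the authors report only the raw counts, approximate confidence intervals can be worked out by hand. The sketch below uses the usual log-scale approximations (standard error of ln RR and of ln OR) with 1.96 for a 95% interval; the function names and the counts are hypothetical, and this is one common method, not necessarily the one a given paper used.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def rr_with_ci(a, b, c, d, z=Z_95):
    """Risk ratio with an approximate CI computed on the log scale.

    a, b: exposed with / without the effect
    c, d: unexposed with / without the effect
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def or_with_ci(a, b, c, d, z=Z_95):
    """Odds ratio with an approximate CI computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se_log), odds_ratio * math.exp(z * se_log)


def excludes_one(lower, upper):
    """True when the interval does not contain 1 (statistically significant)."""
    return lower > 1 or upper < 1


# Hypothetical counts: 30 of 1000 exposed and 10 of 1000 unexposed with the effect
table = (30, 970, 10, 990)
for name, (estimate, lower, upper) in [("RR", rr_with_ci(*table)), ("OR", or_with_ci(*table))]:
    print(f"{name} = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f}), "
          f"excludes 1: {excludes_one(lower, upper)}")
```

For this made-up table the RR is 3 with an interval of roughly 1.5 to 6.1: it excludes 1, but its width shows how imprecise an estimate based on a few dozen events can be.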
Finally, we'll assess the APPLICABILITY of the results to our clinical practice.
Are the results applicable to our patients? Check whether there are differences that advise against extrapolating the results of the work to our environment. Also, consider the magnitude of the risk in our patients based on the results of the study and on their characteristics. And finally, with all this information in mind, we must think about our working conditions, the choices we have and the patient's preferences to decide whether or not to avoid the exposure studied. For example, if the magnitude of the risk is high and we have an effective alternative, the decision will be clear, but things are not always so simple.
As always, I advise you to use the resources available on the Internet, such as CASP’s, both the design-specific templates and the calculator to assess the relevance of the results.
Before concluding, let me clarify one thing. Although we've said that we use RR in cohort studies and clinical trials and OR in case-control studies, we can actually use OR in any type of study (not so RR, for which we must know the incidence of the effect). The problem is that ORs are somewhat less accurate as a measure of risk, so we prefer to use RR and NNT whenever possible. However, OR is increasingly popular for another reason: its use in logistic regression models, which allow us to obtain estimates adjusted for confounding variables. But that's another story…