Critical appraisal of prognostic studies
I wonder how many times I have heard this question or one of its many variants. We are always thinking about clinical trials and clinical questions about diagnosis and treatment, but ask yourself whether a patient has ever asked you if the treatment you were proposing was endorsed by a randomized controlled trial that meets the criteria of the CONSORT statement and has a good score on the Jadad scale. I can say, at least, that it has never happened to me. But they do ask me daily what will happen to them in the future.
And here lies the relevance of prognostic studies. We cannot always heal and, unfortunately, many times all we can do is accompany the patient and soften, where possible, the announcement of serious sequelae or death. But it is essential to have good quality information about the future of our patient’s disease. This information will also serve to calibrate therapeutic efforts in each situation depending on the risks and benefits. Besides, prognostic studies are used to compare results between different departments or hospitals. Nobody should claim that one hospital is worse than another because its mortality is higher without first checking that the prognosis of their patients is similar.
Before getting into the critical appraisal of prognostic studies, let’s clarify the difference between risk factor and prognostic factor. A risk factor is a characteristic of the environment or the subject that favors the development of the disease, while a prognostic factor is one that, once the disease occurs, influences its evolution. Risk factor and prognostic factor are different things, although sometimes they can coincide. What the two do share is the same type of study design. The ideal would be to use clinical trials, but most of the time it is either impossible or unethical to randomize prognostic or risk factors. Suppose we want to demonstrate the deleterious effect of booze on the liver. The way with the highest degree of evidence to prove it would be to make two random groups of participants and give 10 whiskeys a day to the participants of one arm and some water to the participants of the other, to see the differences in liver damage after a year, for example. However, it is evident to anyone that we cannot do a clinical trial like this. Not because we cannot find subjects for the intervention arm, but because ethics and common sense prevent us from doing it.
For this reason, it is usual to use cohort studies: we would study what differences in the liver there may be between individuals who, by their own choice, drink alcohol and those who do not. When very long follow-ups are required or when the effect we want to measure is very rare, case-control studies can be used, but they will always provide weaker evidence because they have a higher risk of bias. Following our alcohol example, we would study people with and without liver damage and compare how much each group had been exposed to alcohol.
A prognostic study should inform us of three aspects: which outcome we evaluate, how likely it is to happen, and in what time frame we expect it to happen. And to appraise it, as always, we will rely on our three pillars: validity, relevance and applicability.
To assess the VALIDITY, we’ll first consider whether the article meets a set of primary or elimination criteria. If the answer is no, we’d better throw the paper away and go read the latest bullshit our Facebook friends have written on our wall.
Is the study sample well defined and is it representative of patients at a similar stage of disease? The sample, which is usually called the initial or inception cohort, should be formed by a group of patients at the same stage of disease, ideally at the beginning, and it should be followed up prospectively. The type of patients included, the criteria for diagnosing them and the method of selection should be clearly specified. We must also verify that the follow-up has been long enough and complete enough to observe the event we study. Each participant has to be followed up from the start until they are cured, they present the event or the study ends. It is very important to take into account losses during the study, which are very common in designs with long follow-up. The study should provide the characteristics of the patients lost and the reasons for the loss. If they are similar to those who are not lost during follow-up, we can get valid results. If more than 20% of patients are lost to follow-up, a sensitivity analysis is usually done using the worst possible scenario: we assume that all the patients lost had a poor prognosis and recalculate the results to check whether they change, in which case the study results could be invalidated.
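The worst-case scenario described above can be sketched in a few lines of Python. This is only an illustration: the cohort numbers are invented and the function names are hypothetical, not taken from any study or library.

```python
def event_proportion(events, followed):
    """Proportion of patients with complete follow-up who had the event."""
    return events / followed


def worst_case_proportion(events, followed, lost):
    """Same proportion, assuming every patient lost to follow-up had the event."""
    return (events + lost) / (followed + lost)


# Invented cohort: 200 patients enrolled, 50 lost, 30 events among the rest
enrolled = 200
lost = 50
followed = enrolled - lost
events = 30

loss_fraction = lost / enrolled
observed = event_proportion(events, followed)
worst = worst_case_proportion(events, followed, lost)

print(f"Lost to follow-up: {loss_fraction:.0%}")
print(f"Observed event proportion: {observed:.2f}")
print(f"Worst-case event proportion: {worst:.2f}")
```

If the worst-case figure is far from the observed one, as it is with these made-up numbers, the conclusions of the study depend heavily on what happened to the patients who were lost.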
Once these two aspects have been assessed, we turn to the secondary criteria about internal validity or scientific rigor.
Were outcomes measured objectively and without bias? What is being measured, and how, must be clearly specified before starting the study. In addition, in order to avoid information bias, the ideal is for the outcome to be measured by a blinded researcher, who must not know whether the subject in question is exposed to any of the prognostic factors.
Were the results adjusted for all relevant prognostic factors? We must take into account all the confounding variables and prognostic factors that may influence the results. If they are known from previous studies, these known factors may be considered. Otherwise, the authors will determine these effects using stratified data analysis (the easiest method) or multivariate analysis (more powerful and more complex), usually a proportional hazards model, also known as Cox regression. Although we’re not going to talk about regression models now, there are two simple aspects that we can take into account. First, these models need a certain number of events per variable included in the model, so distrust those in which many variables are analyzed, especially with small samples. Second, the variables included are decided by the authors and differ from one work to another, so we will have to assess whether they have left out any that may be relevant to the final result.
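As a quick check of the first point, we can count the events per variable ourselves. The sketch below assumes the commonly cited rule of thumb of about 10 events per variable; that threshold is an assumption of this example and not something the text above specifies, and both the function names and the numbers are hypothetical.

```python
def events_per_variable(n_events, n_variables):
    """Number of outcome events available for each variable in the model."""
    return n_events / n_variables


def looks_underpowered(n_events, n_variables, min_epv=10):
    """True if the model has fewer events per variable than the chosen threshold.

    min_epv=10 is a rule of thumb assumed for this sketch, not a fixed standard.
    """
    return events_per_variable(n_events, n_variables) < min_epv


# Invented examples: a small study with many variables and a larger, leaner one
small_study = looks_underpowered(n_events=45, n_variables=9)
large_study = looks_underpowered(n_events=120, n_variables=6)

print(f"45 events, 9 variables: EPV = {events_per_variable(45, 9):.1f}, distrust = {small_study}")
print(f"120 events, 6 variables: EPV = {events_per_variable(120, 6):.1f}, distrust = {large_study}")
```

Note that what counts is the number of events, not the number of patients: a cohort of a thousand patients with only 20 events still supports very few variables.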
Were the results validated in other groups of patients? When we define groups of variables and make multiple comparisons, we run the risk that chance plays a trick on us and shows us associations that don’t exist. This is why, when a factor is described in one group (the training or derivation group), the results should be replicated in an independent group (the validation group) to be really sure about the effect.
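To get a feeling for how easily chance can play that trick, here is a small sketch of the probability of finding at least one spurious association. It assumes that the comparisons are independent and that each one uses the usual significance level of 0.05; both are simplifying assumptions of this example, and the function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_comparisons, alpha=0.05):
    """Chance of one or more false positives among independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


for k in (1, 5, 10, 20):
    p = prob_at_least_one_false_positive(k)
    print(f"{k:2d} comparisons: {p:.0%} chance of at least one spurious association")
```

With a couple of dozen candidate variables, finding some "significant" factor by pure chance is more likely than not, which is why replication in a validation group matters.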
Now we must consider what the results are to determine their RELEVANCE. For this, we’ll check if the probability of the outcome of the study is estimated and provided by the authors, as well as the accuracy of this estimate and the risk associated with the factors influencing the prognosis.
Is the probability of the event specified in a given period of time? There are several ways to present the number of events occurring during the follow-up period. The simplest would be to provide an incidence rate (events per person and unit of time, for example per person-year) or the cumulative frequency at any given time. Another indicator is the median survival, which is just the moment of follow-up at which the event has happened in half of the cohort participants (remember that although we speak about survival, the event does not necessarily have to be death).
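The first two measures are simple enough to compute by hand. The sketch below uses invented follow-up data in which every event-free patient completed the full five years, so that the cumulative incidence can be taken as a plain proportion; with patients lost along the way we would need a survival method instead.

```python
# Invented cohort: (years of follow-up, did the event occur?)
follow_up = [
    (2.0, True),
    (5.0, False),
    (1.0, True),
    (5.0, False),
    (3.0, True),
    (5.0, False),
]

n_events = sum(1 for _, had_event in follow_up if had_event)
person_years = sum(years for years, _ in follow_up)

# Incidence rate: events divided by the total time at risk
incidence_rate = n_events / person_years

# Cumulative incidence at 5 years: events divided by patients at the start
cumulative_incidence = n_events / len(follow_up)

print(f"Events: {n_events}, person-years: {person_years}")
print(f"Incidence rate: {incidence_rate:.3f} events per person-year")
print(f"5-year cumulative incidence: {cumulative_incidence:.0%}")
```

The two figures answer different questions: the rate tells us how fast events appear per unit of time at risk, while the cumulative incidence tells us what proportion of patients have had the event by a given moment.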
We can use survival curves of various kinds to determine the probability of the occurrence of the event in each period and the rate at which it appears. Actuarial or life tables are used for larger samples, when we don’t know the exact time of the event and we use fixed time periods. However, the most often used are Kaplan-Meier curves, which better measure the probability of the event at each particular time with smaller samples. Survival analysis can provide the median survival and, according to the regression model used, hazard ratios and other parameters.
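For the curious, the Kaplan-Meier estimate is easy to sketch without any library. What follows is a minimal version written for illustration, with invented data and hypothetical function names; a real analysis would use a statistical package, which also provides confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times:  follow-up time of each patient
    events: True if the event was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving time t
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with the event and censored patients both leave the risk set
        at_risk -= n_leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50%


# Invented data: months of follow-up and whether the event was observed
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [True, True, False, True, False, True, True, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
print("median survival:", median_survival(curve))
```

Censored patients never make the curve drop, but they do shrink the group at risk, so each later event produces a bigger step. This is also why the right-hand end of the curve is the least reliable part.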
To assess the precision of the results we will look, as always, for the confidence intervals. The wider the interval, the less precise the estimate of the probability of occurrence in the general population, which is what we really want to know. Keep in mind that the number of patients generally decreases as time passes, so survival curves are usually more precise at the beginning than at the end of follow-up. Finally, we’ll assess the factors that modify the prognosis. The right thing is to present all the variables that may influence the prognosis with their corresponding relative risks, which will allow us to evaluate the clinical significance of the association.
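Both ideas, the relative risk and the way precision depends on sample size, can be seen in a small sketch. It computes a relative risk with an approximate 95% confidence interval using the usual logarithmic method; the choice of method, the function name and the numbers are assumptions of this example.

```python
import math


def relative_risk_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with an approximate confidence interval (log method).

    z=1.96 corresponds to a 95% interval. All counts must be greater than zero.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of the logarithm of the relative risk
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same risks (20% vs 10%) in two invented studies of very different size
large = relative_risk_ci(40, 200, 20, 200)
small = relative_risk_ci(4, 20, 2, 20)

for label, (rr, lower, upper) in (("large study", large), ("small study", small)):
    print(f"{label}: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Both studies estimate the same relative risk, but only the large one gives an interval that excludes 1. The small one is compatible with anything from a protective effect to a very harmful one, so it tells us very little.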
Finally, we must consider the APPLICABILITY of the results. Do they apply to my patients? We will look for similarities between the study patients and ours and assess whether the differences we find allow us to extrapolate the results to our practice. And besides, are the results useful? The fact that they’re applicable doesn’t necessarily mean that we have to implement them. We have to assess carefully whether they’re going to help us decide what treatment to apply and how to inform our patients and their families.
As always, I recommend that you use a template, such as those provided by CASP, to carry out the critical appraisal systematically without leaving any important matter unassessed.
You can see that articles about prognosis have a lot to say. And we have hardly talked about regression models and survival curves, which are often the statistical core of this type of article. But that’s another story…