Science without sense…double nonsense

Pills on evidence-based medicine


There is another world, and it is this one


And there are other lives, but they are in you. Paul Éluard said it already, that surrealist of the last century who had the bad idea of visiting Cadaqués accompanied by his wife, Elena Ivanovna Diakonova, better known as Gala. He was not very clever there, but his phrase lent itself to many more things.

For example, it has been used by many writers who love the unknown, myths and mystery. I personally came across the phrase when I was a young teenager, because it was written as a preface to a series of science fiction books. Even, in more recent times, it has been related to that other incorporeal world that is cyberspace, where we spend an ever greater part of our lives.

But, to help Éluard rest peacefully in his tomb at Père-Lachaise, I'll tell you that I prefer his original idea about our two worlds, between which we can share our limited lifetime: the real world, where we do most of our things, and the world of the imagination, our intimate space, where we dream our most impossible realities.

You will think that today I am very metaphysical, but this is the thought that came to my mind when I started thinking about the topic we are going to deal with in this post. And the fact is that in the realm of medicine there are two worlds too.

We are very used to numbers and the objective results of our quantitative research. As an example, we have our revered systematic reviews, which gather the scientific evidence available on a specific health technology to assess its efficacy, safety, economic impact, etc. If we want to know whether watching a lot of TV is a risk factor for suffering that terrible disease that is fildulastrosis, the best thing will be to do a systematic review of clinical trials (assuming there are any). Thus, we can calculate a multitude of parameters that, with a number, will give us a full idea of the impact of such an unhealthy habit.

But if what we want to know is how fildulastrosis affects the person who suffers it, how much unhappiness it produces, how it alters family and social life, things get a little complicated with this type of research methodology. And this is important because the social and cultural aspects related to the real context of people are increasingly valued. Luckily, there are other worlds and they are in this one. I am referring to the world of qualitative research. Today we are going to take a look (a short one) at this world.

Qualitative research is a method that studies reality in its natural context, as it occurs, in order to interpret the phenomena according to the meanings they have for the people involved. For this it uses all kinds of sources and materials that help us to describe the routine and the meaning of problematic situations for people’s lives: interviews, life stories, images, sounds… Although all this has nothing to do with the gridded world of quantitative research, the two methods are not incompatible and may even be complementary. Simply put, qualitative methods provide alternative information, different from and complementary to that of quantitative methods, which is useful for evaluating the perspectives of the people involved in the problem we are studying. Quantitative research addresses the problem deductively, while qualitative research uses an inductive approach.

Logically, the methods used by qualitative research are different from those of quantitative research. In addition, they are numerous, so we will not describe them in depth. We will just say that the specific methods most used are meta-synthesis, phenomenology, meta-ethnography, meta-study, meta-interpretation, grounded theory, the biographical method and the aggregative review, among others.

The most frequently used of these methods is meta-synthesis, which starts with a research question and a bibliographic search, in a similar way to what we know about systematic reviews. However, there are a couple of important differences. In quantitative research, the research question must be clearly defined, while in qualitative research this question is, by definition, flexible and is usually modified and refined as data collection progresses. The other aspect has to do with the literature search, because in qualitative research it is not so clearly defined which databases have to be used, and the filters and methodologies that documentalists have for reviews of quantitative research are not available.

Also, the techniques used for collecting data are different from those we are more accustomed to in quantitative research. One of them is observation, which allows the researcher to obtain information about the phenomenon as it occurs. The paradigm of observation in qualitative research is participant observation, in which the observer interacts socially with the subjects in the setting where the phenomenon under study occurs. For example, if we want to assess the experiences of travelers on a commercial flight, nothing better than buying a ticket and posing as just another traveler, collecting all the information about comfort, punctuality, attention provided by the flight staff, quality of the snacks, etc.

Another widely used technique is the interview, in which a person asks another person or group of people for information on a specific topic. When it is done with groups it is called, as it could not be otherwise, a group interview. In this case the script is quite closed and the role of the interviewer is quite prominent, unlike in focus group discussions, in which everything can be more open, at the discretion of the group’s facilitator. Anyway, when we want to know the opinion of many people, we can resort to the questionnaire technique, which polls the opinion of large groups so that each member of the group spends a minimum of time completing it, unlike focus groups, in which everyone stays for the whole interview.

The structure of a qualitative research study usually includes five fundamental steps, which can be influenced according to the methods and techniques used:

  1. Definition of the problem. As we have already mentioned when discussing the research question, the definition of the problem has a certain degree of provisionality and can change throughout the study, since one of the objectives may be to find out precisely if the definition of the problem is well done.
  2. Study design. It must also be flexible. The problem with this phase is that there are times when the proposed design is not what we see in the published article. There is still a certain lack of definition of many methodological aspects, especially when compared with the methodology of quantitative research.
  3. Data collection. The techniques we have discussed are used: interview, observation, reading of texts, etc.
  4. Analysis of the data. This aspect also differs from the quantitative analysis. Here it will be interesting to unravel the meaning structures of the collected data to determine their scope and social implications. Although methods are being devised to express the results in numerical form, the usual thing is that we do not see many figures here and, of course, nothing comparable to quantitative methods.
  5. Report and validation of the information. The objective is to generate conceptual interpretations of the facts to get a sense of the meaning they have for the people involved. Again, and unlike with quantitative research, the goal is not to project the results of possible interventions on the environment, but to interpret facts that are at hand.

At this point, what can we say about the critical appraisal of qualitative research? Well, to give you an idea, I will tell you that there is a great variety in opinions on this subject, from those who think that it makes no sense to evaluate the quality of a qualitative study to those who try to design evaluation instruments that provide numerical results similar to those of quantitative studies. So, my friends, there is no uniform consensus on whether you should evaluate, in the first place, or on how, in the second. In addition, some people think that even studies that can be considered of low quality should be taken into account because, after all, who is able to define with certainty what a good qualitative research study is?

In general, when we make a critical appraisal of a qualitative research study, we will have to assess a series of aspects such as its integrity, complexity, creativity, validity of the data, quality of the descriptive narrative, the interpretation of the results and the scope of its conclusions. We are going to continue here our habit of resorting to the CASPe’s critical appraisal program, which provides us with a template with 10 questions to perform the critical appraisal of a qualitative study. These questions are structured in three pillars: rigor, credibility and relevance.

The questions on rigor refer to the suitability of the methods used to answer the clinical question. As usual, the first questions are elimination questions. If the answer is not affirmative, we will have resolved the controversy since, at least with this study, it will not be worthwhile to continue with our assessment. Were the objectives of the research clearly defined? We need to check that the question is well specified, as well as the objective of the research and the justification of its necessity. Is the qualitative methodology congruent? We will have to decide whether the methods used by the authors are adequate to obtain the data that will allow them to reach the objective of the research. Finally, is the research method used suitable for achieving the objectives? Researchers must explicitly state the method they have used (meta-synthesis, grounded theory…). In addition, the specified method must match the one actually used, which sometimes may not be the case.

If we have answered affirmatively to these three questions, it will be worth continuing and we will move on to the detailed questions. Is the participant selection strategy consistent with the research question and the method used? It must be justified why the selected participants were the most suitable, as well as explain who called them, where, etc. Are data collection techniques used congruent with the research question and the method used? The technique of collecting data (for example, discussion groups) and the registration format will have to be specified and justified. If the collection strategy is modified throughout the study, the reason for this will have to be justified.

Has the relationship between the researcher and the object of research (reflexivity) been considered? It will be necessary to consider whether the involvement of the researcher in the process may have biased the data obtained and whether this has been taken into account when designing the data collection, the selection of the participants and the scope of the study. To finish with the assessment of the rigor of the work, we will ask ourselves whether the ethical aspects have been taken into account. It will be necessary to consider aspects shared with quantitative research, such as informed consent, approval by an ethics committee or confidentiality of data, as well as specific aspects concerning the effect of the study on participants before and after its completion.

The next block of two questions has to do with the credibility of the study, which is related to the ability of the results to represent the phenomenon from the subjective point of view of the participants. The first question makes us consider whether the analysis of the data was sufficiently rigorous. The entire analysis process should be described, along with the categories that may have arisen from the collected data, whether the subjectivity of the researcher has been assessed and how contradictory data have been handled. If fragments of participants’ testimonies are presented to elaborate the results, the reference of their origin must be clearly specified. The second question has to do with whether the results were presented clearly. They should be presented in a detailed and understandable manner, showing their relationship to the research question. We will review at this point the strategies adopted to ensure the credibility of the results, as well as whether the authors have reflected on the limitations of the study.

We will finish the critical appraisal by answering the only question of the block that has to do with the relevance of the study, which is nothing more than its usefulness or applicability to our clinical practice. Are the results of the research applicable? We will have to assess how the results contribute to our practice, how they add to the existing knowledge and in what contexts they may be applicable.

And here we are going to leave it for today. You have already seen that we have taken a look into a world quite different from the one we are more used to, in which we have to change a little our mentality about how to pose and study problems. Before leaving, I have to warn you, as in previous posts, not to look for fildulastrosis, because you will not find this disease anywhere. Actually, fildulastrosis is an invention of mine in homage to a very illustrious character, sadly deceased: Forges. Antonio Fraguas (his nom de guerre comes from the English translation of his last name) was, in my humble opinion, the best graphic humorist I can remember. For many years I began the day with Forges’ daily cartoon, so for some time now there have been mornings when one does not quite know how to start the day. Forges invented many words of his own and I really liked the fildulastro of his percutoria, which had the defect of escalporning now and then. Hence my fildulastrosis, so from here I thank him and pay him this little tribute.

And now we’re leaving. We have not talked much about other methods of qualitative research such as grounded theory, meta-ethnography, etc. Those who are interested have bibliography available where they are explained better than I could do it. And, of course, as in quantitative research, there are also ways to combine qualitative research studies. But that is another story…

Powerful gentleman


Yes, as the illustrious Francisco de Quevedo y Villegas once said, powerful gentleman is Don Dinero (Mr. Money). A great truth because, who, purely in love, does not humble himself before the golden yellow? And even more in a mercantilist and materialist society like ours.

But the problem is not that we are materialistic and just think about money. The problem is that nobody believes they have all the money they need. Even the wealthiest would like to have much more money. And many times, it is true, we do not have enough money to cover all our needs as we would like.

And that does not only happen at the level of individuals, but also at the level of social groups. Any country has a limited amount of money, which is why you cannot spend on everything you want and you have to choose where to spend it. Let’s think, for example, of our healthcare system, in which new health technologies (new treatments, new diagnostic techniques, etc.) are getting better… and more expensive (sometimes, even bordering on obscenity). If we are spending at the limit of our possibilities and want to apply a new treatment, we only have two choices: either we increase our wealth (where do we get the money from?) or we stop spending it on something else. There would be a third one that is used frequently, even if it is not the right thing to do: spend what we do not have and pass on the debt to whoever comes next.

Yes, my friends, the saying that health is priceless does not hold up economically. Resources are always limited and we must all be aware of the so-called opportunity cost of a product: for the price it costs, there is money that we will have to stop spending on something else.

Therefore, it is very important to properly evaluate any new health technology before deciding on its implementation in the health system, and this is why the so-called economic evaluation studies have been developed, aimed at identifying which actions should be prioritized to maximize the benefits produced in an environment with limited resources. These studies are a tool to assist in decision-making, but they do not aim to replace it, so other elements have to be taken into account, such as justice, equity and freedom of choice.

Economic evaluation (EV) studies involve a whole set of specific methodology and terminology that is usually little known by those who are not dedicated to the evaluation of health technologies. Let’s briefly review their characteristics and then give some recommendations on how to make a critical appraisal of these studies.

The first thing would be to explain the two characteristics that define an EV. These are the measurement of the costs and benefits of the interventions (the first one) and the choice or comparison between two or more alternatives (the second one). These two features are essential to say that we are facing an EV, which can be defined as the comparative analysis of different health interventions in terms of costs and benefits. The methodology for developing an EV will have to take into account a number of aspects that we list below and that you can see summarized in the attached table.

– Objective of the study. It will be determined if the use of a new technology is justified in terms of the benefits it produces. For this, a research question will be formulated with a structure similar to that of other types of epidemiological studies.

– Perspectives of the analysis. It is the point of view of the person or institution to whom the analysis is targeted, which will include the costs and benefits that must be taken into account from the positioning chosen. The most global perspective is that of the Society, although the one of the funders, that of specific organizations (for example, hospitals) or that of patients and families can also be adopted. The most usual is to adopt the perspective of the funders, sometimes accompanied by the social one. If so, both must be well differentiated.

– Time horizon of the analysis. It is the period of time during which the main economic and health effects of the intervention are evaluated.

– Choice of the comparator. This is a crucial point in determining the incremental effectiveness of the new technology, and one on which the importance of the study for decision makers will largely depend. In practice, the most common comparator is the alternative in habitual use (the gold standard), although it can sometimes be compared with the non-treatment option, which must be justified.

– Identification of costs. Costs are usually considered taking into account the total amount of the resource consumed and the monetary value of the resource unit (you know, as the friendly hostesses of an old TV contest said: 25 answers, at 5 pesetas each, 125 pesetas). Costs are classified as direct or indirect and as health or non-health costs. Direct costs are those clearly related to the illness (hospitalization, laboratory tests, laundry and kitchen, etc.), while indirect costs refer to productivity or its loss (work functionality, mortality). On the other hand, health costs are those related to the intervention (medicines, diagnostic tests, etc.), while non-health costs are those that the patient or other entities have to pay, or those related to productivity.

What costs will be included in an EV? It will depend on the intervention being analyzed and, especially, on the perspective and time horizon of the analysis.

– Quantification of costs. It will be necessary to determine the amount of resources used, either individually or in aggregate, depending on the information available.

– Cost assessment. Costs will be assigned a unit price, specifying the source and the method used to assign this price. When the study covers long periods of time, it must be borne in mind that things do not cost the same over the years. If I tell you that I knew a time when you could go out at night with a thousand pesetas (the equivalent of about 6 euros now) and come back home with money in your pocket, you will think it is another of my frequent ravings, but I swear it is true.

To take this into account, a weighting factor or discount rate is used, which is usually between 3% and 6%. For those who are curious, the general formula is CV = FV / (1 + d)^n, where CV is the current value, FV the future value, n the number of years and d the discount rate.
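
As an illustration, here is a minimal sketch in Python of that discounting formula; the figures (a cost of 10,000 incurred five years from now, a 3% rate) are made-up example values.

```python
# Discount a future cost back to its current value: CV = FV / (1 + d)**n
def current_value(future_value: float, discount_rate: float, years: int) -> float:
    return future_value / (1 + discount_rate) ** years

# Hypothetical example: 10,000 euros spent 5 years from now, discounted at 3% per year
print(round(current_value(10_000, 0.03, 5), 2))  # ~8626.09
```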

– Identification, measurement and evaluation of results. The benefits obtained can be classified as health and non-health benefits. Health benefits are the clinical consequences of the intervention, generally measured from a point of view of interest to the patient (improvement of blood pressure figures, deaths avoided, etc.). On the other hand, non-health benefits are divided according to whether they produce improvements in productivity or in quality of life.

The first are easy to understand: productivity can improve because people return to work earlier (shorter hospitalization, shorter convalescence) or because they work better as the worker’s health conditions improve. The second are related to the concept of health-related quality of life, which reflects the impact of the disease and its treatment on the patient.

The quality of life related to health can be estimated using a series of questionnaires on the preferences of patients, summarized in a single score value that, together with the amount of life, will provide us with the quality-adjusted life year (QALY).

To assess quality of life we refer to the utilities of health states, which are expressed as a numerical value between 0 and 1, in which 0 represents the utility of the state of death and 1 that of perfect health. In this sense, a year of life lived in perfect health is equivalent to 1 QALY (1 year of life x 1 utility = 1 QALY). Thus, to determine the value in QALYs we will multiply the value associated with a state of health by the years lived in that state. For example, half a year in perfect health (0.5 years x 1 utility) would be equivalent to one year lived with some ailments (1 year x 0.5 utility).
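
The arithmetic is simple enough to show in a couple of lines of Python; the utility values are the illustrative ones from the example above.

```python
# QALYs = years lived in a health state x utility of that state (0 = death, 1 = perfect health)
def qalys(years: float, utility: float) -> float:
    return years * utility

print(qalys(0.5, 1.0))  # half a year in perfect health -> 0.5 QALYs
print(qalys(1.0, 0.5))  # one year with some ailments   -> 0.5 QALYs
```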

– Type of economic analysis. We can choose between four types of economic analysis.

The first, the cost-minimization analysis. This is used when there is no difference in effect between the two options compared, a situation in which it is enough to compare the costs and choose the cheapest. The second, the cost-effectiveness analysis. This is used when the interventions are similar and determines the relationship between the costs and consequences of the interventions in units usually used in clinical practice (decrease in days of admission, for example). The third, the cost-utility analysis. It is similar to cost-effectiveness, but the effectiveness is adjusted for quality of life, so the outcome is the QALY. Finally, the fourth method is the cost-benefit analysis. In this type everything is measured in monetary units, which we usually understand quite well, although it can be a little complicated to express health gains with them.

– Analysis of results. The analysis will depend on the type of economic analysis used. In the case of cost-effectiveness studies, it is typical to calculate two measures: the average cost-effectiveness (dividing the cost by the benefit) and the incremental cost-effectiveness (the extra cost per unit of additional benefit obtained with one option with respect to the other). This last parameter is important, since it constitutes a limit on the efficiency of the intervention, which will be chosen or not depending on how much we are willing to pay for an additional unit of effectiveness.
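
As a sketch of those two measures, here is a small Python example with entirely hypothetical figures (the costs and QALYs are invented for illustration).

```python
# Average cost-effectiveness: cost divided by benefit
def average_ce(cost: float, effect: float) -> float:
    return cost / effect

# Incremental cost-effectiveness ratio (ICER): extra cost per additional unit of benefit
def icer(cost_new: float, effect_new: float, cost_old: float, effect_old: float) -> float:
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical example: new option costs 12,000 and yields 8 QALYs; usual option costs 9,000 and yields 7 QALYs
print(average_ce(12_000, 8))      # 1500.0 per QALY
print(icer(12_000, 8, 9_000, 7))  # 3000.0 extra per additional QALY
```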

– Sensitivity analysis. As with other types of designs, EVs are not free from uncertainty, generally due to the limited reliability of the available data. Therefore, it is convenient to evaluate the degree of uncertainty through a sensitivity analysis to check the stability of the results and how they may change if the main variables vary. An example may be the variation of the chosen discount rate.

There are five types of sensitivity analysis: univariate (the study variables are modified one by one), multivariate (two or more are modified at a time), extreme scenarios (we put ourselves in the most optimistic and the most pessimistic scenario for the intervention), threshold (identifying whether there is a critical value above or below which the choice is reversed towards one or the other of the compared interventions) and probabilistic (assuming a certain probability distribution for the uncertainty of the parameters used).
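
By way of illustration, here is a minimal univariate sensitivity analysis in Python (all figures are hypothetical): we vary a single parameter, the discount rate, and watch how the discounted total cost of the intervention moves.

```python
# Sum of yearly costs discounted back to their current value
def discounted_cost(annual_cost: float, years: int, rate: float) -> float:
    return sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))

# Univariate analysis: modify only the discount rate and recompute the result
for rate in (0.0, 0.03, 0.05):
    print(f"discount rate {rate:.0%}: total cost {discounted_cost(2_000, 10, rate):,.0f}")
```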

– Conclusion. This is the last section of the development of an EV. The conclusions should take into account two aspects: internal validity (correct analysis for the patients included in the study) and external validity (possibility of extrapolating the conclusions to other groups of similar patients).

As we said at the beginning of this post, EVs have a lot of jargon and their own methodological aspects, which makes it difficult for us to critically appraise them and correctly understand their content. But let no one get discouraged: we can do it by relying on our three basic pillars: validity, relevance and applicability.

There are multiple guides that systematically explain how to assess an EV. Perhaps the first to appear was that of the British NICE (National Institute for Clinical Excellence), but others have arisen subsequently, such as that of the Australian PBAC (Pharmaceutical Benefits Advisory Committee) and that of the Canadian CADTH (Canadian Agency for Drugs and Technologies in Health). In Spain we could not be left behind, and the Laín Entralgo Health Technology Assessment Unit also developed an instrument to determine the quality of an EV. This guide establishes recommendations for 17 domains that closely resemble what we have said so far, complemented with a checklist to facilitate the assessment of the quality of the EV.

Anyway, as my usual sufferers know, I prefer to use a simpler checklist that is available on the Internet for free, which is none other than the tool provided by the CASPe group and that you can download from their website. We are going to follow these 11 CASPe’s questions, although without losing sight of the recommendations of the Spanish guide that we have mentioned.

As always, we will start with VALIDITY, trying to answer two elimination questions first. If the answer is negative, we can leave the study aside and dedicate ourselves to another, more productive task.

Is the question or objective of the evaluation well defined? The research question should be clear and define the target population of the study. There are also three fundamental aspects that should be clear in the objective: the options compared, the perspective of the analysis and the time horizon. Is there a sufficient description of all possible alternatives and their consequences? The actions to follow must be perfectly defined in all the compared options, including who applies each action, where and to whom. The usual approach will be to compare the new technology, at least, with the one in habitual use, always justifying the choice of the comparison technology, especially if this is the non-treatment option (in the case of pharmacological interventions).

If we have been able to answer these two questions affirmatively, we will move on to the four detailed questions. Is there evidence of the effectiveness of the intervention or of the evaluated program? We will see whether there are trials, reviews or other previous studies that prove the effectiveness of the interventions. Think of a cost-minimization study, in which we want to know which of two options, both effective, is cheaper: logically, we will need prior evidence of that effectiveness. Are the effects of the intervention (or interventions) identified, measured and appropriately valued or considered? These effects can be measured with simple units, often derived from clinical practice, with monetary units or with more elaborate calculation units, such as the QALYs mentioned above. Are the costs incurred by the intervention (or interventions) identified, measured and appropriately valued? The resources used must be well identified and measured in the appropriate units. The method and source used to assign a value to the resources used must be specified, as we have already mentioned. Finally, were discount rates applied to the costs of the intervention(s)? And to the effects? As we already know, this is fundamental when the time horizon of the study is long. In Spain, it is recommended to use a discount rate of 3% for basic resources. When doing the sensitivity analysis this rate will be tested between 0% and 5%, which will allow comparison with other studies.

Once the internal validity of our EV has been assessed, we will answer the questions regarding the RELEVANCE of the results. Firstly, what are the results of the evaluation? We will review the units that have been used (QALYs, monetary costs, etc.) and whether an incremental benefit analysis has been carried out, in appropriate cases. The second question in this section refers to whether an adequate sensitivity analysis has been carried out to know how the results would vary with changes in costs or effectiveness. In addition, it is recommended that the authors justify the modifications made with respect to the base case, the choice of the variables that are modified and the method used in the sensitivity analysis. Our Spanish guide recommends carrying out, whenever possible, a probabilistic sensitivity analysis, detailing all the statistical tests performed and the confidence intervals of the results.

Finally, we will assess the APPLICABILITY or external validity of our study by answering the last three questions. Would the program be equally effective in your environment? It will be necessary to consider whether the target population, the perspective, the availability of technologies, etc., are applicable to our clinical context. Finally, we must reflect on whether the costs would be transferable to our environment and whether it would be worth applying them there. This may depend on social, political, economic and population differences, among others, between our environment and the one in which the study was carried out.

And with this we are going to finish this post for today. Even if your head is spinning after all we have said, you can believe me if I tell you that we have done nothing but scratch the surface of this stormy world of economic evaluation studies. We have not discussed anything, for example, about the statistical methods that can be used in sensitivity analyses, which can become complicated, nor about the studies using modelling, employing techniques only available to privileged minds, like Markov chains, stochastic models or discrete event simulation models, to name a few. Neither have we talked about the type of studies on which economic evaluations are based. These can be experimental or observational studies, but they have a series of peculiarities that differentiate them from other studies of similar design but different functions. This is the case of clinical trials that incorporate an economic evaluation (also known as piggy-back clinical trials), which tend to have a more pragmatic design than conventional trials. But that is another story…

King Kong versus Godzilla


What a mess these two elements make when they are left loose and come together! In this story, almost as old as me (please, do not run off to look up what year the movie was made), poor King Kong, who must have traveled more than Tarzan, leaves his Skull Island to defend a village from an evil giant octopus and drinks a potion that leaves him sound asleep. Then, some Japanese gentlemen seize the opportunity to take him to their country. I, who have visited Japan, can imagine the effect it produced on the poor ape when he woke up, so he had no choice but to escape, with the misfortune of meeting Godzilla, who had also escaped from an iceberg where it had previously been frozen. And there they get entangled and the fight begins, stones over here, atomic rays over there, until the thing gets out of control and finally King Kong goes off to attack Tokyo, I do not remember exactly for what reason. I swear I have not taken any hallucinogen, the film is like that, and I will not reveal more so as not to spoil the ending in the unlikely case that you want to see the film after what I have told you. What I do not know is what the screenwriters had taken before planning this story.

At this point you will be thinking about how today’s post may be related to this story. Well, the truth is that it has nothing to do with what we are going to talk about, but I could not think of a better way to start. Well, it may actually be related, because today we are going to talk about a family of monsters within epidemiological studies: the ecological studies. It’s funny that when you read something about ecological studies, it always starts by saying that they are simple. Well, I do not think so. The truth is that they have a lot to get our teeth into and we are going to try to explain them in a simple way. I thank my friend Eduardo (to whom I dedicate this post) for the effort he made to describe them intelligibly. Thanks to him I could understand them. Well… a little bit.

Ecological studies are observational studies that have the peculiarity that the study population is made up not of individual subjects, but of grouped subjects (clusters), so the level of inference of their estimates is also aggregated. They tend to be cheap and quick to perform (I suppose hence their supposed simplicity), since they usually use data from secondary sources already available, and they are very useful when it is not possible to measure the exposure at the individual level or when the effect can only be measured at the population level (such as the results of a vaccination campaign, for example).

The problem comes when we want to make inferences at the individual level based on their results, since they are subject to a series of biases that we will comment on later. In addition, since they are usually descriptive studies with a historical temporality, it can be difficult to determine the temporal gradation between the exposure and the effect studied.

We will look at their specific characteristics in relation to three aspects of their methodology: types of variables and analyses, types of studies, and biases.

Ecological variables are classified into aggregate and environmental (also called global) variables. Aggregate variables show a summary of individual observations. They are usually averages or proportions, such as the mean age at which the first King Kong movie is seen or the rate of geeks per 1000 moviegoers, to name two absurd examples.

On the other hand, environmental measures are characteristic of a specific place. These can have a parallel at the individual level (for example, the levels of environmental pollution, related to the crap each of us swallows) or be attributes of groups with no equivalent at the individual level (such as water quality, to name one).

As for the analysis, it can be done at the aggregate level, using data from groups of participants, or at the individual level, but preferably without mixing the two. If data of both types are collected, it will be more convenient to transform them into a single level, the simplest option being to aggregate the individual data, although it can also be done the other way around and we can even perform an analysis at both levels with hierarchical multilevel statistical techniques, only afforded by a few privileged minds.

Obviously, the level of inference we want to apply will depend on what our objective is. If we want to study the effects of a risk factor at the individual level, the inference will be individual. An example would be to study the relationship between the number of hours of television watched and the incidence of brain cancer. On the other hand, and following a very pediatric example, if we want to know the effectiveness of a vaccine, the inferences will be made in an aggregated form from the data on vaccination coverage in the population. And to top it all, we can measure an exposure factor in both forms, individual and grouped; for example, the density of Mexican restaurants in a population and the frequency of antacid intake. In this case we would be making a contextual inference.

Regarding the type of ecological studies, we can classify them according to the exposure method and the grouping method.

According to the exposure method, the thing is relatively simple and we can find two types of studies. If we do not measure the exposure variable, or we do it partially, we talk about exploratory studies. In the opposite case, we will find ourselves before an analytical study.

According to the grouping method, we can consider three types: multiple groups (when multiple zones are selected), temporal (there is measurement over time) and mixed (a combination of both).

The complexity begins when the two dimensions (exposure and grouping) are combined, since then we can find ourselves facing a series of more complex designs. Thus, multiple-group studies can be exploratory (the exposure factor is not measured, but the effect is) or analytical (the most frequent: here we measure both). Temporal trend studies, not to be outdone, can also be exploratory or analytical, in a similar way to the previous ones but with a temporal trend. Finally, there are mixed studies, which compare the temporal trends of several geographical areas. Simple, isn’t it?

Well, this is nothing compared to the complexity of the statistical techniques used in these studies. Until recently the analyses were very simple and based on measures of association or linear correlation, but in recent times we have seen the development of numerous techniques based on regression models and more exotic things such as log-linear multiplicative models or Poisson regression. The merit of all these techniques is that, based on the grouped measures, they allow us to estimate how many exposed or unexposed subjects have the effect, thus allowing the calculation of rates, attributable fractions, etc. Do not fear, we will not go into detail, but there is bibliography available for those who want to rack their brains.
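
Just to give a flavour of what such an analysis looks like, here is a minimal sketch of a Poisson regression on aggregated data using Python’s statsmodels; the counts, person-years and exposure values are entirely invented.

```python
import numpy as np
import statsmodels.api as sm

cases      = np.array([12, 30, 25, 50, 41])               # hypothetical event counts per area
person_yrs = np.array([1e4, 2e4, 1.5e4, 2.5e4, 2e4])       # person-years of observation per area
exposure   = np.array([0.1, 0.4, 0.3, 0.7, 0.5])           # aggregate exposure measure per area

# Poisson model of counts on the aggregate exposure, with log person-time as offset
X = sm.add_constant(exposure)
fit = sm.GLM(cases, X, family=sm.families.Poisson(), offset=np.log(person_yrs)).fit()

# exp(coefficient) is the rate ratio per unit increase of the aggregate exposure
print(np.exp(fit.params))
```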

To finish with the methodological aspects of the ecological studies, we will list some of its most characteristic biases, favored by the fact of using aggregate analysis units.

The most famous of all is the ecological bias, also known as the ecological fallacy. This occurs when the grouped measure does not reflect the biological effect at the individual level, in such a way that the individual inference made from it is erroneous. This bias became famous with the study published in the New England Journal of Medicine that concluded that there was a relationship between chocolate consumption and Nobel prizes. The problem is that, beyond the fun of this example, the ecological fallacy is the main limitation of this type of study.

Another bias that has some peculiarities in this type of study is confounding bias. In studies dealing with individual units, confounding occurs when a third variable is related to both the exposure and the effect without being part of the causal pathway between the two. This ménage à trois is a bit more complex in ecological studies. A factor can behave as a confounder at the ecological level but not at the individual level and, vice versa, it is possible that confounders at the individual level do not produce confounding at the aggregate level. In any case, as in the rest of the studies, we must try to control for confounding factors, for which there are two fundamental approaches.

The first is to include the possible confounding variables in the mathematical model as covariates and perform a multivariate analysis, which will make the study of the effect more complicated. The second is to adjust or standardize the rates of the effect by the confounding variables and fit the regression model with the adjusted rates. To be able to do this it is essential that all the variables introduced in the model are also adjusted for the same confounding variable and that the covariances of the variables are known, which does not always happen. In any case, and not to be discouraging, many times we cannot be sure that the confounding factors have been adequately controlled, even using the most recent and sophisticated multilevel analysis techniques, since the origin may lie in unknown characteristics of the distribution of data among groups.
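
To make the second approach a little more concrete, here is a minimal sketch of direct standardization in Python; the stratum-specific rates and the standard population weights are invented for illustration.

```python
# Direct standardization: weighted average of stratum-specific rates over a common standard population
def directly_standardized_rate(stratum_rates, standard_weights):
    total = sum(standard_weights)
    return sum(rate * weight for rate, weight in zip(stratum_rates, standard_weights)) / total

# Hypothetical rates per 1,000 in three age strata and a hypothetical standard population
rates   = [2.0, 5.0, 12.0]
weights = [30_000, 50_000, 20_000]
print(directly_standardized_rate(rates, weights))  # age-adjusted rate per 1,000 -> 5.5
```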

Other gruesome aspects of ecological studies are temporal ambiguity bias (as we have already commented, it is often difficult to ensure that the exposure precedes the effect) and collinearity (the difficulty in assessing the effects of two or more exposures that may occur simultaneously). In addition, although these are not specific to ecological studies, they are very susceptible to information biases.

You can see that I was right at the beginning when I told you that ecological studies seem to me many things, but not simple. In any case, it is worth understanding what their methodology is based on because, with the development of new analysis techniques, they have gained in prestige and power, and it is more than likely that we will come across them more and more frequently.

But do not despair, the important thing for us, consumers of medical literature, is to understand how they work so that we can make a critical appraisal of the articles when we deal with them. Although, as far as I know, there are no checklists as structured as CASP has for other designs, the critical appraisal will be done following the usual general scheme according to our three pillars: validity, relevance and applicability.

The study of VALIDITY will be done in a similar way to other types of cross-sectional observational studies. The first thing will be to check that there is a clear definition of the population and the exposure or effect under study. The units of analysis and their level of aggregation will have to be clearly specified, as well as the methods of measuring the effect and exposure, the latter, as we already know, only in analytical studies.

The sample of the study should be representative, for which we will have to review the selection procedures, the inclusion and exclusion criteria and its size. These data will also influence the external validity of the results.

As in any observational study, the measurement of exposure and effect should be done blindly and independently, using valid instruments. The authors must present the data completely, taking into account whether there are losses or out-of-range values. Finally, there must be a correct analysis of the results, with control of the typical biases of these studies: ecological, information, confounding, temporal ambiguity and collinearity.

In the RELEVANCE section we can begin with a quantitative assessment, summarizing the most important result and reviewing the magnitude of the effect. We must look for, or calculate ourselves if possible, the most appropriate impact measures: differences in incidence rates, attributable fraction in the exposed, etc. If the authors do not offer these data but do provide the regression model, it is possible to calculate the impact measures from the coefficients of the independent variables of the model. I am not going to put the list of formulas here so as not to make this post even more unfriendly, but you know that they exist in case you need them one day.
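
As a small example of that last idea, this Python snippet (with a made-up coefficient) recovers a rate ratio from the coefficient of a log-linear/Poisson model and derives the attributable fraction in the exposed from it.

```python
import math

beta_exposure = 0.47                         # hypothetical model coefficient for the exposure
rate_ratio = math.exp(beta_exposure)         # exp(coefficient) -> rate ratio
af_exposed = (rate_ratio - 1) / rate_ratio   # attributable fraction in the exposed

print(f"rate ratio: {rate_ratio:.2f}")                        # ~1.60
print(f"attributable fraction (exposed): {af_exposed:.1%}")   # ~37.5%
```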

Then we will make a qualitative assessment of the results, trying to assess the clinical interest of the main outcome measure, the interest of the effect size and the impact it may have for the patient, the system or the Society.

We will finish this section with a comparative assessment (looking for similar studies and comparing the main outcome measure and other alternative measures) and an assessment of the relationship between benefits, risks and costs, as we would do with any other type of study.

Finally, we will consider the APPLICABILITY of the results in clinical practice, taking into account aspects such as adverse effects, economic cost, etc. We already know that the fact that the study is well done does not mean that we have to apply it obligatorily in our environment.

And here we are going to leave it for today. When you read or do an ecological study, be careful not to fall into the temptation of drawing causality conclusions. Regardless of the pitfalls that the ecological fallacy may have for you, ecological studies are observational, so they can be used to generate hypotheses of causality, but not to confirm them.

And now we’re leaving. I did not tell you who won the fight between King Kong and Godzilla so as not to spoil it, but surely the smartest of you have already guessed. After all, and to his misfortune, only one of the two later traveled to New York. But that is another story…

The crystal ball


How I wish I could predict the future! And not only to win millions in the lottery, which is the first thing you might think of. There are more important things in life than money (or so some say): decisions that we make based on assumptions that end up not being fulfilled and that complicate our lives to unsuspected limits. We have all thought at some point about what we would do if we lived twice. I have no doubt that, if I met the genie of the lamp, one of the three wishes I would ask for would be a crystal ball to see the future.

And it would also come in handy in our work as doctors. In our day-to-day practice we are forced to make decisions about the diagnosis or prognosis of our patients, and we always do it on the swampy terrain of uncertainty, always assuming the risk of making some mistake. We, especially when we are more experienced, estimate consciously or unconsciously the likelihood of our assumptions, which helps us in making diagnostic or therapeutic decisions. However, it would be good to also have a crystal ball to know more accurately how the patient’s disease will evolve.

The problem, as with other inventions that would be very useful in medicine (like the time machine), is that nobody has yet managed to manufacture a crystal ball that really works. But let’s not get discouraged. We cannot know for sure what will happen, but we can estimate the probability that a certain result will occur.

For this, we can use all those variables related to the patient that have a known diagnostic or prognostic value and integrate them to perform the calculation of probabilities. Well, doing such a thing would be the same as designing and applying what is known as a clinical prediction rule (CPR).

Thus, if we get a little formal, we can define a CPR as a tool composed of a set of variables of clinical history, physical examination and basic complementary tests, which provides us with an estimate of the probability of an event, suggesting a diagnosis or predicting a concrete response to a treatment.

The critical appraisal of an article about a CPR shares aspects with that of articles about diagnostic tests, and it also has specific aspects related to the methodology of its design and application. For this reason, we will briefly look at the methodological aspects of CPRs before going into their critical appraisal.

In the process of developing a CPR, the first thing to do is to define it. The four key elements are the study population, the variables that we will consider as potentially predictive, the gold or reference standard that classifies whether the event we want to predict occurs or not and the criterion of assessment of the result.

It must be borne in mind that the variables we choose must be clinically relevant, they must be collected accurately and, of course, they must be available at the time we want to apply the CPR for decision-making. It is advisable not to fall into the temptation of adding variables endlessly since, apart from complicating the application of the CPR, it can decrease its validity. In general, it is recommended that for every variable introduced in the model there should have been at least 10 of the events we want to predict (the rule is derived in a sample whose members all have the variables measured, but only a certain number of them end up presenting the event to be predicted).

I would also like to highlight the importance of the gold standard. There must be a diagnostic test or a set of well-defined criteria that allow us to clearly define the event we want to predict with the CPR.

Finally, it is convenient that those who collect the variables during this definition phase are unaware of the results of the gold standard, and vice versa. The absence of blinding decreases the validity of the CPR.

The next step is the derivation or design phase itself. This is where the statistical methods that allow us to include the predictive variables and exclude those that will contribute nothing are applied. We will not go into the statistics; suffice it to say that the most commonly used methods are those based on logistic regression, although discriminant analysis, survival analysis and even more exotic analyses based on discriminant risks or neural networks can be used, only afforded by a few virtuous ones.

In logistic regression models, the event will be the dichotomous dependent variable (it happens or it does not) and the other variables will be the predictive or independent variables. Thus, each coefficient that multiplies a predictive variable will be the natural logarithm of the adjusted odds ratio. In case anyone has not understood, the adjusted odds ratio for each predictive variable is calculated by raising the number e to the value of the coefficient of that variable in the regression model.
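
In code form, with purely hypothetical coefficients, that last step looks like this:

```python
import math

# Hypothetical fitted logistic regression coefficients of the predictive variables
coefficients = {
    "age_over_65": 0.69,
    "diabetes": 1.10,
    "fever_on_admission": -0.22,
}

# Adjusted odds ratio = e raised to the coefficient
for variable, beta in coefficients.items():
    print(f"{variable}: adjusted OR = {math.exp(beta):.2f}")
# age_over_65 -> ~1.99, diabetes -> ~3.00, fever_on_admission -> ~0.80
```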

The usual thing is that a certain score on a scale is assigned according to the weight of each variable, so that the total sum of points across all the predictive variables will allow us to classify the patient in a specific range of predicted probability of the event occurring. There are also other more complex methods using regression equations but, after all, you always get the same thing: an individualized estimate of the probability of the event in a particular patient.
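
A toy sketch of that scoring approach (the point weights and risk bands below are invented, not taken from any real rule) could be:

```python
POINTS = {"age_over_65": 2, "diabetes": 3, "fever_on_admission": 1}   # hypothetical point weights
RISK_BANDS = [(0, "low"), (3, "intermediate"), (5, "high")]           # hypothetical score cut-offs

def risk_category(patient: dict) -> str:
    """Sum the points of the variables present and map the total to a probability band."""
    score = sum(points for variable, points in POINTS.items() if patient.get(variable))
    label = RISK_BANDS[0][1]
    for threshold, band in RISK_BANDS:
        if score >= threshold:
            label = band
    return label

print(risk_category({"age_over_65": True, "diabetes": True}))  # score 5 -> "high"
```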

With this process we perform the categorization of patients in homogenous groups of probability, but we still need to know if this categorization is adjusted to reality or, what is the same, what is the capacity of discrimination of the CPR.

The overall validity or discrimination capacity of the CPR will be assessed by contrasting its results with those of the gold standard, using techniques similar to those used to assess the power of diagnostic tests: sensitivity, specificity, predictive values and likelihood ratios. In addition, in cases where the CPR provides a quantitative estimate, we can resort to ROC curves, since the area under the curve will represent the global validity of the CPR.
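
For instance, with a toy gold standard and toy predicted probabilities (made up here just for illustration), the area under the ROC curve can be computed in one line with scikit-learn:

```python
from sklearn.metrics import roc_auc_score

gold_standard  = [0, 0, 1, 1, 0, 1, 0, 1]                           # 1 = the event actually occurred
predicted_prob = [0.10, 0.35, 0.60, 0.80, 0.65, 0.55, 0.40, 0.90]   # probability given by the rule

print(f"Area under the ROC curve: {roc_auc_score(gold_standard, predicted_prob):.2f}")  # ~0.88
```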

The last step of the design phase will be the calibration of the CPR, which is nothing more than checking its good behavior throughout the range of possible results.

Some CPR authors stop here, but they forget two fundamental steps of the elaboration: the validation and the calculation of the clinical impact of the rule.

Validation consists of testing the CPR in samples different from the one used for its design. We may be in for a surprise and find that a rule that works well in a certain sample does not work in another. Therefore, it must be tested not only in similar patients (limited validation), but also in different clinical settings (broad validation), which will increase the external validity of the CPR.

The last phase is to check its clinical performance. This is where many CPRs crash down after having gone through all the previous steps (maybe that’s why this last check is often avoided). To assess the clinical impact, we will have to apply CPR in our patients and see how clinical outcome measures change such as survival, complications, costs, etc. The ideal way to analyze the clinical impact of a CPR is to conduct a clinical trial with two groups of patients managed with and without the rule.

For those self-sacrificing people who are still reading, now that we know what a CPR is and how it is designed, we will see how the critical appraisal of these works is done. And for this, as usual, we will use our three pillars: validity, relevance and applicability. So as not to forget anything, we will follow the questions listed on the grid for CPR studies of the CASP tool.

Regarding VALIDITY, we will start with some elimination questions. If the answer is negative, it may be time to wait until someone finally invents a crystal ball that works.

Does the rule answer a well-defined question? The population, the event to be predicted, the predictive variables and the outcome evaluation criteria must be clearly defined. If this is not done or these components do not fit our clinical scenario, the rule will not help us. The predictive variables must be clinically relevant, reliable and well defined in advance.

Did the study population from which the rule was derived include an adequate spectrum of patients? It must be verified that the method of patient selection was adequate and that the sample is representative. In addition, it must include patients from the entire spectrum of the disease. As with diagnostic tests, events may be easier to predict in certain groups, so there must be representatives of all of them. Finally, we must see whether the rule was validated in a different group of patients. As we have already said, it is not enough for the rule to work in the group of patients in which it was derived; it must be tested in other groups, similar to or different from those with which it was generated.

If the answer to these three questions has been affirmative, we can move on to the next three questions. Was there a blind evaluation of the outcome and of the predictor variables? As we have already commented, it is important that the person who collects the predictive variables does not know the result of the reference standard, and vice versa. The collection of information must be prospective and independent. The next thing to ask is whether the predictor variables and the outcome were measured in all the patients. If the outcome or the variables are not measured in all patients, the validity of the CPR can be compromised. In any case, the authors should explain the exclusions, if there are any. Finally, are the methods of derivation and validation of the rule described? We already know that it is essential that the results of the rule be validated in a population different from the one used for the design.

If the answers to the previous questions indicate that the study is valid, we will answer the questions about the RELEVANCE of the results. The first is whether the performance of the CPR can be calculated. The results should be presented with their sensitivity, specificity, odds ratios, ROC curves, etc., depending on the result provided by the rule (scoring scales, regression formulas, etc.). All these indicators will help us to calculate the probabilities of occurrence of the event in environments with different prevalences. This is similar to what we did with studies of diagnostic tests, so I invite you to review the post on the subject so as not to repeat ourselves too much. The second question is: what is the precision of the results? Here we will not go on too long either: remember our revered confidence intervals, which will inform us of the precision of the results of the rule.
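
As a reminder of how those basic indicators come out of a 2x2 table of rule result versus gold standard, here is a tiny Python example with invented counts:

```python
# Hypothetical 2x2 counts: rule positive/negative versus event present/absent
tp, fp, fn, tn = 40, 10, 5, 145

sensitivity = tp / (tp + fn)                   # ~0.89
specificity = tn / (tn + fp)                   # ~0.94
lr_positive = sensitivity / (1 - specificity)  # ~13.8
lr_negative = (1 - sensitivity) / specificity  # ~0.12

print(sensitivity, specificity, lr_positive, lr_negative)
```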

To finish, we will consider the APPLICABILITY of the results to our environment, for which we will try to answer three questions. Will the reproducibility of the CPR and its interpretation be satisfactory in the setting of our scenario? We will have to think about the similarities and differences between the setting in which the CPR was developed and our clinical environment. In this sense, it will be helpful if the rule has been validated in several samples of patients from different environments, which will increase its external validity. Is the test acceptable in this case? We will think about whether the rule is easy to apply in our environment and whether it makes sense to do so from the clinical point of view. Finally, will the results modify clinical behavior, health outcomes or costs? If, from our point of view, the results of the CPR are not going to change anything, the rule will be useless and a waste of time. Here our opinion will be important, but we must also look for studies that assess the impact of the rule on costs or on health outcomes.

And that is everything I wanted to tell you about the critical appraisal of studies on CPRs. Anyway, before finishing I would like to tell you a little about a checklist that, of course, also exists for the assessment of this type of study: the CHARMS checklist (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modeling Studies). You will not tell me that the name, although a bit fancy, is not lovely.

This list is designed to assess the primary studies of a systematic review on CPRs. It tries to answer some general design questions and assesses 11 domains to extract enough information to perform the critical appraisal. The two main aspects assessed are the risk of bias of the studies and their applicability. The risk of bias refers to the design or validation flaws that may make the model less discriminative, excessively optimistic, etc. Applicability, on the other hand, refers to the degree to which the primary studies match the question that motivates the systematic review, telling us whether the rule can be applied to the target population. This list is good and helps to assess and understand the methodological aspects of this type of study but, in my humble opinion, it is easier to make a systematic critical appraisal by using the CASP tool.

And here, finally, we leave it for today. So as not to go on too long, we have said nothing about what to do with the result of the rule. The fundamental thing, as we already know, is that we can calculate the probability of occurrence of the event in individual patients from settings with different prevalences. But that is another story…

Doc, is this serious?

I wonder how many times I have heard this question or one of its many variants. Because it turns out that we are always thinking about clinical trials and clinical questions about diagnosis and treatment, but think about whether a patient ever asked you if the treatment you were proposing was endorsed by a randomized controlled trial that meets the criteria of the CONSORT statement and has a good score on the Jadad scale. I can say, at least, that it has never happened to me. But they do ask me daily what will happen to them in the future.

And here lies the relevance of prognostic studies. Note that we cannot always cure and that, unfortunately, many times all we can do is accompany the patient and soften, if possible, the announcement of serious sequelae or death. But for this it is essential to have good quality information about the future course of our patient's disease. This information will also serve to calibrate therapeutic efforts in each situation depending on the risks and benefits. Besides, prognostic studies are used to compare results between different departments or hospitals. Nobody can claim that one hospital is worse than another because its mortality is higher without first checking that the prognosis of their patients is similar.

Before getting into the critical appraisal of prognostic studies, let's clarify the difference between a risk factor and a prognostic factor. A risk factor is a characteristic of the environment or of the subject that favors the development of the disease, while a prognostic factor is one that, once the disease has occurred, influences its evolution. Risk factors and prognostic factors are different things, although sometimes they can coincide. What they do share is the type of study design. The ideal would be to use clinical trials, but most of the time it is not feasible or ethical to randomize prognostic or risk factors. Suppose we want to demonstrate the deleterious effect of booze on the liver. The design with the highest level of evidence would be to make two random groups of participants and give 10 whiskeys a day to the participants in one arm and some water to the participants in the other, to see the differences in liver damage after a year, for example. However, it is evident to anyone that we cannot do a clinical trial like this. Not because we could not find subjects for the intervention arm, but because ethics and common sense prevent us from doing it.

For this reason, it is usual to use cohort studies: we would study what differences in liver damage there may be between individuals who drink and individuals who do not drink alcohol by their own choice. In cases that require very long follow-up, or in which the effect we want to measure is very rare, case-control studies can be used, but they will always provide weaker evidence because they have a higher risk of bias. Following our ethylic example, we would select people with and without liver damage and see whether one of the two groups had been more exposed to alcohol.

A prognostic study should inform us of three things: which outcome we are evaluating, how likely it is to happen, and over what time frame we expect it to happen. And to appraise it, as always, we will rely on our three pillars: validity, relevance and applicability.

To assess the VALIDITY, we will first consider whether the article meets a set of primary or elimination criteria. If the answer is no, we had better bin the paper and go read the latest nonsense our Facebook friends have written on our wall.

Is the study sample well defined and is it representative of patients at a similar stage of the disease? The sample, usually called the initial or inception cohort, should be formed by a group of patients at the same stage of the disease, ideally at its onset, and it should be followed up prospectively. The type of patients included, the criteria used to diagnose them and the method of selection should be clearly specified. We must also verify that the follow-up has been long and complete enough to observe the event we are studying. Each participant has to be followed from the start to the end of the study, whether because they are cured, because they present the event or because the study ends. It is very important to take into account losses during the study, which are very common in designs with long follow-up. The study should provide the characteristics of the patients lost and the reasons for the loss. If they are similar to those who are not lost during follow-up, we can still get valid results. If the number of patients lost to follow-up is greater than 20%, a sensitivity analysis is usually done using the worst possible scenario, which assumes that all losses have had a poor prognosis; the results are then recalculated to check whether they change, in which case the results of the study could be invalidated.
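
To see how simple this worst-case exercise is, here is a minimal sketch in Python with invented figures (any real analysis would, of course, use the study's own numbers):

    # Minimal sketch (invented figures): worst-case sensitivity analysis
    # for losses to follow-up in a prognostic cohort.
    followed = 400          # patients who completed follow-up
    events_observed = 60    # events among those followed
    lost = 110              # patients lost to follow-up (more than 20%)

    observed_risk = events_observed / followed
    # Worst-case scenario: every patient lost is assumed to have had the event.
    worst_case_risk = (events_observed + lost) / (followed + lost)

    print(f"Observed risk: {observed_risk:.1%}")
    print(f"Worst-case risk: {worst_case_risk:.1%}")
    # If our conclusions change between these two estimates, the validity
    # of the study is in trouble.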

Once these two aspects have been assessed, we turn to the secondary criteria of internal validity or scientific rigor.

Were outcomes measured objectively and without bias? It must be clearly specified what is being measured and how, before starting the study. In addition, to avoid information bias, the ideal is that the outcomes are measured by a researcher who is blinded, that is, who does not know whether the subject in question is exposed to any of the prognostic factors.

Were the results adjusted for all relevant prognostic factors? We must take into account all the confounding variables and prognostic factors that may influence the results. Factors already known from previous studies can be included directly. Otherwise, the authors will have to estimate these effects using stratified data analysis (the easiest method) or multivariate analysis (more powerful and complex), usually with a proportional hazards model or Cox regression. Although we are not going to talk about regression models now, there are two simple aspects that we can take into account. First, these models need a certain number of events per variable included in the model, so distrust those in which many variables are analyzed, especially with small samples. Second, the variables included are decided by the authors and differ from one work to another, so we will have to assess whether any variable that may be relevant to the final result has been left out.
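
Regarding the first point, a commonly cited (and much debated) heuristic is to look for roughly ten events per candidate variable; the threshold is a rule of thumb, not a figure taken from the paper being appraised, as in this little sketch:

    # Minimal sketch: the often-quoted heuristic of roughly 10 outcome events
    # per variable (EPV) included in a regression model. The threshold is a
    # rule of thumb, not a figure taken from any particular study.
    n_events = 45       # invented number of events
    n_predictors = 9    # invented number of candidate variables

    epv = n_events / n_predictors
    print(f"Events per variable: {epv:.1f}")
    if epv < 10:
        print("Fewer than ~10 events per variable: beware of overfitting.")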

Were the results validated in other groups of patients? When we set up groups of variables and make multiple comparisons, we run the risk that chance plays a trick on us and shows us associations that do not exist. This is why, when a risk factor is described in one group (training or derivation group), the results should be replicated in an independent group (validation group) before we can be really sure about the effect.

Now we must consider what the results are to determine their RELEVANCE. For this, we’ll check if the probability of the outcome of the study is estimated and provided by the authors, as well as the accuracy of this estimate and the risk associated with the factors influencing the prognosis.

Is the probability of the event specified over a given period of time? There are several ways to present the number of events occurring during the follow-up period. The simplest would be to provide an incidence rate (events per person per unit of time) or the cumulative frequency at a given time point. Another indicator is the median survival, which is simply the moment of follow-up at which the event has happened in half of the cohort participants (remember that although we speak about survival, the event need not necessarily be death).

We can use survival curves of various kinds to determine the probability of occurrence of the event in each period and the rate at which it presents. Actuarial or life tables are used for larger samples, when we do not know the exact time of the event and we work with fixed time periods. However, the most often used are the Kaplan-Meier curves, which better measure the probability of the event at each particular time with smaller samples. This method can provide hazard ratios and the median survival, as well as other parameters according to the regression model used.
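
For the curious, this is what the Kaplan-Meier calculation looks like stripped to its bones; the follow-up times are invented and, in real life, you would let a statistical package do the work:

    # Minimal sketch: a bare-bones Kaplan-Meier estimate from invented data.
    # Each tuple is (follow-up time in months, event: 1 = event, 0 = censored).
    data = [(2, 1), (3, 0), (5, 1), (5, 1), (7, 0), (9, 1), (12, 0), (12, 0)]

    def kaplan_meier(observations):
        """Return the survival curve as a list of (time, survival) steps."""
        survival = 1.0
        at_risk = len(observations)
        curve = []
        for time in sorted({t for t, _ in observations}):
            events = sum(1 for t, e in observations if t == time and e == 1)
            if events:
                survival *= 1 - events / at_risk
                curve.append((time, survival))
            at_risk -= sum(1 for t, _ in observations if t == time)
        return curve

    for time, prob in kaplan_meier(data):
        print(f"S({time}) = {prob:.2f}")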

To assess the precision of the results we will look, as always, at the confidence intervals. The wider the interval, the less precise the estimate of the probability of occurrence in the general population, which is what we really want to know. Keep in mind that the number of patients at risk generally decreases as time passes, so it is usual for the survival curves to be more precise at the beginning than at the end of follow-up. Finally, we will assess the factors that modify the prognosis. The right thing to do is to present all the variables that may influence the prognosis with their corresponding relative risks, which will allow us to evaluate the clinical relevance of the association.

Finally, we must consider the APPLICABILITY of the results. Do they apply to my patients? We will look for similarities between the study patients and ours and assess whether the differences we find allow us to extrapolate the results to our practice. But besides, are the results useful? The fact that they’re applicable doesn’t necessarily mean that we have to implement them. We have to assess carefully if they’re going to help us to decide what treatment to apply and how to inform our patients and their families.

As always, I recommend that you use a template, such as those provided by CASP, to make the critical appraisal systematically without leaving any important issue unassessed.

You can see that articles about prognosis have a lot to say. And we have hardly talked about regression models and survival curves, which are often the statistical core of this type of article. But that's another story…

You have to know what you are looking for

Every day we find articles that present new diagnostic tests that seem to have been designed to solve all our problems. But we should not be tempted to give credit to everything we read before thinking carefully about what we have, in fact, read. At the end of the day, if we paid attention to everything we read we would be bloated from drinking Coca-Cola.

We know that a diagnostic test is not going to say whether or not a person is sick. Its result will only allow us to increase or decrease the probability that the individual is sick or not so we can confirm or rule out the diagnosis, but always with some degree of uncertainty.

Everyone has a certain risk of suffering from any disease, which is nothing more than the prevalence of the disease in the general population. Below a certain level of probability, it seems so unlikely that the patient is sick that we leave them alone and do not order any diagnostic tests (although some find it hard to restrain the urge to always ask for something). This is the diagnostic or test threshold.

But if, in addition to belonging to the population, one has the misfortune of having symptoms, that probability will increase until it exceeds this threshold, at which point the probability of having the disease justifies performing diagnostic tests. Once we have the result of the test we have chosen, the probability (post-test probability) will have changed. It may have decreased and fallen below the test threshold, in which case we discard the diagnosis and leave the patient alone again. It may also exceed another threshold, the therapeutic one, above which the probability of the disease is high enough that we do not need further tests and can start treatment.

The usefulness of a diagnostic test lies in its ability to reduce the probability below the test threshold (and so rule out the diagnosis) or, on the contrary, to raise it to the threshold at which it is justified to start treatment. Of course, sometimes the test leaves us halfway and we have to do additional tests before confirming the diagnosis with enough certainty to start treatment.
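
The arithmetic behind this movement between thresholds is simply Bayes' theorem in its odds form, using the likelihood ratios that we will meet a little further down. A minimal sketch with invented numbers:

    # Minimal sketch (invented numbers): post-test probability from the
    # pre-test probability and a likelihood ratio, via the odds form of Bayes.
    def post_test_probability(pre_test_prob, likelihood_ratio):
        pre_odds = pre_test_prob / (1 - pre_test_prob)
        post_odds = pre_odds * likelihood_ratio
        return post_odds / (1 + post_odds)

    pre_test = 0.30      # hypothetical probability after history and examination
    lr_positive = 8.0    # hypothetical LR+ of the test
    lr_negative = 0.1    # hypothetical LR- of the test

    print(f"Positive result: {post_test_probability(pre_test, lr_positive):.2f}")
    print(f"Negative result: {post_test_probability(pre_test, lr_negative):.2f}")
    # A positive result may push us above the therapeutic threshold,
    # a negative one may drop us below the test threshold.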

Diagnostic test studies should provide information about the ability of a test to produce the same results when performed under similar conditions (reliability) and about the accuracy with which its measurements reflect what they intend to measure (validity). But they also give us data about their discriminatory power (sensitivity and specificity), their clinical performance (positive and negative predictive values), their ability to modify the probability of illness and move us between the two thresholds (likelihood ratios), and about other aspects that allow us to assess whether it is worth testing our patients with it. And to check whether a study gives us the right information we need to make a critical appraisal and read the paper based on our three pillars: validity, relevance and applicability.

Let’s start with VALIDITY. First, we will ask ourselves some basic elimination questions about the study's primary criteria. If the answer to these questions is no, the best thing you can probably do is use the article to wrap your mid-morning snack.

Was the diagnostic test compared blindly and independently with an appropriate gold standard or reference test? We must check that the results of the reference test were not interpreted differently depending on the results of the study test, which would amount to an incorporation bias that could invalidate the results. Another problem that can arise is that the reference test results are frequently inconclusive. If we made the mistake of excluding those doubtful cases we would commit an indeterminate exclusion bias that, in addition to overestimating the sensitivity and specificity of the test, would compromise the external validity of the study, whose conclusions would then only be applicable to patients with conclusive results.

Do the patients encompass a spectrum similar to the one we will find in our practice? The inclusion criteria of the study should be clear, and the study must include healthy and diseased subjects with varying severity or stages of progression of the disease. As we know, prevalence influences the clinical performance of the test, so if it is validated, for example, in a tertiary center (where the probability of being sick is greater), its diagnostic performance will be overestimated when we use the test in a Primary Care center or with the general population (where the proportion of diseased people will be lower).

At this point, if we think it is worth reading further, we will focus on the secondary criteria, which are those that add value to the study design. Another question to ask is: did the results of the study test influence the decision to perform the reference test? We have to check that there has not been a sequence bias or a diagnostic verification bias, whereby patients with a negative study test are excluded from verification. Although this is common in current practice (we start with simple tests and perform the more invasive ones only in positive patients), doing so in a diagnostic test study affects the validity of the results. Both tests should be done independently and blindly, so that the subjectivity of the observer does not influence the results (review bias). Finally, is the method described in enough detail to allow its reproduction? It should be clear what is considered normal and abnormal, what criteria were used to define normality and how the results of the test were interpreted.

Having analyzed the internal validity of the study, we will appraise the RELEVANCE of the data presented. The purpose of a diagnostic study is to determine the ability of a test to correctly classify individuals according to the presence or absence of disease. Actually, and to be more precise, we want to know how the likelihood of being ill changes after knowing the test's result (post-test probability). It is therefore essential that the study give information about the direction and magnitude of this change (pre-test to post-test), which we know depends on the characteristics of the test and, to a large extent, on the prevalence or pre-test probability.

Does the paper present likelihood ratios, or is it possible to calculate them from the data? This information is critical because, if not, we could not estimate the clinical impact of the study test. We have to be especially careful with tests with quantitative results for which the researcher has established a cut-off point of normality. When using ROC curves, it is usual to move the cut-off to favor the sensitivity or the specificity of the test, but we must always appraise how this choice affects the external validity of the study, since it may limit its applicability to a particular group of patients.

How reliable are the results? We will have to determine whether the results are reproducible and how they can be affected by variation among different observers or on repeated testing. But we must assess not only the reliability, but also the precision of the results. The study was done on a sample of patients, but it should provide an estimate of the values in the population, so results should be expressed with their corresponding confidence intervals.

The third pillar of critical appraisal is APPLICABILITY or external validity, which will help us to determine whether the results are useful for our patients. In this regard, we ask three questions. Is the test available and is it possible to perform it in our patients? If the test is not available, all we will have achieved with the study is to increase our vast knowledge. But if we can apply the test, we must ask whether our patients fulfill the inclusion and exclusion criteria of the study and, if not, consider how these differences may affect the applicability of the test.

The second question is whether we know the pre-test probability of our patients. If our prevalence is very different from that of the study, the actual usefulness of the test may change. One solution may be to do a sensitivity analysis, evaluating how the study results would change with different, clinically reasonable values of pre-test and post-test probability.

Finally, we should ask ourselves the most important question: can the post-test probability change our therapeutic attitude and thereby help the patient? For example, if the pre-test probability is very low, the post-test probability will probably also be very low and will not reach the therapeutic threshold, so it would not be worth spending money and effort on the test. Conversely, if the pre-test probability is very high, it may be worth starting treatment without any further tests, unless the treatment is very expensive or dangerous. As always, virtue lies in the middle, and it is in these intermediate zones where the studied diagnostic test can provide the greatest benefit. In any case, we must never forget who our boss is (I mean the patient, not our boss at the office): we must not be content with studying only effectiveness or cost-effectiveness; we must also consider the risks, the discomfort and the preferences of the patient, and the consequences that performing the diagnostic test can entail.

If you allow me a piece of advice, when critically appraising an article about diagnostic tests I recommend using the CASP templates, which can be downloaded from their website. They will help you make the critical appraisal in a systematic and easy way.

One clarification before running off: we must not confuse studies of diagnostic tests with diagnostic prediction rules. Although the assessment is similar, prediction rules have specific characteristics and methodological requirements that must be assessed in an appropriate way and that we will see in another post.

Finally, everything we have said so far applies to papers specifically about diagnostic tests. However, the assessment of a diagnostic test may also be part of observational studies such as cohort or case-control studies, which can have some peculiarities in the sequence of application and in the validation criteria of the study and reference tests. But that's another story…

The King under review

We all know that the randomized clinical trial is the king of interventional methodological designs. It is the type of epidemiological study that allows a better control of systematic errors or biases, since the researcher controls the variables of the study and the participants are randomly assigned among the interventions that are compared.

In this way, if two homogeneous groups that differ only in the intervention present some difference of interest during the follow-up, we can affirm with some confidence that this difference is due to the intervention, the only thing that the two groups do not have in common. For this reason, the clinical trial is the preferred design to answer clinical questions about intervention or treatment, although we will always have to be prudent with the evidence generated by a single clinical trial, no matter how well performed. When we perform a systematic review of randomized clinical trials on the same intervention and combine them in a meta-analysis, the answers we get will be more reliable than those obtained from a single study. That’s why some people say that the ideal design for answering treatment questions is not the clinical trial, but the meta-analysis of clinical trials.

In any case, as systematic reviews assess their primary studies individually and as it is more usual to find individual trials and not systematic reviews, it is advisable to know how to make a good critical appraisal in order to draw conclusions. In effect, we cannot relax when we see that an article corresponds to a clinical trial and take its content for granted. A clinical trial can also contain its traps and tricks, so, as with any other type of design, it will be a good practice to make a critical reading of it, based on our usual three pillars: validity, importance and applicability.

As always, when studying scientific rigor or VALIDITY (internal validity), we will first look at a series of essential primary criteria. If these are not met, it is better not to waste time with the trial and try to find another more profitable one.

Is there a clearly defined clinical question? From the outset, the trial must be designed to answer a structured clinical question about treatment, motivated by one of our multiple knowledge gaps. A working hypothesis should be proposed with its corresponding null and alternative hypotheses, if possible on a topic that is relevant from the clinical point of view. It is preferable that the study tries to answer only one question: when there are several questions, the trial may become excessively complicated and end up answering none of them completely and properly.

Was the assignment done randomly? As we have already said, to be able to affirm that the differences between the groups are due to the intervention, the groups must be homogeneous. This is achieved by assigning patients randomly, the only way to control the known confounding variables and, more importantly, also those we do not know. If the groups were different and we attributed the difference only to the intervention, we could incur a confounding bias. The trial should contain the usual and essential table 1 with the frequency of the demographic and confounding variables of both samples, so that we can judge whether the groups are homogeneous. A frequent error is to look for differences between the two groups and evaluate them according to their p-values, when we know that p does not measure homogeneity. If we have distributed the participants at random, any difference we observe will necessarily be random (we do not need a p to know that). The sample size is not designed to discriminate between demographic variables, so a non-significant p may simply indicate that the sample is too small to reach statistical significance. On the other hand, any minimal difference can reach statistical significance if the sample is large enough. So forget about the p: if there is any difference, what we have to do is assess whether it is clinically relevant enough to have influenced the results or, more elegantly, adjust for the covariates that were not balanced by randomization. Fortunately, it is increasingly rare to find tables of the study groups with p-value comparisons between the intervention and control groups.

But it is not enough for the study to be randomized; we must also consider whether the randomization sequence was generated correctly. The method used must ensure that all members of the selected population have the same probability of being chosen, so random number tables or computer-generated sequences are preferred. The randomization must also be concealed, so that it is not possible to know which group the next participant will be assigned to. That is why people like centralized systems by telephone or through the Internet. And here is something very curious: it is well known that randomization tends to produce groups of different sizes, especially if the samples are small, which is why samples randomized in blocks balanced by size are sometimes used. And I ask you, how many studies have you read with exactly the same number of participants in the two arms that claimed to be randomized? Do not trust equal groups, especially if they are small, and do not be fooled: you can always use one of the many binomial probability calculators available on the Internet to find out the probability that chance alone would generate the groups the authors present (we are always speaking of simple randomization, not of randomization by blocks, clusters, minimization or other techniques). You will be surprised by what you find.
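
If you do not want to depend on an online calculator, the calculation fits in a few lines; here is a sketch with arbitrary sample sizes:

    # Minimal sketch: with simple 1:1 randomization each participant falls
    # into either arm with probability 1/2, so the size of one arm follows a
    # binomial distribution. Probability of ending up with exactly equal arms:
    from math import comb

    def prob_equal_groups(n_total):
        return comb(n_total, n_total // 2) * 0.5 ** n_total

    for n in (20, 60, 200):
        print(f"n = {n}: P(equal groups) = {prob_equal_groups(n):.3f}")
    # With 20 participants it is only about 0.18, so perfectly balanced small
    # trials that claim simple randomization deserve a raised eyebrow.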

It is also important that the follow-up has been long and complete enough, so that the study lasts long enough to be able to observe the outcome variable and that every participant who enters the study is taken into account at the end. As a general rule, if the losses exceed 20%, it is admitted that the internal validity of the study may be compromised.

We will always have to analyze the nature of the losses during follow-up, especially if they are high. We must try to determine whether the losses are random or whether they are related to any specific variable (which would be a bad sign) and estimate what effect they may have on the results of the trial. The usual approach is to adopt the so-called worst-case scenario: it is assumed that all the losses in the control group have done well and all those in the intervention group have done badly, and the analysis is repeated to check whether the conclusions change, in which case the validity of the study would be seriously compromised. The last important aspect is to consider whether patients who did not receive the assigned treatment (there is always someone who does not comply and messes things up) have been analyzed according to the intention-to-treat principle, since it is the only way to preserve all the benefits obtained with randomization. Everything that happens after randomization (such as a change of assignment group) can influence the probability that the subject experiences the effect we are studying, so it is important to respect this intention-to-treat analysis and analyze each participant in the group to which they were initially assigned.

Once these primary criteria have been verified, we will look at three secondary criteria that influence internal validity. It will be necessary to verify that the groups were similar at the beginning of the study (we have already talked about the table with the data of the two groups), that masking was carried out in an appropriate way as a form of bias control, and that the two groups were managed and followed in a similar way except, of course, for the intervention under study. We know that masking or blinding allows us to minimize the risk of information bias, which is why the researchers and participants usually do not know which group each participant is assigned to, which is known as double blinding. Sometimes, given the nature of the intervention (think of a group that is operated on and another that is not), it will be impossible to mask researchers and participants, but we can always give the masked data to the person who analyzes the results (the so-called blinded evaluator), which mitigates this inconvenience.

To summarize this section on the validity of the trial, we can say that we will have to check that there is a clear definition of the study population, the intervention and the outcome of interest, that the randomization was done properly, that information biases have been controlled through masking, that there has been an adequate follow-up with control of losses, and that the analysis has been correct (intention-to-treat analysis and adjustment for covariates not balanced by randomization).

A very simple tool that can also help us assess the internal validity of a clinical trial is the Jadad scale, also called the Oxford quality scoring system. Jadad, a Colombian doctor, devised a scoring system with seven questions. First, five questions whose affirmative answer adds one point:

  1. Is the study described as randomized?
  2. Is the method used to generate the randomization sequence described and is it adequate?
  3. Is the study described as double blind?
  4. Is the masking method described and is it adequate?
  5. Is there a description of the losses during follow up?

Finally, two questions whose negative answer subtracts 1 point:

  1. Is the method used to generate the randomization sequence adequate?
  2. Is the masking method appropriate?

As you can see, the Jadad scale assesses the key points we have already mentioned: randomization, masking and follow-up. A trial is considered methodologically rigorous if it scores 5 points. If the study scores 3 points or less, we had better use it to wrap our sandwich.
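
Just to show how mechanical the scoring is, here is a small sketch of that logic with an invented example trial:

    # Minimal sketch of the Jadad scoring logic described above.
    def jadad_score(randomized, sequence_described, double_blind,
                    masking_described, losses_described,
                    sequence_adequate, masking_adequate):
        score = sum([randomized, sequence_described, double_blind,
                     masking_described, losses_described])
        score -= (not sequence_adequate) + (not masking_adequate)
        return score

    # Invented trial: randomized and double blind with adequate methods,
    # but without a description of losses to follow-up.
    print(jadad_score(True, True, True, True, False, True, True))  # prints 4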

We will now move on to the results of the study to gauge their clinical RELEVANCE. It will be necessary to look at the variables measured and see whether the trial adequately expresses the magnitude and precision of the results. It is important, once again, not to be dazzled by a flood of p-values full of zeros. Remember that the p-value only indicates the probability of accepting as real differences that exist only by chance (or, to put it simply, of making a type 1 error), and that statistical significance does not have to be synonymous with clinical relevance.

In the case of continuous variables such as survival time, weight, blood pressure, etc., it is usual to express the magnitude of the results as a difference in means or medians, depending on which measure of central tendency is most appropriate. However, in the case of dichotomous variables (alive or dead, healthy or sick, etc.) the relative risk, its relative and absolute reductions and the number needed to treat (NNT) will be used. Of all of them, the one that best expresses clinical efficiency is always the NNT. Any trial worthy of our attention must provide this information or, failing that, the information necessary for us to calculate it.

But to obtain a more realistic estimate of the results in the population, we need to know the precision of the study, and nothing is easier than resorting to confidence intervals. These intervals, in addition to precision, also inform us of statistical significance: the result is statistically significant if the interval of a risk ratio does not include the value one, or if that of a difference in means does not include the value zero. If the authors do not provide them, we can use a calculator to obtain them, such as those available on the CASP website.
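
For a dichotomous outcome, all these measures come out of the trial's 2x2 table. A minimal sketch with invented data (a real appraisal would, of course, use the trial's own numbers):

    # Minimal sketch (invented 2x2 data): magnitude and precision of the
    # effect in a trial with a dichotomous outcome.
    from math import exp, log, sqrt

    events_treat, n_treat = 30, 200    # hypothetical intervention arm
    events_ctrl, n_ctrl = 50, 200      # hypothetical control arm

    risk_treat = events_treat / n_treat
    risk_ctrl = events_ctrl / n_ctrl

    rr = risk_treat / risk_ctrl        # relative risk
    arr = risk_ctrl - risk_treat       # absolute risk reduction
    rrr = arr / risk_ctrl              # relative risk reduction
    nnt = 1 / arr                      # number needed to treat

    # Approximate 95% confidence interval for the RR (log method).
    se_log_rr = sqrt(1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl)
    ci_low = exp(log(rr) - 1.96 * se_log_rr)
    ci_high = exp(log(rr) + 1.96 * se_log_rr)

    print(f"RR {rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), "
          f"ARR {arr:.2f}, RRR {rrr:.2f}, NNT {nnt:.0f}")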

A good way to sort the study of the clinical importance of a trial is to structure it in these four aspects: Quantitative assessment (measures of effect and its precision), Qualitative assessment (relevance from the clinical point of view), Comparative assessment (see if the results are consistent with those of other previous studies) and Cost-benefit assessment (this point would link to the next section of the critical appraisal that has to do with the applicability of the results of the trial).

To finish the critical reading of a treatment article we will value its APPLICABILITY (also called external validity), for which we will have to ask ourselves if the results can be generalized to our patients or, in other words, if there is any difference between our patients and those of the study that prevents the generalization of the results. It must be taken into account in this regard that the stricter the inclusion criteria of a study, the more difficult it will be to generalize its results, thereby compromising its external validity.

But, in addition, we must consider whether all clinically important outcomes have been taken into account, including side effects and undesirable effects. The outcome variable measured must be important to the investigator and to the patient. Do not forget that demonstrating that an intervention is effective does not necessarily mean that it is beneficial for our patients. We must also assess the harmful or annoying effects and weigh the benefit-cost-risk balance, as well as the difficulties that may exist in applying the treatment in our environment, the patient's preferences, etc.

As it is easy to understand, a study can have great methodological validity and results of great importance from the clinical point of view and still not be applicable to our patients, whether because our patients are different from those of the study, because the treatment does not fit their preferences or because it is unfeasible in our environment. The opposite, however, does not usually happen: if the validity is poor or the results are unimportant, we will hardly consider applying the conclusions of the study to our patients.

To finish, I recommend that you use one of the tools available for critical appraisal, such as the CASP templates, or a checklist, such as CONSORT, so as not to leave any of these points unconsidered. And yes, all we have talked about are randomized controlled clinical trials; what happens with non-randomized trials or other kinds of quasi-experimental studies? Well, for those we follow another set of rules, such as those of the TREND statement. But that is another story…

The whole is greater than the sum of its parts

This is another of those famous quotes that are all over the place. Apparently, the first person to have this clever idea was Aristotle, who used it to summarize his general principle of holism in his writings on metaphysics. Who would have said that this tiny phrase contains so much wisdom? Holism insists that everything must be considered as a whole, because its components may act in a synergistic way, allowing the meaning of the whole to be greater than the sum of the meanings that each individual part contributes.

Don’t be afraid, you are still on the blog about brains and not on a blog about philosophy. Nor have I changed the topic of the blog, but this saying is just what I needed to introduce you to the wildest beast of the scientific method, which is called meta-analysis.

We live in the information age. Since the end of the 20th century, we have witnessed a true explosion of the available sources of information, accessible from multiple platforms. The end result is that we are overwhelmed every time we need information about a specific point, so we do not know where to look or how we can find what we want. For this reason, systems began to be developed to synthesize the information available to make it more accessible when needed.

That is how the first reviews arose: the so-called narrative or author reviews. To write them, one or more authors, usually experts in a specific subject, made a general review of the topic, although without any strict criteria for the search strategy or the selection of information. Then, with total freedom, the authors analyzed the results as they saw fit and ended up drawing their conclusions from a qualitative synthesis of the results obtained.

These narrative reviews are very useful for acquiring an overview of the topic, especially when one knows little about the subject, but they are not very useful for those who already know the topic and need answers to a more specific question. In addition, as the whole procedure is done according to the authors' wishes, the conclusions are not reproducible.

For these reasons, a series of privileged minds invented the other type of review in which we will focus on this post: the systematic review. Instead of reviewing a general topic, systematic reviews do focus on a specific topic in order to solve specific doubts of clinical practice. In addition, they use a clearly specified search strategy and inclusion criteria for an explicit and rigorous work, which makes them highly reproducible if another group of authors comes up with a repeat review of the same topic. And, if that were not enough, whenever possible, they go beyond the analysis of qualitative synthesis, completing it with a quantitative synthesis that receives the funny name of meta-analysis.

The realization of a systematic review consists of six steps: formulation of the problem or question to be answered, search and selection of existing studies, evaluation of the quality of these studies, extraction of the data, analysis of the results and, finally, interpretation and conclusion. We are going to detail this whole process a little.

Any systematic review worth its salt should try to answer a specific question that must be relevant from the clinical point of view. The question will usually be asked in a structured way with the usual components of population, intervention, comparison and outcome (PICO), so that the analysis of these components will allow us to know if the review is of our interest.

In addition, the components of the structured clinical question will help us search for the relevant studies that exist on the subject. This search must be global and unbiased, so we avoid possible source biases such as excluding sources by language, journal, etc. The usual practice is to use a minimum of two major general-purpose electronic databases, such as Pubmed, Embase or the Cochrane Library, together with the specific databases of the subject being reviewed. It is important that this search is complemented by a manual search in non-electronic registers and by checking the bibliographic references of the papers found, in addition to other sources of the so-called grey literature, such as doctoral theses and conference proceedings, as well as documents from funding agencies and registries, and even contacting other researchers to find out whether there are studies not yet published.

It is very important that this strategy is clearly specified in the methods section of the review, so that anyone can reproduce it later, if desired. In addition, it will be necessary to clearly specify the inclusion and exclusion criteria of the primary studies of the review, the type of design sought and its main components (again in reference to the PICO, the components of the structured clinical question).

The third step is the evaluation of the quality of the studies found, which must be done by a minimum of two people independently, with the help of a third party (who will surely be the boss) to break the tie in cases where the extractors do not reach a consensus. For this task, tools or checklists designed for the purpose are usually used; one of the most frequently used tools for bias control is the Cochrane Collaboration's tool. This tool assesses five criteria of the primary studies to determine their risk of bias: adequate randomization sequence (prevents selection bias), adequate masking (prevents performance and detection biases, both of them information biases), concealment of allocation (prevents selection bias), losses to follow-up (prevents attrition bias) and selective outcome reporting (prevents reporting bias). The studies are classified as being at high, low or unclear risk of bias. It is common to use traffic-light colors, marking in green the studies with a low risk of bias, in red those with a high risk of bias and in yellow those that remain in no man's land. The more green we see, the better the quality of the primary studies of the review.

Ad-hoc forms are usually designed for extraction of data, which usually collect data such as date, scope of the study, type of design, etc., as well as the components of the structured clinical question. As in the case of the previous step, it is convenient that this be done by more than one person, establishing the method to reach an agreement in cases where there is no consensus among the reviewers.

And here we enter the most interesting part of the review, the analysis of the results. The fundamental role of the authors will be to explain the differences that exist between the primary studies that are not due to chance, paying special attention to the variations in the design, study population, exposure or intervention and measured results. You can always make a qualitative synthesis analysis, although the real magic of the systematic review is that, when the characteristics of primary studies allow it, a quantitative synthesis, called meta-analysis, can also be performed.

A meta-analysis is a statistical analysis that combines the results of several independent studies that try to answer the same question. Although meta-analysis can be considered as a research project in its own right, it is usually part of a systematic review.

Primary studies can be combined using a statistical methodology developed for this purpose, which has a number of advantages. First, by combining all the results of the primary studies we can obtain a more complete overall picture (you know, the whole is greater…). Second, by combining studies we increase the sample size, which increases the power of the analysis compared with that of the individual studies, improving the estimate of the effect we want to measure. Third, by drawing conclusions from a greater number of studies, external validity increases, since having involved different populations it is easier to generalize the results. Finally, it can allow us to resolve controversies between the conclusions of the different primary studies of the review and even to answer questions that had not been raised in those studies.

Once the meta-analysis is done, a final synthesis must be made that integrates the results of the qualitative and quantitative synthesis in order to answer the question that motivated the systematic review or, when this is not possible, to propose the additional studies that must be carried out to be able to answer it.

But a meta-analysis will only deserve all our respect if it fulfils a series of requirements. Like the systematic review to which it belongs, it should aim to answer a specific question and it must be based on all the relevant available information, avoiding publication bias and retrieval bias. Also, the primary studies must have been assessed to ensure their quality and their homogeneity before combining them. Of course, the data must be analyzed and presented in an appropriate way. And, finally, it must make sense to combine the results: the fact that we can combine them does not always mean that we have to, if it is not going to be useful in our clinical setting.

And how do you combine the studies, you may ask yourselves? Well, that is the crux of the meta-analysis (cruxes, really, because there are several), since there are several possible ways to do it.

Anyone could think that the easiest way would be a sort of Eurovision Contest vote. We count the primary studies with a statistically significant positive effect and, if they are the majority, we conclude that there is consensus in favour of a positive result. This approach is quite simple but, you will not deny it, also quite sloppy, and I can think of a number of disadvantages to its use. On the one hand, it implies that lack of significance and lack of effect are synonymous, which does not always have to be true. On the other hand, it takes into account neither the direction and strength of the effect in each study, nor the precision of the estimators, nor the quality or design characteristics of the primary studies. So this type of approach is not very advisable, although nobody is going to fine us if we use it as an informal first look before deciding which is the best way to combine the results.

Another possibility is to use a sort of sign test, similar to other non-parametric statistical techniques. We count the number of positive effects, subtract the negatives and there we have our conclusion. The truth is that this method also seems too simple. It ignores the studies without statistical significance and it also ignores the precision of the studies' estimators. So this approach is not of much use either, unless you only know the direction of the effects measured in the studies. We could also use it when the primary studies are very heterogeneous, to get an approximation of the overall result, although I would not trust results obtained in this way very much.

The third method is to combine the different p-values of the studies (our beloved and sacrosanct p's). This could come to mind if we had a systematic review whose primary studies use different outcome measures, although all of them try to answer the same question. For example, think of a review on osteoporosis in which some studies use ultrasonic densitometry, others spine or femur DEXA, and so on. The problem with this method is that it does not take into account the intensity of the effects, only their direction and statistical significance, and we all know the shortcomings of our holy p-values. To use this approach we need software that combines values that follow a chi-square or Gaussian distribution, giving us an estimate and its confidence interval.
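
One of these chi-square approaches is Fisher's classic method for combining p-values. A minimal sketch with invented p-values, using scipy for the chi-square distribution:

    # Minimal sketch: Fisher's method for combining p-values (one of the
    # chi-square approaches). The p-values are invented; scipy provides the
    # chi-square distribution.
    from math import log
    from scipy.stats import chi2

    p_values = [0.04, 0.20, 0.01, 0.09]   # hypothetical primary studies

    statistic = -2 * sum(log(p) for p in p_values)
    combined_p = chi2.sf(statistic, df=2 * len(p_values))

    print(f"Chi-square = {statistic:.2f}, combined p = {combined_p:.4f}")
    # Note that this combines only directions and significances;
    # it says nothing about the size of the effect.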

The fourth and final method that I know of is also the most elegant: to make a weighted combination of the effect estimated in all the primary studies. Calculating the mean would be the easiest way, but we have not come this far to do a botched job. The arithmetic mean gives the same weight to every study, so if there is an outlying or imprecise study the result will be greatly distorted. Do not forget that the mean always follows the tails of the distribution and is heavily influenced by extreme values (which does not happen to its relative, the median).

This is why we have to weight the different estimates, which can be done in two ways: taking into account the number of subjects in each study, or weighting by the inverse of the variance of each one (you know, the square of its standard error). The latter is the more complex, but it is the one most often preferred. Of course, as the maths involved is heavy going, people usually use specific software, which can be external modules working within the usual statistical programs such as Stata, SPSS, SAS or R, or dedicated software such as the famous Cochrane Collaboration's RevMan.
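
To make the idea concrete, here is a minimal sketch of fixed-effect pooling by inverse variance with invented relative risks and standard errors (real meta-analysis software does this, and much more, for us):

    # Minimal sketch (invented numbers): fixed-effect pooling of log relative
    # risks weighted by the inverse of their variances.
    from math import exp, log, sqrt

    # Hypothetical primary studies: (relative risk, standard error of log RR).
    studies = [(0.80, 0.20), (0.65, 0.15), (0.90, 0.30), (0.70, 0.10)]

    weights = [1 / se ** 2 for _, se in studies]          # inverse variances
    pooled_log_rr = sum(w * log(rr) for (rr, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))

    pooled_rr = exp(pooled_log_rr)
    ci_low = exp(pooled_log_rr - 1.96 * pooled_se)
    ci_high = exp(pooled_log_rr + 1.96 * pooled_se)
    print(f"Pooled RR {pooled_rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")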

As you can see, I have not held back from calling the systematic review with meta-analysis the wildest beast among epidemiological designs. However, it has its detractors. We all know someone who claims not to like systematic reviews because almost all of them end up the same way: “more quality studies are needed to be able to make recommendations with a reasonable degree of evidence”. Of course, in these cases we cannot blame the review: we do not take enough care when performing our studies, so the vast majority deserve to end up in the paper shredder.

Another controversy is the debate about which is better, a good systematic review or a good clinical trial (reviews can also be made of other types of designs, including observational studies). This debate reminds me of the argument about whether it is a sin to make a calimocho by mixing a good wine with Coca-Cola. Controversies aside, if you have to drink a calimocho, I assure you that you will enjoy it more if you use a good wine, and something similar happens to reviews with the quality of their primary studies.

The problem with systematic reviews is that, to be really useful, you have to be very rigorous in carrying them out. So that we do not forget anything, there are checklists of recommendations that allow us to order the entire procedure of creating and disseminating scientific work without making methodological errors or omissions along the way.

It all started with a program of the Health Service of the United Kingdom that ended with the founding of an international initiative to promote the transparency and precision of biomedical research works: the EQUATOR network (Enhancing the QUAlity and Transparency of health Research). This network consists of experts in methodology, communication and publication, so it includes professionals involved in the quality of the entire process of production and dissemination of research results. Among many other objectives, which you can consult on its website, one is to design a set of recommendations for the realization and publication of the different types of studies, which gives rise to different checklists or statements.

The checklist designed to apply to systematic reviews is the PRISMA statement (Preferred Reporting Items for Systematic reviews and Meta-Analyses), which comes to replace the QUOROM statement (QUality Of Reporting Of Meta-analyses). Based on the definition of systematic review of the Cochrane Collaboration, PRISMA helps us to select, identify and assess the studies included in a review. It also consists of a checklist and a flowchart that describes the passage of all the studies considered during the realization of the review. There is also a lesser-known statement for the assessment of meta-analyses of observational studies, the MOOSE statement (Meta-analyses of Observational Studies in Epidemiology).

The Cochrane Collaboration also has a very well structured and defined methodology, which you can consult on its website. This is why its reviews have so much prestige within the world of systematic reviews: they are made by professionals dedicated to the task, following a rigorous and proven methodology. Anyway, even Cochrane reviews should be read critically and not taken for granted.

And with this we have reached the end for today. I want to insist that a meta-analysis should be done whenever it is possible and sensible, but first making sure that it is correct to combine the results. If the studies are very heterogeneous we should not combine anything, since the results we would obtain would have very compromised validity. There is a whole series of methods and statistics to measure the homogeneity or heterogeneity of the primary studies, which also influence the way in which we analyze the combined data. But that is another story…

The hereafter

We have already seen in previous posts how to search for information in Pubmed in different ways, from the simplest, which is the simple search, to the advanced search methods and filtering  of results. Pubmed is, in my modest opinion, a very useful tool for professionals who have to look for biomedical information among the maelstrom of papers that are published daily.

However, Pubmed should not be our only search engine. Yes, ladies and gentlemen, not only does it turn out that there is life beyond Pubmed, but there is a lot of it and interesting.

The first engine I can think of, because of its similarity to Pubmed, is Embase. This is Elsevier's search engine, with about 32 million records from about 8,500 journals of 95 countries. As with Pubmed, there are several search options that make it a versatile tool, somewhat more oriented toward European studies and drug research than Pubmed (or so they say). The usual practice when you want to do a thorough search is to use two databases, the combination of Pubmed and Embase being a frequent one, since each search engine will provide records that the other has not indexed. The big drawback of Embase, especially when compared with Pubmed, is that access is not free. Anyway, those who work in large health centres may be lucky enough to have a subscription paid for through the centre's library.

Another useful tool is provided by the Cochrane Library, which includes multiple resources such as the Cochrane Database of Systematic Reviews (CDSR), the Cochrane Central Register of Controlled Trials (CENTRAL), the Cochrane Methodology Register (CMR), the Database of Abstracts of Reviews of Effects (DARE), the Health Technology Assessment Database (HTA) and the NHS Economic Evaluation Database (EED). In addition, Spanish speakers can turn to the Cochrane Library Plus, which translates the works of the Cochrane Library into Spanish. Cochrane Plus is not free, but in Spain we enjoy a subscription kindly paid for by the Ministry of Health, Equality and Social Services.

And since we are speaking of resources in Spanish, let me bring the ember closer to my own sardine, as we say in Spain, and tell you about two search engines that are very dear to me. The first is Epistemonikos, which is a source of systematic reviews and other types of scientific evidence. The second is Pediaclic, a search tool for child health information resources, which classifies its results into a series of categories such as systematic reviews, clinical practice guidelines, evidence-based summaries, and so on.

In fact, Epistemonikos and Pediaclic are meta-searchers. A meta-searcher is a tool that searches in a series of databases and not in a single indexed database like Pubmed or Embase.

There are many meta-search engines but, without a doubt, the king of all and one not to be missed is TRIP Database.

TRIP (Turning Research Into Practice) is a free-access meta-search engine created in 1997 to facilitate the search for information in evidence-based medicine databases, although it has evolved and nowadays also retrieves information from image banks, documents for patients, electronic textbooks and even Medline (Pubmed's database). Let's take a look at how it works.

In the first figure you can see the top of the TRIP home page. In its simplest form, we select the “Search” link (the one that works by default when we open the page), write the English terms we want to search for in the search box and click on the magnifying glass on the right, after which the search engine will show us the list of results.

Although the latest version of TRIP includes a language selector, it is probably best to enter the terms in English in the search window, trying not to put more than two or three words to get the best results. Here we can use the same logical operators we saw in Pubmed (AND, OR and NOT), as well as the truncation operator “*”. In fact, if you type several words in a row, TRIP automatically includes the AND operator between them.

Next to “Search” you can see a link that says “PICO”. This opens a search menu in which we can select the four components of the structured clinical question separately: patients (P), intervention (I), comparison (C) and outcomes (O).
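
To make the idea concrete, the asthma and obesity question that we will use as an example later could be broken down into the PICO form more or less like this (one possible way of filling in the fields, not the only one):

```
P (patients):               children
I (intervention/exposure):  obesity
C (comparison):             children of normal weight
O (outcomes):               asthma (presence or severity)
```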

To the right there are two more links. “Advanced” allows advanced searches by record fields such as journal name, title, year, etc. “Recent” gives us access to the search history. The problem is that, in the latest versions, these two links are reserved for licensed users. In previous versions of TRIP they were free, so I hope this little flaw does not spread to the rest of the search engine and that TRIP does not end up becoming a paid resource.

There are video tutorials available on the search engine’s website explaining how TRIP’s various modalities work, but the most attractive thing about TRIP is the way it orders the search results, doing so according to the source, the quality and the frequency with which the search terms appear in the articles found. To the right of the screen you can see the list of results organized into a series of categories, such as systematic reviews, evidence-based medicine synopses, clinical practice guidelines, clinical questions, Medline articles filtered through Clinical Queries, etc.

We can click on one of the categories to restrict the list of results. Once this is done, we can narrow the list down even further by subcategories; for example, if we select systematic reviews we can later restrict the list to only those from the Cochrane. The possibilities are many, so I invite you to try them.

Let’s look at an example. If I write “asthma obesity children” in the search string, I get 1117 results, with the list of resources sorted on the right, as you can see in the second figure. If I now click on the “systematic reviews” index and then on “Cochrane”, I am left with a single result, although I can recover the rest just by clicking on any of the other categories. Have you ever seen such a combination of simplicity and power? In my humble opinion, with a decent command of Pubmed and the help of TRIP you can find everything you need, no matter how hidden it is.

And to finish today’s post, allow me to ask you a favor: do not use Google to do medical searches or, at least, do not depend exclusively on Google, not even Google Scholar. This search engine is good for finding a restaurant or a hotel for the holidays, but not for the kind of controlled search for reliable and relevant medical information that we can do with the other tools we have discussed. Of course, with the changes and evolutions Google has accustomed us to, this may change over time and, maybe, in the future I will have to rewrite this post to recommend it (God forbid).

And here we will leave the topic of bibliographic searches. Needless to say, there are countless more search engines; you can use the one you like the most or the one you have available at your computer or workplace. In some cases, as already mentioned, it is almost mandatory to use more than one, as with systematic reviews, in which the two big ones (Pubmed and Embase) are often used and combined with Cochrane’s and some others specific to the subject matter. All the search engines we have seen are general, but there are specific ones for nursing, psychology, physiotherapy and so on, as well as for specific diseases. For example, if you do a systematic review on a tropical disease it is advisable to use a subject-specific database, such as LILACS, as well as local journal search engines, if there are any. But that is another story…

Gathering the gold nuggets


I was thinking about today’s post and I could not help remembering the gold prospectors of the Alaskan gold rush of the late nineteenth century. They traveled to the Yukon, looking for a good creek like the Bonanza, and collected tons of mud. But that mud was not the last step of the quest: from those sediments they had to extract the longed-for gold nuggets, so they carefully panned the sediment to keep only the gold, when there was any.

When we look for the best scientific evidence to solve our clinical questions we do something similar. Normally we choose one of the Internet search engines (such as Pubmed, our Bonanza Creek) and we usually get a long list of results (our pile of mud) that we will finally have to filter to extract the gold nuggets, if there are any among the search results.

We have already seen in previous posts how to do a simple search (the least specific one, which will provide us with more mud) and how to refine searches by using MeSH terms or the advanced search form, with which we try to get less mud and more nuggets.

However, the usual situation is that, once we have the list of results, we have to filter it to keep only what interests us most. Well, for that there is a very popular tool within Pubmed that is, oh surprise, the use of filters.

Let’s see an example. Suppose we want to look for information about the relationship between asthma and obesity in childhood. The ideal thing would be to build a structured clinical question to perform a specific search but, to show more clearly how filters work, we will do a simple, badly designed search in natural language, so as to obtain a greater number of results.

I open Pubmed’s home page, type “asthma and obesity in children” in the search box and press the “Search” button. I get 1169 results, although the number may vary if you do the search at another time.

You can see the result in the first figure. If you look closely, in the left margin of the screen there is a column of text with headings such as “Article types”, “Text availability”, etc. Each section is one of the filters that I have chosen to be shown on my results screen. You can see that there are two links below. The first one says “Clear all” and serves to unselect all the filters we have marked (in this case, none yet). The second one says “Show additional filters” and, if we click on it, a screen with all the available filters appears so that we can choose which ones we want displayed on the screen. Take a look at all the possibilities.

When we want to apply a filter, we just have to click on the text under the corresponding filter heading. In our case we will keep only the clinical trials published in the last five years for which the free full text is available (without having to pay a subscription). To do this, click on “Clinical Trial”, “Free full text” and “5 years”, as you can see in the second figure. You can see that the list of results has been reduced to 11, a much more manageable figure than the original 1169.
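
For those who prefer to automate this kind of query, something very close to the same filtered search can be reproduced programmatically through the NCBI E-utilities interface that sits behind Pubmed. The following is just a minimal sketch: the field tags I use (clinical trial[pt], free full text[sb]) are my attempt to mirror the web filters, and the counts returned may not match exactly what the website shows at any given moment.

```python
# Minimal sketch: querying Pubmed through the NCBI E-utilities "esearch" endpoint.
# The field tags in the search string try to mimic the web filters (article type,
# free full text); the date restriction uses reldate (days back from today).
import json
import urllib.parse
import urllib.request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

term = "asthma AND obesity AND children AND clinical trial[pt] AND free full text[sb]"

params = urllib.parse.urlencode({
    "db": "pubmed",       # search the Pubmed database
    "term": term,         # search string with field-tag "filters"
    "datetype": "pdat",   # restrict by publication date...
    "reldate": 5 * 365,   # ...to roughly the last five years
    "retmax": 20,         # return at most 20 PMIDs
    "retmode": "json",    # ask for a JSON response
})

with urllib.request.urlopen(f"{BASE}?{params}") as response:
    data = json.load(response)

result = data["esearchresult"]
print("Results found:", result["count"])
print("First PMIDs:", result["idlist"])
```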

Now we can remove filters one by one (by clicking on the word “clear” next to each filter), remove them all (by clicking “Clear all”) or add new ones (by clicking on the filter we want).

There are two precautions to bear in mind when using filters. First, filters remain active until we deactivate them; if we do not realize this and forget to deactivate them, we may apply them to later searches and get fewer results than expected. Second, filters are built using the MeSH terms assigned to each article at the time of indexing, so very recent articles, which have not been indexed yet and therefore have not had their MeSH terms assigned, will be lost when we apply the filters. That is why it is advisable to apply filters at the end of the search process, having first made the search more specific with other techniques such as MeSH terms or the advanced search.

Another option is to automate our favorite filters for all our searches, but without reducing the number of results. To do this we have to open an account in Pubmed by clicking on “Sign in to NCBI” in the upper right corner of the screen. Once we use the search engine as a registered user, we can click on the “Manage filters” link at the top right and select the filters we want. From then on, our searches will be done without filters, but at the top right you will see links to the filters we have selected, with the number of results in parentheses (you can see it in the first two figures I have shown). By clicking on them, we filter the list of results in a similar way to what we did with the other filters, which are accessible without registering.

I would not like to leave the topic of Pubmed and its filters without talking about another search resource: Clinical Queries. You can access them by clicking on “Pubmed Tools” on the search engine’s home page. Clinical Queries are a kind of filter built by Pubmed’s developers that restricts the search so that only articles related to clinical research are shown.

We type the search string in the search box and obtain the results distributed in three columns, as you can see in the third figure. In the first column, results are sorted according to the type of clinical study question (etiology, diagnosis, treatment, prognosis and clinical prediction guides) and the scope of the search, which may be more specific (“Narrow”) or broader (“Broad”). If we select “treatment” and the narrow scope (“Narrow”), we see that the search is limited to 25 articles.
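
Incidentally, these Clinical Queries categories correspond to predefined search filters that, as far as I know, can also be appended directly to an ordinary search string as a filter tag (take the exact syntax as an assumption on my part and check it against Pubmed’s documentation). Something along these lines should reproduce the “treatment / Narrow” selection for our example:

```
(asthma AND obesity AND children) AND (Therapy/Narrow[filter])
```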

The second column lists systematic reviews, meta-analyses, evidence-based medicine reviews, etc. Finally, the third focuses on papers on genetics.

If we want to see the complete list we can click on “See all” at the bottom of the list. We will then see a screen similar to the results of a simple or advanced search, as you can see in the fourth figure. If you look at the search box, you will notice that the search string has been slightly modified. From this list we can modify the search string and press “Search” again, reapply the filters that suit us, and so on. As you can see, the possibilities are endless.

And with this I think we are going to say goodbye to Pubmed. I encourage you to explore the many other options and tools that are explained in the tutorials on its website, some of which require you to have an NCBI account (remember, it’s free). You can, for example, set up alerts so that the search engine warns you when something new related to a certain search is published, among many other possibilities. But that’s another story…