Systematic review and meta-analysis
This is another of those famous quotes that are all over the place. Apparently, the first person to have this clever idea was Aristotle, who used it to summarize his holism general principle in his briefs on metaphysics. Who would have said that this tinny phrase contains so much wisdom?. Holism theory insists that everything must be considered in a comprehensive manner, because its components may act in a synergistic way, allowing the meaning of the whole to be greater than the meaning that each individual part contribute with.
Don’t be afraid, you are still on the blog about the brains and not on a blog about philosophy. Neither have I changed the topic of the blog, but this saying is just what I needed to introduce you to the wildest beast of scientific method, which is called meta-analysis.
Introducing the topic
We live in the information age. Since the end of the 20th century, we have witnessed a true explosion of the available sources of information, accessible from multiple platforms. The end result is that we are overwhelmed every time we need information about a specific point, so we do not know where to look or how we can find what we want. For this reason, systems began to be developed to synthesize the information available to make it more accessible when needed.
So, the first reviews come of the arid, the so-called narrative or author reviews. To write them, one or more authors, usually experts in a specific subject, made a general review on this topic, although without any strict criteria on the search strategy or selection of information. Following with total freedom, the authors analyzed the results as instructed by their will and ended up drawing their conclusions from a qualitative synthesis of the obtained results.
These narrative reviews are very useful for acquiring an overview of the topic, especially when one knows little about the subject, but they are not very useful for those who already know the topic and need answers to a more specific question. In addition, as the whole procedure is done according to authors´ wishes, the conclusions are not reproducible.
For these reasons, a series of privileged minds invented the other type of review in which we will focus on this post: the systematic review. Instead of reviewing a general topic, systematic reviews do focus on a specific topic in order to solve specific doubts of clinical practice. In addition, they use a clearly specified search strategy and inclusion criteria for an explicit and rigorous work, which makes them highly reproducible if another group of authors comes up with a repeat review of the same topic. And, if that were not enough, whenever possible, they go beyond the analysis of qualitative synthesis, completing it with a quantitative synthesis that receives the funny name of meta-analysis.
The realization of a systematic review consists of six steps: formulation of the problem or question to be answered, search and selection of existing studies, evaluation of the quality of these studies, extraction of the data, analysis of the results and, finally, interpretation and conclusion. We are going to detail this whole process a little.
Any systematic review worth its salt should try to answer a specific question that must be relevant from the clinical point of view. The question will usually be asked in a structured way with the usual components of population, intervention, comparison and outcome (PICO), so that the analysis of these components will allow us to know if the review is of our interest.
In addition, the components of the structured clinical question will help us to search for the relevant studies that exist on the subject. This search must be global and not biased, so we avoid possible biases of source excluding sources by language, journal, etc. The usual is to use a minimum of two important electronic databases of general use, such as Pubmed, Embase or the Cochrane’s, together with the specific ones of the subject that is being treated. It is important that this search is complemented by a manual search in non-electronic registers and by consulting the bibliographic references of the papers found, in addition to other sources of the so-called gray literature, such as doctoral theses, and documents of congresses, as well as documents from funding agencies, registers and, even, establishing contact with other researchers to know if there are studies not yet published.
It is very important that this strategy is clearly specified in the methods section of the review, so that anyone can reproduce it later, if desired. In addition, it will be necessary to clearly specify the inclusion and exclusion criteria of the primary studies of the review, the type of design sought and its main components (again in reference to the PICO, the components of the structured clinical question).
It is important to assess the quality of the studies included in the review
The third step is the evaluation of the quality of the studies found, which must be done by a minimum of two people independently, with the help of a third party (who will surely be the boss) to break the tie in cases where there is no consensus among the extractors. For this task, tools or checklists designed for this purpose are usually used; one of the most frequently used tool for bias control is the Cochrane Collaboration Tool. This tool assesses five criteria of the primary studies to determine their risk of bias: adequate randomization sequence (prevents selection bias), adequate masking (prevents biases of realization and detection, both information biases), concealment of allocation (prevents selection bias), losses to follow-up (prevents attrition bias) and selective data information (prevents information bias). The studies are classified as high, low or indeterminate risk of bias. It is common to use the colors of the traffic light, marking in green the studies with low risk of bias, in red those with high risk of bias and in yellow those who remain in no man’s land. The more green we see, the better the quality of the primary studies of the review will be.
Ad-hoc forms are usually designed for extraction of data, which usually collect data such as date, scope of the study, type of design, etc., as well as the components of the structured clinical question. As in the case of the previous step, it is convenient that this be done by more than one person, establishing the method to reach an agreement in cases where there is no consensus among the reviewers.
And we come to meta-analysis
And here we enter the most interesting part of the review, the analysis of the results. The fundamental role of the authors will be to explain the differences that exist between the primary studies that are not due to chance, paying special attention to the variations in the design, study population, exposure or intervention and measured results. You can always make a qualitative synthesis analysis, although the real magic of the systematic review is that, when the characteristics of primary studies allow it, a quantitative synthesis, called meta-analysis, can also be performed.
A meta-analysis is a statistical analysis that combines the results of several independent studies that try to answer the same question. Although meta-analysis can be considered as a research project in its own right, it is usually part of a systematic review.
Primary studies can be combined using a statistical methodology developed for this purpose, which has a number of advantages. First, by combining all the results of the primary studies we can obtain a more complete global vision (you know, the whole is greater …). The second one, when studies are combined we increase the sample size, which increases the power of the study in comparison with that of the individual studies, improving the estimation of the effect we want to measure. Thirdly, when extracting the conclusions of a greater number of studies, its external validity increases, since having involved different populations it is easier to generalize the results. Finally, it can allow us to resolve controversies between the conclusions of the different primary studies of the review and, even, to answer questions that had not been raised in those studies.
Once the meta-analysis is done, a final synthesis must be made that integrates the results of the qualitative and quantitative synthesis in order to answer the question that motivated the systematic review or, when this is not possible, to propose the additional studies that must be carried out to be able to answer it.
But a meta-analysis will only deserve all our respect if it fulfills a series of requirements. As the systematic review to witch the meta-analysis belongs, it should aim to answer one specific question and it must be based on all relevant available information, avoiding publication bias and recovery bias. Also, primary studies must have been assessed to ensure its quality and its homogeneity before combining them. Of course, data must be analyzed and presented in an appropriate way. And, finally, it must make sense to combine the results in order to do it. The fact that we can combine results doesn’t always mean that we have to do it if it is not needed in our clinical setting.
Methods for combining studies
And how do you combine the studies?, you could ask yourselves. Well, that’s the meta-analysis’ crux of the matter (crossings, really, there’re many), because there are several possible ways to do it.
Anyone could think that the easiest way would be a sort of Eurovision Contest. We account for the primary studies with a statistically significant positive effect and, if they are majority, we conclude that there’s consensus for positive result. This approach is quite simple but, you will not deny it, also quite sloppy. Also I can think about a number of disadvantages about its use. On one hand, it implies that lack of significance and lack of effect is synonymous, which does not always have to be true. On the other hand, it doesn’t take into account the direction and strength of effect in each study, nor the accuracy of estimators, neither the quality nor the characteristics of primary studies’ design. So, this type of approach is not very recommended, although nobody is going to fine us if we use it as an informal first approach before deciding which if the best way to combine the results.
Another possibility is to use a sort of sign test, similar to other non-parametric statistical techniques. We count the number of positive effects, we subtract the negatives and we have our conclusion. The truth is that this method also seems too simple. It ignores studies that don’t have statistical significance and also ignores the accuracy of studies’ estimators. So, this approach is not of much use, unless you only know the directions of the effects measured in the studies. We could also use it when primary studies are very heterogeneous to get an approximation of the global result, although I would not trust very much results obtained in this way.
The third method is to combine the different Ps of the studies (our beloved and sacrosanct Ps). This could come to our minds if we had a systematic review whose primary studies use different outcome measures, although all of them tried to answer the same question. For example, think about a study on osteoporosis where some studies use ultrasonic densitometry, others spine or femur DEXA, etc. The problem with this method is that it doesn’t take into account the intensities of effects, but only its directions and statistical significances, and we all know the deficiencies of our holy Ps. To be able to make this approach we’d need software that combines data that follow a Chi-square or Gaussian distribution, giving us an estimate and its confidence interval.
The fourth and final method that I know is also the most stylish: to make a weighted combination of the estimated effect in all the primary studies. To calculate the mean would be the easiest way, but we have not come this far to make fudge again. Arithmetic mean gives same emphasis to all studies, so if you have an outlier or imprecise study, results will be greatly distorted. Don’t forget that average always follow the tails of distributions and are heavily influenced by extreme values (which does not happen to her relative, the median).
This is why we have to weigh the different estimates. This can be done in two ways, taking into account the number of subjects in each study, or performing a weighting based on the inverses of the variances of each (you know, the squares of standard errors). The latter way is the more complex, so it is the one people preferred to do more often. Of course, as the maths needed are very hard, people usually use special software that can be external modules working in usual statistical programs such as Stata, SPSS, SAS or R, or specific software such as the famous Cochrane Collaboration’s RevMan.
As you can see, I have not been short of calling the systematic review with meta-analysis as the wildest beast of epidemiological designs. However, it has its detractors. We all know someone who claims not to like systematic reviews because almost all of them end up in the same way: “more quality studies are needed to be able to make recommendations with a reasonable degree of evidence”. Of course, in these cases we cannot put the blame on the review, because we do not take enough care to perform our studies so the vast majority deserves to end up in the paper shredder.
Another controversy is that of those who debate about what is better, a good systematic review or a good clinical trial (reviews can be made on other types of designs, including observational studies). This debate reminds me of the controversy over whether one should do a calimocho mixing a good wine or if it is a sin to mix a good wine with Coca-Cola. Controversies aside, if you have to take a calimocho, I assure you that you will enjoy it more if you use a good wine, and something similar happens to reviews with the quality of their primary studies.
The problem of systematic reviews is that, to be really useful, you have to be very rigorous in its realization. So that we do not forget anything, there are lists of recommendations and verification that allow us to order the entire procedure of creation and dissemination of scientific works without making methodological errors or omissions in the procedure.
It all started with a program of the Health Service of the United Kingdom that ended with the founding of an international initiative to promote the transparency and precision of biomedical research works: the EQUATOR network (Enhancing the QUAlity and Transparency of health Research). This network consists of experts in methodology, communication and publication, so it includes professionals involved in the quality of the entire process of production and dissemination of research results. Among many other objectives, which you can consult on its website, one is to design a set of recommendations for the realization and publication of the different types of studies, which gives rise to different checklists or statements.
The checklist designed to apply to systematic reviews is the PRISMA statement (Preferred Reporting Items for Systematic reviews and Meta-Analyses), which comes to replace the QUOROM statement (QUality Of Reporting Of Meta-analyses). Based on the definition of systematic review of the Cochrane Collaboration, PRISMA helps us to select, identify and assess the studies included in a review. It also consists of a checklist and a flowchart that describes the passage of all the studies considered during the realization of the review. There is also a lesser-known statement for the assessment of meta-analyses of observational studies, the MOOSE statement (Meta-analyses of Observational Studies in Epidemiology).
The Cochrane Collaboration also has a very well structured and defined methodology, which you can consult on its website. This is the reason why they have so much prestige within the world of systematic reviews, because they are made by professionals who are dedicated to the task following a rigorous and contrasted methodology. Anyway, even Cochrane’s reviews should be critically read and not giving them anything for insured.
And with this we have reached the end for today. I want to insist that meta-analysis should be done whenever possible and interesting, but making sure beforehand that it is correct to combine the results. If the studies are very heterogeneous we should not combine anything, since the results that we could obtain would have a much compromised validity. There is a whole series of methods and statistics to measure the homogeneity or heterogeneity of the primary studies, which also influence the way in which we analyze the combined data. But that is another story…