Science without sense…double nonsense

Píldoras sobre medicina basada en pruebas

Posts tagged Publication bias

Little ado about too much

Yes, I know that the saying goes just the opposite. But that is precisely the problem we have with so much new information technology. Today anyone can write and make public what goes through his head, reaching a lot of people, although what he says is bullshit (and no, I do not take this personally, not even my brother-in-law reads what I post!). The trouble is that much of what is written is not worth a bit, not to refer to any type of excreta. There is a lot of smoke and little fire, when we all would like the opposite to happen.

The same happens in medicine when we need information to make some of our clinical decisions. Anywhere the source we go, the volume of information will not only overwhelm us, but above all the majority of it will not serve us at all. Also, even if we find a well-done article it may not be enough to answer our question completely. That’s why we love so much the revisions of literature that some generous souls publish in medical journals. They save us the task of reviewing a lot of articles and summarizing the conclusions. Great, isn’t it? Well, sometimes it is, sometimes it is not. As when we read any type of medical literature’s study, we should always make a critical appraisal and not rely solely on the good know-how of its authors.

Revisions, of which we already know there are two types, also have their limitations, which we must know how to value. The simplest form of revision, our favorite when we are younger and ignorant, is what is known as a narrative review or author’s review. This type of review is usually done by an expert in the topic, who reviews the literature and analyzes what she finds as she believes that it is worth (for that she is an expert) and summarizes the qualitative synthesis with her expert’s conclusions. These types of reviews are good for getting a general idea about a topic, but they do not usually serve to answer specific questions. In addition, since it is not specified how the information search is done, we cannot reproduce it or verify that it includes everything important that has been written on the subject. With these revisions we can do little critical appraising, since there is no precise systematization of how these summaries have to be prepared, so we will have to trust unreliable aspects such as the prestige of the author or the impact of the journal where it is published.

As our knowledge of the general aspects of science increases, our interest is shifting towards other types of revisions that provide us with more specific information about aspects that escape our increasingly wide knowledge. This other type of review is the so-called systematic review (SR), which focuses on a specific question, follows a clearly specified methodology of searching and selection of information and performs a rigorous and critical analysis of the results found. Moreover, when the primary studies are sufficiently homogeneous, the SR goes beyond the qualitative synthesis, also performing a quantitative synthesis analysis, which has the nice name of meta-analysis. With these reviews we can do a critical appraising following an ordered and pre-established methodology, in a similar way as we do with other types of studies.

The prototype of SR is the one made by the Cochrane’s Collaboration, which has developed a specific methodology that you can consult in the manuals available on its website. But, if you want my advice, do not trust even the Cochrane’s and make a careful critical appraising even if the review has been done by them, not taking it for granted simply because of its origin. As one of my teachers in these disciplines says (I’m sure he’s smiling if he’s reading these lines), there is life after Cochrane’s. And, besides, there is lot of it, and good, I would add.

Although SRs and meta-analyzes impose a bit of respect at the beginning, do not worry, they can be critically evaluated in a simple way considering the main aspects of their methodology. And to do it, nothing better than to systematically review our three pillars: validity, relevance and applicability.

Regarding VALIDITY, we will try to determine whether or not the revision gives us some unbiased results and respond correctly to the question posed. As always, we will look for some primary validity criteria. If these are not fulfilled we will think if it is already time to walk the dog: we probably make better use of the time.

Has the aim of the review been clearly stated? All SRs should try to answer a specific question that is relevant from the clinical point of view, and that usually arises following the PICO scheme of a structured clinical question. It is preferable that the review try to answer only one question, since if it tries to respond to several ones there is a risk of not responding adequately to any of them. This question will also determine the type of studies that the review should include, so we must assess whether the appropriate type has been included. Although the most common is to find SRs of clinical trials, they can include other types of observational studies, diagnostic tests, etc. The authors of the review must specify the criteria for inclusion and exclusion of the studies, in addition to considering their aspects regarding the scope of realization, study groups, results, etc. Differences among the studies included in terms of (P) patients, (I) intervention or (O) outcomes make two SRs that ask the same question to reach to different conclusions.

If the answer to the two previous questions is affirmative, we will consider the secondary criteria and leave the dog’s walk for later. Have important studies that have to do with the subject been included? We must verify that a global and unbiased search of the literature has been carried out. It is frequent to do the electronic search including the most important databases (generally PubMed, Embase and the Cochrane’s Library), but this must be completed with a search strategy in other media to look for other works (references of the articles found, contact with well-known researchers, pharmaceutical industry, national and international registries, etc.), including the so-called gray literature (thesis, reports, etc.), since there may be important unpublished works. And that no one be surprised about the latter: it has been proven that the studies that obtain negative conclusions have more risk of not being published, so they do not appear in the SR. We must verify that the authors have ruled out the possibility of this publication bias. In general, this entire selection process is usually captured in a flow diagram that shows the evolution of all the studies assessed in the SR.

It is very important that enough has been done to assess the quality of the studies, looking for the existence of possible biases. For this, the authors can use an ad hoc designed tool or, more usually, resort to one that is already recognized and validated, such as the bias detection tool of the Cochrane’s Collaboration, in the case of reviews of clinical trials. This tool assesses five criteria of the primary studies to determine their risk of bias: adequate randomization sequence (prevents selection bias), adequate masking (prevents biases of realization and detection, both information biases), concealment of allocation (prevents selection bias), losses to follow-up (prevents attrition bias) and selective data information (prevents information bias). The studies are classified as high, low or indeterminate risk of bias according to the most important aspects of the design’s methodology (clinical trials in this case).

In addition, this must be done independently by two authors and, ideally, without knowing the authors of the study or the journals where the primary studies of the review were published. Finally, it should be recorded the degree of agreement between the two reviewers and what they did if they did not agree (the most common is to resort to a third party, which will probably be the boss of both).

To conclude with the internal or methodological validity, in case the results of the studies have been combined to draw common conclusions with a meta-analysis, we must ask ourselves if it was reasonable to combine the results of the primary studies. It is fundamental, in order to draw conclusions from combined data, that the studies are homogeneous and that the differences among them are due solely to chance. Although some variability of the studies increases the external validity of the conclusions, we cannot unify the data for the analysis if there are a lot of variability. There are numerous methods to assess the homogeneity about which we are not going to refer now, but we are going to insist on the need for the authors of the review to have studied it adequately.

In summary, the fundamental aspects that we will have to analyze to assess the validity of a SR will be: 1) that the aims of the review are well defined in terms of population, intervention and measurement of the result; 2) that the bibliographic search has been exhaustive; 3) that the criteria for inclusion and exclusion of primary studies in the review have been adequate; and 4) that the internal or methodological validity of the included studies has also been verified. In addition, if the SR includes a meta-analysis, we will review the methodological aspects that we saw in a previous post: the suitability of combining the studies to make a quantitative synthesis, the adequate evaluation of the heterogeneity of the primary studies and the use of a suitable mathematical model to combine the results of the primary studies (you know, that of the fixed effect and random effects models).

Regarding the RELEVANCE of the results we must consider what is the overall result of the review and if the interpretation has been made in a judicious manner. The SR should provide a global estimate of the effect of the intervention based on a weighted average of the included quality items. Most often, relative measures such as risk ratio or odds ratio are expressed, although ideally, they should be complemented with absolute measures such as absolute risk reduction or the number needed to treat (NNT). In addition, we must assess the accuracy of the results, for which we will use our beloved confidence intervals, which will give us an idea of ​​the accuracy of the estimation of the true magnitude of the effect in the population. As you can see, the way of assessing the importance of the results is practically the same as assessing the importance of the results of the primary studies. In this case we give examples of clinical trials, which is the type of study that we will see more frequently, but remember that there may be other types of studies that can better express the relevance of their results with other parameters. Of course, confidence intervals will always help us to assess the accuracy of the results.

The results of the meta-analyzes are usually represented in a standardized way, usually using the so-called forest plot. A graph is drawn with a vertical line of zero effect (in the one for relative risk and odds ratio and zero for means differences) and each study is represented as a mark (its result) in the middle of a segment (its confidence interval). Studies with results with statistical significance are those that do not cross the vertical line. Generally, the most powerful studies have narrower intervals and contribute more to the overall result, which is expressed as a diamond whose lateral ends represent its confidence interval. Only diamonds that do not cross the vertical line will have statistical significance. Also, the narrower the interval, the more accurate result. And, finally, the further away from the zero-effect line, the clearer the difference between the treatments or the comparative exposures will be.

If you want a more detailed explanation about the elements that make up a forest plot, you can go to the previous post where we explained it or to the online manuals of the Cochrane’s Collaboration.

We will conclude the critical appraising of the SR assessing the APPLICABILITY of the results to our environment. We will have to ask ourselves if we can apply the results to our patients and how they will influence the care we give them. We will have to see if the primary studies of the review describe the participants and if they resemble our patients. In addition, although we have already said that it is preferable that the SR is oriented to a specific question, it will be necessary to see if all the relevant results have been considered for the decision making in the problem under study, since sometimes it will be convenient to consider some other additional secondary variable. And, as always, we must assess the benefit-cost-risk ratio. The fact that the conclusion of the SR seems valid does not mean that we have to apply it in a compulsory way.

If you want to correctly evaluate a SR without forgetting any important aspect, I recommend you to use a checklist such as PRISMA’s or some of the tools available on the Internet, such as the grills that can be downloaded from the CASP page, which are the ones we have used for everything we have said so far.

The PRISMA statement (Preferred Reporting Items for Systematic reviews and Meta-Analyzes) consists of 27 items, classified in 7 sections that refer to the sections of title, summary, introduction, methods, results, discussion and financing:

  1. Title: it must be identified as SR, meta-analysis or both. If it is specified, in addition, that it deals with clinical trials, priority will be given to other types of reviews.
  2. Summary: it should be a structured summary that should include background, objectives, data sources, inclusion criteria, limitations, conclusions and implications. The registration number of the revision must also be included.
  3. Introduction: includes two items, the justification of the study (what is known, controversies, etc) and the objectives (what question tries to answer in PICO terms of the structured clinical question).
  4. Methods. It is the section with the largest number of items (12):

– Protocol and registration: indicate the registration number and its availability.

– Eligibility criteria: justification of the characteristics of the studies and the search criteria used.

– Sources of information: describe the sources used and the last search date.

– Search: complete electronic search strategy, so that it can be reproduced.

– Selection of studies: specify the selection process and inclusion’s and exclusion’s criteria.

– Data extraction process: describe the methods used to extract the data from the primary studies.

– Data list: define the variables used.

– Risk of bias in primary studies: describe the method used and how it has been used in the synthesis of results.

– Summary measures: specify the main summary measures used.

– Results synthesis: describe the methods used to combine the results.

– Risk of bias between studies: describe biases that may affect cumulative evidence, such as publication bias.

– Additional analyzes: if additional methods are made (sensitivity, metaregression, etc) specify which were pre-specified.

  1. Results. Includes 7 items:

– Selection of studies: it is expressed through a flow chart that assesses the number of records in each stage (identification, screening, eligibility and inclusion).

– Characteristics of the studies: present the characteristics of the studies from which data were extracted and their bibliographic references.

– Risk of bias in the studies: communicate the risks in each study and any evaluation that is made about the bias in the results.

– Results of the individual studies: study data for each study or intervention group and estimation of the effect with their confidence interval. The ideal is to accompany it with a forest plot.

– Synthesis of the results: present the results of all the meta-analysis performed with the confidence intervals and the consistency measures.

– Risk of bias between the subjects: present any evaluation that is made of the risk of bias between the studies.

– Additional analyzes: if they have been carried out, provide the results of the same.

  1. Discussion. Includes 3 items:

– Summary of the evidence: summarize the main findings with the strength of the evidence of each main result and the relevance from the clinical point of view or of the main interest groups (care providers, users, health decision-makers, etc.).

– Limitations: discuss the limitations of the results, the studies and the review.

– Conclusions: general interpretation of the results in context with other evidences and their implications for future research.

  1. Financing: describe the sources of funding and the role they played in the realization of the SR.

As a third option to these two tools, you can also use the aforementioned Cochrane’s Handbook for Systematic Reviews of Interventions, available on its website and whose purpose is to help authors of Cochrane’s reviews to work explicitly and systematically.

As you can see, we have not talked practically anything about meta-analysis, with all its statistical techniques to assess homogeneity and its fixed and random effects models. And is that the meta-analysis is a beast that must be eaten separately, so we have already devoted two post only about it that you can check when you want. But that is another story…

Achilles and Effects Forest

Achilles. What a man! Definitely, one of the main characters among those who were in that mess that was ensued in Troy because of Helena, a.k.a. the beauty. You know his story. In order to make him invulnerable his mother, who was none other than Tetis, the nymph, bathed him in ambrosia and submerged him in the River Stix. But she made a mistake that should not have been allowed to any nymph: she took him by his right heel, which did not get wet with the river’s water. And so, his heel became his only vulnerable part. Hector didn’t realize it in time but Paris, totally on the ball, put an arrow in Achilles’ heel and sent him back to the Stix, but not into the water, but rather to the other side. And without Charon the Ferryman.

This story is the origin of the expression “Achilles’ heel”, which usually refers to the weakest or most vulnerable point of someone or something that, otherwise, is usually known for its strength.

For example, something as robust and formidable as meta-analysis has its Achilles heel: the publication bias. And that’s because in the world of science there is no social justice.

All scientific works should have the same opportunities to be published and achieve fame, but the reality is not at all like that and they can be discriminated against for four reasons: statistical significance, popularity of the topic they are dealing with, having someone to sponsor them and the language in which they are written.

These are the main factors that can contribute to publication bias. First, studies with more significant results are more likely to be published and, within these, they are more likely to be published when the effect is greater. This means that studies with negative results or effects of small magnitude may not be published, which will draw a biased conclusion from the analysis only of large studies with positive results. In the same way, papers on topics of public interest are more likely to be published regardless of the importance of their results. In addition, the sponsor also influences: a company that finances a study with a product of theirs that has gone wrong, probably is not going to publish it so that we all know that their product is not useful.

Secondly, as is logical, published studies are more likely to reach our hands than those that are not published in scientific journals. This is the case of doctoral theses, communications to congresses, reports from government agencies or, even, pending studies to be published by researchers of the subject that we are dealing with. For this reason it is so important to do a search that includes this type of work, which is included within the grey literature term.

Finally, a series of biases can be listed that influence the likelihood that a work will be published or retrieved by the researcher performing the systematic review such as language bias (the search is limited by language), availability bias ( include only those studies that are easy for the researcher to recover), the cost bias (studies that are free or cheap), the familiarity bias (only those from the researcher’s discipline are included), the duplication bias (those that have significant results are more likely to be published more than once) and citation bias (studies with significant results are more likely to be cited by other authors).

One may think that this loss of studies during the review cannot be so serious, since it could be argued, for example, that studies not published in peer-reviewed journals are usually of poorer quality, so they do not deserve to be included in the meta-analysis However, it is not clear either that scientific journals ensure the methodological quality of the study or that this is the only method to do so. There are researchers, like those of government agencies, who are not interested in publishing in scientific journals, but in preparing reports for those who commission them. In addition, peer review is not a guarantee of quality since, too often, neither the researcher who carries out the study nor those in charge of reviewing it have a training in methodology that ensures the quality of the final product.

All this can be worsened by the fact that these same factors can influence the inclusion and exclusion criteria of the meta-analysis primary studies, in such a way that we obtain a sample of articles that may not be representative of the global knowledge on the subject of the systematic review and meta-analysis.

If we have a publication bias, the applicability of the results will be seriously compromised. That is why we say that the publication bias is the true Achilles’ heel of meta-analysis.

If we correctly delimit the inclusion and exclusion criteria of the studies and do a global and unrestricted search of the literature we will have done everything possible to minimize the risk of bias, but we can never be sure of having avoided it. That is why techniques and tools have been devised for its detection.

The most used has the sympathetic name of funnel plot. It shows the magnitude of the measured effect (X axis) versus a precision measurement (Y axis), which is usually the sample size, but which can also be the inverse of the variance or the standard error. We represent each primary study with a point and observe the point cloud.

In the most usual way, with the size of the sample on the Y axis, the precision of the results will be higher in the larger sample studies, so that the points will be closer together in the upper part of the axis and will be dispersed when approaching the origin of the axis Y. In this way, we observe a cloud of points in the form of a funnel, with the wide part down. This graphic should be symmetrical and, if that is not the case, we should always suspect a publication bias. In the second example attached you can see how there are “missing” studies on the side of lack of effect: this may mean that only studies with positive results are published.

This method is very simple to use but, sometimes, we can have doubts about the asymmetry of our funnel, especially if the number of studies is small. In addition, the funnel can be asymmetrical due to quality defects in the studies or because we are dealing with interventions whose effect varies according to the sample size of each study. For these cases, other more objective methods have been devised, such as the Begg’s rank correlation test and the Egger’s linear regression test.

The Begg’s test studies the presence of association between the estimates of the effects and their variances. If there is a correlation between them, bad going. The problem with this test is that it has little statistical power, so it is not reliable when the number of primary studies is small.

Egger’s test, more specific than Begg’s, consists of plotting the regression line between the precision of the studies (independent variable) and the standardized effect (dependent variable). This regression must be weighted by the inverse of the variance, so I do not recommend that you do it on your own, unless you are consummate statisticians. When there is no publication bias, the regression line originates at the zero of the Y axis. The further away from zero, the more evidence of publication bias.

As always, there are computer programs that do these tests quickly without having to burn your brain with the calculations.

What if after doing the work we see that there is publication bias? Can we do something to adjust it? As always, we can.

The simplest way is to use a graphic method called trim and fill. It consists of the following: a) we draw the funnel plot; b) we remove the small studies so that the funnel is symmetrical; c) the new center of the graph is determined; d) we recover the previously removed studies and we add their reflection to the other side of the center line; e) we estimate again the effect.Another very conservative attitude that we can adopt is to assume that there is a publication bias and to ask how much it affects our results, assuming that we have left studies not included in the analysis.

The only way to know if the publication bias affects our estimates would be to compare the effect in the retrieved and unrecovered studies but, of course, then we would not have to worry about the publication bias.

To know if the observed result is robust or, on the contrary, it is susceptible to be biased by a publication bias, two methods of the fail-safe N have been devised.

The first is the Rosenthal’s fail-safe N method. Suppose we have a meta-analysis with an effect that is statistically significant, for example, a risk ratio greater than one with a p <0.05 (or a 95% confidence interval that does not include the null value, one). Then we ask ourselves a question: how many studies with RR = 1 (null value) will we have to include until p is not significant? If we need few studies (less than 10) to make the value of the effect null, we can worry because the effect may in fact be null and our significance is the product of a publication bias. On the contrary, if many studies are needed, the effect is likely to be truly significant. This number of studies is what the letter N of the name of the method means.

The problem with this method is that it focuses on the statistical significance and not on the relevance of the results. The correct thing would be to look for how many studies are needed so that the result loses clinical relevance, not statistical significance. In addition, it assumes that the effects of the missing studies is null (one in case of risk ratios and odds ratios, zero in cases of differences in means), when the effect of the missing studies can go in the opposite direction than the effect that we detect or in the same sense but of smaller magnitude.

To avoid these disadvantages there is a variation of the previous formula that assesses the statistical significance and clinical relevance. With this method, which is called the Orwin’s fail-safe N, it is calculated how many studies are needed to bring the value of the effect to a specific value, which will generally be the least effect that is clinically relevant. This method also allows to specify the average effect of the missing studies.

To end the meta-analysis explanation, let’s see what is the right way to express the results of data analysis. To do it well, we can follow the recommendations of the PRISMA statement, which devotes seven of its 27 items to give us advice on how to present the results of a meta-analysis.

First, we must inform about the selection process of studies: how many we have found and evaluated, how many we have selected and how many rejected, explaining in addition the reasons for doing so. For this, the flowchart that should include the systematic review from which the meta-analysis proceeds if it complies with the PRISMA statement is very useful.

Secondly, the characteristics of the primary studies must be specified, detailing what data we get from each one of them and their corresponding bibliographic citations to facilitate that any reader of the review can verify the data if he does not trust us. In this sense, there is also the third section, which refers to the evaluation of the risk of study biases and their internal validity.

Fourth, we must present the results of each individual study with a summary data of each intervention group analyzed together with the calculated estimators and their confidence intervals. These data will help us to compile the information that PRISMA asks us in its fifth point referring to the presentation of results and it is none other than the synthesis of all the meta-analysis studies, their confidence intervals, homogeneity study results, etc.

This is usually done graphically by means of an effects diagram, a graphical tool popularly known as forest plot, where the trees would be the primary studies of the meta-analysis and where all the relevant results of the quantitative synthesis are summarized.

The Cochrane’s Collaboration recommends structuring the forest plot in five well differentiated columns. Column 1 lists the primary studies or the groups or subgroups of patients included in the meta-analysis. They are usually represented by an identifier composed of the name of the first author and the date of publication.Column 2 shows the results of the measures of effect of each study as reported by their respective authors.

Column 3 is the actual forest plot, the graphic part of the subject. It shows the measures of effect of each study on both sides of the zero effect line, which we already know is zero for mean differences and one for odds ratios, risk ratios, hazard ratios, etc. Each study is represented by a square whose area is usually proportional to the contribution of each one to the overall result. In addition, the square is within a segment that represents the extremes of its confidence interval.

These confidence intervals inform us about the accuracy of the studies and tell us which are statistically significant: those whose interval does not cross the zero effect line. Anyway, do not forget that, although crossing the line of no effect and being not statistically significant, the interval boundaries can give us much information about the clinical significance of the results of each study. Finally, at the bottom of the chart we will find a diamond that represents the global result of the meta-analysis. Its position with respect to the null effect line will inform us about the statistical significance of the overall result, while its width will give us an idea of ​​its accuracy (its confidence interval). Furthermore, on top of this column will find the type of effect measurement, the analysis model data is used (fixed or random) and the significance value of the confidence intervals (typically 95%).

This chart is usually completed by a fourth column with the estimated weight of each study in per cent format and a fifth column with the estimates of the weighted effect of each. And in some corner of this forest will be the measure of heterogeneity that has been used, along with its statistical significance in cases where relevant.

To conclude the presentation of the results, PRISMA recommends a sixth section with the evaluation that has been made of the risks of bias in the study and a seventh with all the additional analyzes that have been necessary: stratification, sensitivity analysis, metaregression, etc.

As you can see, nothing is easy about meta-analysis. Therefore, the Cochrane’s recommends following a series of steps to correctly interpret the results. Namely:

  1. Verify which variable is compared and how. It is usually seen at the top of the forest plot.
  2. Locate the measure of effect used. This is logical and necessary to know how to interpret the results. A hazard ratio is not the same as a difference in means or whatever it was used.
  3. Locate the diamond, its position and its amplitude. It is also convenient to look at the numerical value of the global estimator and its confidence interval.
  4. Check that heterogeneity has been studied. This can be seen by looking at whether the segments that represent the primary studies are or are not very dispersed and whether they overlap or not. In any case, there will always be a statistic that assesses the degree of heterogeneity. If we see that there is heterogeneity, the next thing will be to find out what explanation the authors give about its existence.
  5. Draw our conclusions. We will look at which side of the null effect line are the overall effect and its confidence interval. You already know that, although it is significant, the lower limit of the interval should be as far as possible from the line, because of the clinical relevance, which does not always coincide with statistical significance. Finally, look again at the study of homogeneity. If there is a lot of heterogeneity, the results will not be as reliable.

And with this we end the topic of meta-analysis. In fact, the forest plot is not exclusive to meta-analyzes and can be used whenever we want to compare studies to elucidate their statistical or clinical significance, or in cases such as equivalence studies, in which the null effect line is joined of the equivalence thresholds. But it still has one more utility. A variant of the forest plot also serves to assess if there is a publication bias in the systematic review, although, as we already know, in these cases we change its name to funnel graph. But that is another story…

A bias by absence

We can find strength through unity. It is a fact. Great goals are achieved more easily with the joining of the effort of many. And this is also true in statistics.
In fact, there are times when clinical trials do not have the power to demonstrate what they are pursuing, either because of lack of sample due to time, money or difficulty recruiting participants, or because of other methodological limitations. In these cases, it is possible to resort to a technique that allows us sometimes to combine the effort of multiple trials in order to reach the conclusion that we would not reach with any of the trials separately. This technique is meta-analysis.
Meta-analysis gives us an exact quantitative mathematical synthesis of the studies included in the analysis, generally the studies retrieved during a systematic review. Logically, if we include all the studies that have been done on a topic (or, at least, all that are relevant to our research), that synthesis will reflect the current knowledge on the subject. However, if the collection is biased and we lack studies, the result will reflect only the articles collected, not the total available knowledge.
When planning the review we must establish a global search structure to try to find all the articles. If we do not do this we can make a recovery bias, which will have the same effect on the quantitative analysis as the publication bias has. But even with modern electronic searches, it is very difficult to find all the relevant information on a particular topic.
In cases of missing studies, the importance of the effect will depend on how the studies are lost. If they are lost at random, everything will be in a problem of less information, so the accuracy of our results will be less and the confidence intervals will be broader, but our conclusions may be correct. However, if the articles that we do not find are systematically different from those we find, the result of our analysis may be biased, since our conclusions can only be applied to that sample of papers, which will be a biased sample.
There are a number of factors that may contribute to the publication bias. First, the studies with meaningful results are more likely to be published and, within these, they are more likely to be published when the effect is greater. This means that studies with negative results or with effects of small magnitude may not be published, so we will draw a biased conclusion from the analysis of only large studies with a positive result.
Secondly, of course, published studies are more likely to come into our hands than those that are not published in scientific journals. This is the case of doctoral theses, communications to congresses, reports from government agencies or even studies pending to be published by researchers of the subject we are dealing with. For this reason it is so important to do a search that includes this type of work, which fall within the term of gray literature.
Finally, a number of biases can be listed that influence the likelihood that a paper will be published or retrieved by the investigator performing the systematic review such as language bias (we limit the search by language), availability bias (to include only those studies that are easy to retrieve by the researcher), cost bias (to include studies that are free or cheap), familiarity bias (only those of the discipline of the investigator), duplication bias (those who have significant outcomes are more likely to be published more than once) and citation bias (studies with significant outcome are more likely to be cited by other authors).
One may think that losing studies during the review cannot be so serious, since it could be argued that unpublished studies in peer-reviewed journals are often of poorer quality, so they do not deserve to be included in the meta-analysis. However, it is not clear that the scientific journals ensure the methodological quality of the study or that this is the only method to do so. There are researchers, such as government agencies, who are not interested in publishing in scientific journals, but in producing reports for those who commission them. In addition, peer review is not a quality assurance because, too often, neither the researcher who performs the study nor those in charge of reviewing it have a methodology training that ensures the quality of the final product.
There are tools to assess the risk of publication bias. Perhaps the simplest may be to represent a forest plot ordered with the most accurate studies at the top and the less at the bottom. As we move down the precision of the results decreases, so that the effect must oscillate to both sides of the summary measure result. If it only oscillates towards one of the sides, we can indirectly assume that we have not detected the works that must exist that oscillate towards the opposite side, reason why surely we will have a bias of publication.
funnel_sesgoAnother similar procedure is the use of the funnel plot, as seen in the attached image. In this graph the effect size is plotted on the X axis and on the Y axis a measure of the variance or the sample size, inverted. Thus, at the top will be the largest and most accurate studies. Once again, as we go down the graph, the accuracy of the studies is smaller and they are shifted sideways by random error. When there is publication bias this displacement is asymmetrical. The problem of the funnel plot is that its interpretation can be subjective, so there are numerical methods to try to detect the existence of publication bias.
And, at this point, what should we do in the face of a publication bias? Perhaps the most appropriate thing is not to ask if there is bias, but how much it affects my results (and assume that we have left studies without being included in the analysis).
The only way to know if publication bias affects our estimates would be to compare the effect on recovered and unrecovered studies, but of course, then we would not have to worry about publication bias.
In order to know if the observed result is robust or, conversely, it is susceptible to be biased by a publication bias, two methods have been devised called as the fail-safe N methods.
The first method is the Rosenthal’s fail-safe N method. Suppose we have a meta-analysis with an effect that is statistically significant, for instance, a relative risk greater than one with a p <0.05 (or a 95% confidence interval that does not include the null value, one). Then we ask ourselves a question: how many studies with RR = 1 (null value) will have to be included until p is not significant? If we need few studies (less than 10) to invalidate the value of the effect, we may be concerned that the effect may actually be null and our significance is the result of a publication bias. Conversely, if many studies are needed, the effect is likely to be truly significant. This number of studies is what the letter N of the method name means. The problem with this method is that it focuses on statistical significance rather than on the relevance of results. The correct thing would be to look for how many studies are necessary so that the result loses clinical relevance, not statistical significance. In addition, it assumes that the effects of missing studies are zero (one in the case of relative risks and odds ratios, zero in cases of mean differences), when the effect of missing studies may go the other way than the effect we detected or In the same direction but of smaller magnitude. To avoid these drawbacks there is a variation of the previous formula which values statistical significance and clinical significance. With this method, which is called the Orwin´s fail-safe N, we calculate how many studies are needed to bring the value of the effect to a specific value, which will generally be the smallest effect that is clinically important. This method also allows specifying the average effect of missing studies.
And here we leave the meta-analysis and publication bias for today. We have not talked about any other mathematical methods to detect publication bias like Begg’s and Egger’s. There is even some graphic method apart from the ones we have mentioned, such as the trim and fill method. But that is another story…

Funnel asymmetry

Achilles. What a man!. Definitely, one of the main characters among those who were in that mess that was ensued in Troy because of Helena, a.k.a. the beauty. You know his story. In order to make him invulnerable his mother, who was none other than the nymph Tetis, bathed him in ambrosia and submerged him in the River Stix. But she made a mistake that should not have been allowed to any nymph: she took him by his right heel, which did not get wet with the river’s water. And so, his heel became his only vulnerable part. Hector didn’t realize it in time but Paris, far savvier, put an arrow in Achilles’ heel and sent him back to the Stix, but not into the water, but rather to the other side. Without Charon the Boatman.

This is the source of the expression Achilles’s heel, a metaphor for a vulnerable spot of someone or something that is otherwise usually known for their strength.

As an example, something as robust and formidable as a meta-analysis has its Achilles’s heel: publication bias. And that’s because in the world of science there is no social justice.

All scientific works should have the same opportunities to be published and become famous, but that is far from reality and those works can be discriminated against for four reasons: statistical significance, popularity of its topic, having someone who sponsors them and the language they are written.

The truth is that papers with statistically significant results are more likely to be published than those with non-significant ones. Moreover, even if accepted, the former are likely to be published before and, more often, in English written journals, which are more prestigious and have more diffusion. As a result, those papers are cited more frequently. And the same goes for papers with “positive” results versus those with “negative” results.

Similarly, papers about issues of public interest are more likely to be published regardless of the significance of their results. In addition, the sponsor also has its influence: a company that finances a study about one of its products is not going to be prone to publish the results if they are against the utility of the product concerned. And finally, English written papers have more diffusion than those written in other languages.

All of these can be worsened by the fact that these same factors may influence the choice of inclusion and exclusion criteria for primary studies in the meta-analysis, so we may get a sample of papers that may not be representative of global knowledge about the topic addressed in the systematic review and meta-analysis.

If there’s publication bias the applicability of results will be seriously compromised. This is why we say that publication bias is meta-analysis Achilles’s heel.

If we choose inclusion and exclusion criteria correctly and we do a global literature search without restrictions, we will have done our best to minimize the risk of bias, but we can never be sure of having prevented it. Therefore, there’re some techniques and tools that have been developed for publication bias detection.funnel_nosesgo

The most widely used is a tool known by its friendly name: funnel plot. It represents the magnitude of the effect measured (X axis) versus a precision measure (Y axis), which is usually the sample size, but may also be the inverse of the variance or the standard error. Each primary study is represented with a dot and we only have to observe the dot cloud shape.

In its most common form, with the sample size represented on Y axis, the precision of results is higher for larger sample studies, so dots are closer to each other at the top of the plot and are increasingly scattered toward the bottom of the plot, near the origin of Y axis. Thus, the cloud has a funnel shape, with the wide part downwards. The shape should be symmetrical and, if not, we must always suspect the existence of publication bias. In the second example that I show, you can see how there’re “missing” studies in the side of lack of effect: this may mean that published studies have been only those with positive results.

funnel_sesgo

This approach is very simple to use but, sometimes, we may have doubts about the funnel asymmetry, especially if the number of studies is small. In addition, the funnel may be asymmetric due to a deficient quality of studies or because we are dealing with interventions whose effect varies with the sample of each study. For these situations, other methods have been devised that are more objective, such as Begg’s rank correlation test and Egger’s linear regression.

Begg’s test examines the presence of association among the effect estimates and their variances. If there’s correlation among them, it is a bad thing. The problem with this test is that it is underpowered, so it’s unreliable when the number of primary studies is small.

eggerEgger’s test is more specific than Begg’s. This tool plots the regression line between precision of the studies (independent variable) and the standardized effect (dependent variable). This regression line must be weighed by the inverse of variance, so I do not recommend you to do it on your own unless you are a consummated statistic. When there isn’t publication bias the regression line originates in the Y-axis zero. So much further away from zero, further evidence of publication bias.

As always, there are computer programs available to make these tests quickly without us having to get our brains fried with their calculations.

And what if after doing all the work we find that there is publication bias? Can we do anything to adjust it?. As always, we can.trim_and_fill

The easiest way is to use a graphical technique that is called trim and fill adjustment. It works as follows: a) draw the funnel plot, b) remove small studies that make the funnel asymmetric, c) recalculate the new center of the graph, d) put back removed studies and add their reflections at the other side of the middle line of the cloud, e) re-estimate the effect.

And finally, only say that there’s a second method that is much more accurate but also much more complex, which consists of a regression model based on the Egger’s test. But that’s another story…