Prevention is not always better

Any sensible person will tell you that prevention is better than cure. I’ve heard it a million times. There was even a television show named “It’s better to prevent”. Besides, nobody in their right mind doubts the health benefits that preventive medicine has achieved by promoting healthier lifestyles, controlling environmental conditions or running vaccination programs. However, when it comes to screening programs, I’d say that it’s not always so clear that it’s better to prevent and that, at times, it’s better to do nothing, for two reasons. First, because our resources are limited, and everything we spend on screening comes out of other needs, which will be left with fewer resources. Second, because even if we act with the best of intentions, if we try to prevent indiscriminately we can cause more harm than good.

So, we’ll have to consider whether there is justification for a screening strategy before implementing it. The diagnostic test we plan to screen with must be simple, inexpensive, reliable and acceptable to the population. It is important not to forget that we are going to test healthy individuals, who may not want to be bothered. Furthermore, it’s rare that we can confirm the diagnosis with a single positive result, and the test to confirm it will surely be more expensive and cumbersome, if not clearly invasive (imagine that the screening result must be confirmed by a biopsy). We will have to consider the sensitivity and specificity of the test because, although we tolerate a certain number of false positives when screening, if the confirmation test is expensive or very cumbersome it will be better that false positives are few, or screening won’t be cost-effective.
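To get a feel for how heavily false positives can weigh, here is a minimal sketch in Python of the positive predictive value of a test, computed with Bayes’ theorem (all the figures are made up for the example): even a seemingly excellent test produces mostly false positives when the disease is rare.

```python
# Positive predictive value (PPV) of a screening test, via Bayes' theorem.
# All figures are illustrative, not taken from any real program.
def ppv(sensitivity, specificity, prevalence):
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A seemingly great test (95% sensitivity and 95% specificity) applied to a
# disease with 1% prevalence: only about 16% of the positives are real.
print(round(ppv(0.95, 0.95, 0.01), 2))  # 0.16
```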

Moreover, for screening to be worthwhile, the preventable disease has to have a long preclinical phase. If not, we’ll have little opportunity to detect it. The problem is, of course, that we are most interested in detecting the most severe diseases, and those often have shorter preclinical stages without symptoms.

Besides, who is going to be screened? Everyone, you will tell me. The problem is that this is the most expensive option, especially considering that healthy people do not usually go to the doctor, and you’ll have to actively recruit them if you want to screen them (for their own sake, of course). Those who are sick, but not too sick, you’ll tell me then. Well, that’s not much use either, because by the time they go to the doctor they are already out of the reach of prevention (they’re already sick). But we can take advantage of those who go to the doctor for other reasons, some of you may think. This is called opportunistic screening, and it is what is sometimes done for practical reasons. It’s cheaper, but the theoretical benefits of universal screening are lost. Screening as many people as possible is of particular interest when we’re trying to detect risk factors (such as hypertension) since, in addition to the advantages of early treatment, we have the opportunity to do primary prevention, which is much cheaper and gives better health results.

So, as we can see, doing a screening can have many advantages, which is evident to everyone. The problem is that we rarely think about the harm we can cause with this form of prevention. How could early disease detection, or the possibility of early treatment, harm anyone? Let’s make some considerations.

The test may be painful (a needle stick) or bothersome (collecting stools in a container for three days). But if you think this is bullshit, think about the people who have a heart attack while doing a stress test, or who suffer an anaphylactic shock, not to speak of the Japanese who suffer a perforation during a colonoscopy. That’s a horse of a different color. Moreover, the mere prospect of screening can cause anxiety or stress in a healthy person who should not have to worry about it.

And think about what will happen if the test is positive. Imagine that, to confirm the diagnosis, we have to do a colonoscopy or a chorionic biopsy, not to mention the patient’s anxiety until the diagnosis is confirmed or ruled out. And even if we confirm the diagnosis, the benefit may be limited: what does an asymptomatic person gain from knowing he has a disease, especially when there is no treatment or it’s not yet time to start it? And even when there is a treatment, it may itself be harmful. A very topical example is the effect of a prophylactic prostatectomy for a low-grade carcinoma detected by PSA screening: the patient can suffer incontinence or impotence (or both) from undergoing a surgery that could have been delayed for years.

Always bear in mind that the potential benefit of screening a generally healthy population may be limited for an obvious reason: these people are healthy. If there is the slightest harm that may arise from the strategy of early screening and treatment, we should seriously consider whether the screening program is worth performing.

So, when should we screen for a given disease? First, when the disease burden justifies screening. Disease burden depends on severity and prevalence. If a disease is very common but very benign, its burden will be low and we probably won’t be interested in screening. If it is very rare, screening will probably not be worthwhile either, unless the disease is severe and has a very effective treatment to prevent its complications. An example could be the screening of hyperphenylalaninemia in newborns.

Second, we need a proper test with the characteristics already mentioned, especially one whose number of false positives is not too high, to avoid having to confirm the diagnosis in too many healthy people, which would make the whole business ruinous.

Third, there has to be an early treatment that is, moreover, more effective than the usual treatment started at the onset of symptoms. Of course, we must also have the resources to provide this treatment.

Fourth, both the screening test and the treatment arising from a positive result must be safe. Otherwise, we could cause more harm than the harm we want to avoid.

And fifth, we must balance the costs and potential benefits of screening. Don’t forget that, even if the test is not very expensive, we are going to do it on a lot of people, so we’ll have to spend a huge amount of money, which is rather scarce these days.

Finally, let’s just say that any screening program must be supplemented with studies proving its effectiveness. This can be done by direct or indirect methods, depending on whether we compare doing the screening with not doing it, or whether we study and compare different screening strategies. But that’s another story…

Size does matter

Of course, we’re talking about samples.

For various reasons, scientific studies often use samples drawn from a population about which we want to reach some conclusion. This sample will have to be selected so that it faithfully represents the population from which it is drawn but, how large should the sample be: big or small? Well, neither one thing nor the other: the sample must be of the appropriate size.

We’d deserve a little rest after reasoning our way to such a conclusion but, first, let’s try to find out what problems samples that are too large or too small can cause us.

The drawbacks of larger-than-needed samples are obvious: a greater expenditure of time and resources.

Moreover, we already know that sometimes it is enough to increase the sample size to obtain statistical significance, although we also know that if we use this trick in excess we can reach significance for differences that are very small and, although the difference may be real, its clinical relevance can be limited. That way we waste time, energy (and money) and can also create confusion about the importance of the difference found. So, as in many other aspects of life and medicine, when we speak about samples, more is not always better (nor is bigger).
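If you want to see this effect in action, here is a quick simulation in Python (a sketch; the 0.1-unit difference is deliberately chosen to be clinically trivial, and all values are invented): as the sample grows, the p-value for that trivial difference tends to drop below 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two populations whose means differ by a clinically trivial 0.1 units
for n in (50, 500, 50_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.1, 1.0, n)
    result = stats.ttest_ind(a, b)
    print(n, round(result.pvalue, 4))  # the p-value tends to fall as n grows
```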

What if the sample is small? It’s somewhat the opposite situation. The smaller the sample, the greater the imprecision (the wider the confidence intervals). As a result, differences have to be larger to reach statistical significance. We thus run the risk that, although there’s a real difference, we won’t be able to confirm its existence because the sample is too small, losing the opportunity to show differences that, although small, could be clinically relevant.
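As a rough illustration of that imprecision (assuming, just for the example, a known standard deviation of 1), the half-width of the 95% confidence interval of a mean shrinks only with the square root of the sample size:

```python
# Half-width of a 95% confidence interval for a mean, with sd assumed to be 1
for n in (10, 50, 200, 1000):
    print(n, round(1.96 / n ** 0.5, 3))
# 10 -> 0.62, 50 -> 0.277, 200 -> 0.139, 1000 -> 0.062
```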

It’s clear, therefore, that the sample must be of the appropriate size and that, to avoid greater evils, we should calculate it before doing the study.

The formulas used to come up with the sample size depend on the statistic we are measuring and on whether we are estimating a population parameter (a mean, for example) or testing a hypothesis between two variables or samples (comparing two means, two proportions, etc.). In any case, most statistical programs can calculate it quickly and without flinching. We just have to set the values of three parameters: the type 1 error, the power of the study and the minimum clinically relevant difference.

Type 1 error is the probability of rejecting the null hypothesis when it is true, thus concluding that there’s a difference that is, in fact, not real. It is generally accepted that this probability, which is called alpha, must be less than 5%, and it is the level of statistical significance usually employed in hypothesis testing.

Type 2 error is the probability of concluding that there’s no difference when in fact there is one (that is, of failing to reject the null hypothesis). This probability is known as beta, and a maximum of 20% is generally accepted as appropriate. Its complementary value (1-beta, or 100-beta if we prefer percentages) is what we call the power of the study, for which a minimum of 80% is usually required.

Finally, the minimal clinically relevant difference is the smallest difference that the study should be able to detect, provided it actually exists. This value is set by the researcher according to the clinical context and has nothing to do with the statistical significance of the study.

Using these three parameters, we can calculate the sample size required to detect the difference considered relevant from the clinical point of view, with the desired error levels.
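As a minimal sketch of that calculation (in Python, using the usual normal-approximation formula for comparing two means; the standard deviation and the difference are invented figures), it could look like this:

```python
from scipy.stats import norm

def n_per_group(alpha, power, sd, delta):
    """Sample size per group to compare two means (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided type 1 error
    z_beta = norm.ppf(power)           # power = 1 - beta
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

# Alpha of 5%, power of 80%, sd of 10 units, minimal relevant difference of 5
print(round(n_per_group(alpha=0.05, power=0.80, sd=10, delta=5)))  # 63 per group
```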

Sometimes we can run this reasoning in reverse. If our sample size has an upper limit, whatever the reason, we can estimate the difference we’ll be able to detect before doing the study. If this minimum detectable difference is greater than the clinically relevant one, we’d better save ourselves the work, because we’ll be at risk of reaching no conclusion with a small and misleading sample, wrongly concluding that a real difference does not exist. Similarly, if we must stop the study before its scheduled end, we should calculate whether the sample achieved gives us enough power to detect the level of difference we set at the beginning of the study.
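Turning the same formula around (again a sketch with invented figures) gives the minimum difference we could detect with a fixed sample size:

```python
from scipy.stats import norm

def min_detectable_diff(alpha, power, sd, n_per_group):
    """Smallest difference detectable with n subjects per group."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return (z_alpha + z_beta) * sd * (2 / n_per_group) ** 0.5

# With only 30 subjects per group we could detect differences of about 7.2
# units; if only differences of 5 or more matter clinically, better not start.
print(round(min_detectable_diff(0.05, 0.80, sd=10, n_per_group=30), 1))  # 7.2
```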

Depending on the variable we are measuring, we’ll sometimes need other data, such as the population’s mean or standard deviation, to estimate the sample size. If we don’t know these values we can do a pilot study with a few patients (at the researcher’s judgment) and calculate the sample size from the preliminary results.
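For instance, if a pilot study on a handful of patients gave a standard deviation of, say, 8 units (a made-up figure), we could simply feed it into the n_per_group sketch above:

```python
# Reusing the n_per_group sketch above with the standard deviation of the pilot
print(round(n_per_group(alpha=0.05, power=0.80, sd=8, delta=5)))  # 40 per group
```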

One last thought before going to cool our heads. The sample size is calculated to estimate the primary outcome, but we cannot be sure our sample is appropriate for estimating any other parameters we measure during the study. Very often we see trials very well designed to show the effectiveness of a treatment but incapable of providing conclusive data about its safety. But that’s another story…