# Comparison of variables

We describe the statistical tests that should be used to compare variables of different kinds.

Have you ever wondered why some people go bald, especially men at a certain age? I think it has something to do with hormones. Anyway, it’s something that those affected usually don’t like at all, despite the popular belief that bald people are smarter. It seems to me that there is nothing wrong with being bald (it’s much worse to be an asshole) but, of course, I still have all my hair on my head.

Following the thread of baldness, let’s suppose we want to know if hair color has anything to do with going bald sooner or later. We set up a nonsense trial with 50 brown-haired and 50 blond-haired participants to study how many go bald and when they do.

This example will help us illustrate the different types of variables that we can find in a clinical trial and the different methods that we use to compare each of them.

## Types of variables

Some variables are of the continuous quantitative type, for instance the participants’ weight, their height, their income, the number of hairs per square inch, etc. Others are qualitative, such as hair color. In this case, we simplify it to a binary variable: brown or blond. Finally, there are time-to-event variables, which show the time it takes participants to present the event under study, in our case, baldness.

When we compare these variables between the two groups of the study, we have to pick a method, and the choice is determined by the type of variable being considered.

## Comparison of variables

If we want to compare a continuous variable such as age or weight between bald and hairy people, or between brown-haired and blond participants, we’ll use Student’s t test, provided that our data fit a normal distribution. If that is not the case, the non-parametric test that we would use is the **Mann-Whitney test**.
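As a rough illustration of the two tests just mentioned, here is a minimal sketch using only the Python standard library. The weights are made up, and the function names `student_t` and `mann_whitney_u` are hypothetical helpers written for this example. The sketch only computes the test statistics; a statistical package would also turn them into a p-value.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance Student's t statistic and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


def mann_whitney_u(a, b):
    """Mann-Whitney U: count the pairs where a beats b, ties counting one half."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    # By convention the smaller of the two possible U values is reported.
    return min(u_a, len(a) * len(b) - u_a)


# Made-up weights (kg) for a handful of participants in each group.
brown = [72.5, 80.1, 68.4, 77.0, 74.3, 81.6]
blond = [70.2, 65.8, 73.9, 69.5, 66.1, 71.4]

t_stat, dof = student_t(brown, blond)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
print(f"U = {mann_whitney_u(brown, blond)}")
```

The t statistic uses the actual values, so it is sensitive to outliers and to departures from normality, while U only depends on how the values of one group rank against those of the other.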

And what if we want to compare several continuous variables at once? Then we’ll use multiple linear regression, which models a continuous outcome as a function of several variables at the same time.
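To give an idea of what a multiple linear regression does under the hood, the following sketch fits one by ordinary least squares, solving the normal equations with plain Python. Everything here is an assumption made for illustration: the data are invented, the outcome (hair density) and the predictors (age and weight) are arbitrary choices, and `fit_linear` is a hypothetical helper. Real software also reports standard errors and p-values for each coefficient.

```python
def fit_linear(X, y):
    """Ordinary least squares: solve (X'X) b = X'y, intercept first."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data: (age in years, weight in kg) and hairs per square inch.
predictors = [(30, 70), (40, 80), (50, 75), (60, 90), (45, 85), (35, 72)]
density = [210, 190, 170, 150, 175, 200]

intercept, b_age, b_weight = fit_linear(predictors, density)
print(f"density = {intercept:.1f} + {b_age:.2f}*age + {b_weight:.2f}*weight")
```

Each coefficient estimates how much the outcome changes per unit of that predictor while the other predictors are held constant.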

For qualitative variables the approach is different. To find out if there is a statistically significant dependence between two qualitative variables we have to build a contingency table and use the chi-squared test or Fisher’s exact test, depending on our data: the chi-squared test is an approximation that becomes unreliable when the expected counts in some cells are small, and in that situation Fisher’s exact test is preferred. When in doubt, we can always use Fisher’s test. Although it involves a more complex calculation, this is no problem for any of the statistical packages available today.
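The two tests on a 2x2 contingency table can be sketched with the standard library alone. The counts below are invented (20 of 50 brown-haired and 12 of 50 blond participants going bald), the helper names are hypothetical, and the chi-squared version is the plain Pearson statistic without continuity correction, so a statistical package may give a slightly different p-value.

```python
from math import comb, erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is as likely as, or less likely than, the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Made-up counts: rows are brown / blond, columns are bald / not bald.
chi2, p_chi2 = chi2_2x2(20, 30, 12, 38)
p_fisher = fisher_exact_2x2(20, 30, 12, 38)
print(f"chi-squared = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"Fisher's exact test, p = {p_fisher:.4f}")
```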

Another possibility is to calculate a measure of association, such as the risk ratio or the odds ratio, with its corresponding confidence interval. If the interval does not include the null value (one, for both ratios), we can consider the association statistically significant.
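Both measures and their confidence intervals are easy to compute by hand. The sketch below uses the usual large-sample approach, building the interval on the logarithm of the ratio; the counts are the same invented ones (20 of 50 brown-haired and 12 of 50 blond participants going bald) and the function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def ratio_with_ci(estimate, se_log, level=0.95):
    """Return (estimate, lower, upper) from the standard error of log(estimate)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, exp(log(estimate) - z * se_log), exp(log(estimate) + z * se_log)


def risk_ratio(a, b, c, d):
    """Risk ratio for [[a, b], [c, d]]: rows are groups, first column is the event."""
    rr = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return ratio_with_ci(rr, se)


def odds_ratio(a, b, c, d):
    """Odds ratio for the same table layout."""
    odds = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio_with_ci(odds, se)


# Made-up counts: brown (20 bald, 30 not) versus blond (12 bald, 38 not).
for name, (est, low, high) in [
    ("risk ratio", risk_ratio(20, 30, 12, 38)),
    ("odds ratio", odds_ratio(20, 30, 12, 38)),
]:
    verdict = "not significant" if low <= 1 <= high else "significant"
    print(f"{name} = {est:.2f} (95% CI {low:.2f} to {high:.2f}): {verdict}")
```

Unlike a bare p-value, the interval also shows the size of the effect and how precisely it has been estimated.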

But it may happen that we want to compare several qualitative variables at once. In these cases, we’ll use a **logistic regression** model.

Finally, we’ll discuss time-to-event variables, which are a little more complicated to compare. If we deal with a variable such as the time it takes to go bald, we have to build a **survival** or **Kaplan-Meier curve**, which graphically shows what percentage of subjects remain at any moment without presenting the event (or the percentage that has presented it, according to the way we read it).
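The points of a Kaplan-Meier curve can be computed with a few lines of code. This is a minimal sketch with invented follow-up times; `kaplan_meier` is a hypothetical helper, and an event flag of 0 marks a censored participant, that is, one who left the study or reached its end without going bald.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where at least one event occurs."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everybody whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the fraction of those at risk who did NOT have the event.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Made-up follow-up in years; 1 = went bald, 0 = censored.
years = [2, 3, 3, 5, 6, 8, 9, 12]
went_bald = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(years, went_bald):
    print(f"year {t}: {s:.0%} still with hair")
```

Censored participants never make the curve drop, but they do leave the group at risk, which is why later events produce bigger steps.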

But it could be that we want to compare the survival curves of brown and blond people to see if there are any differences in the rate at which the groups present the event of going bald. In this case we have to use the **log rank test**.

This method compares the two curves using the differences between the number of events observed in each group and the number we would expect if there were no differences between the two groups. Remember that survival refers to the time until the event occurs, whatever the event is, not necessarily death. With this technique we get a p-value that indicates whether the difference between the two survival curves is statistically significant, but it tells us nothing about the magnitude of that difference.
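The observed-versus-expected logic can be sketched as follows for two groups. The follow-up data are invented, `log_rank` is a hypothetical helper, and the p-value comes from the chi-squared distribution with one degree of freedom, which is the usual large-sample approximation.

```python
from math import erfc, sqrt


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, p-value)."""
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    event_times = sorted({t for t, e in group_a + group_b if e})

    observed_a = expected_a = var = 0.0
    for t in event_times:
        n_a = sum(1 for x, _ in group_a if x >= t)  # still at risk in group A
        n_b = sum(1 for x, _ in group_b if x >= t)
        d_a = sum(1 for x, e in group_a if x == t and e)  # events at time t
        d_b = sum(1 for x, e in group_b if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        # Under the null hypothesis events are shared out in proportion to those at risk.
        expected_a += d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if var == 0:
        raise ValueError("not enough information to compare the groups")
    chi2 = (observed_a - expected_a) ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2))


# Made-up follow-up in years; 1 = went bald, 0 = censored.
brown_years, brown_bald = [2, 3, 5, 6, 8, 9], [1, 1, 1, 0, 1, 1]
blond_years, blond_bald = [4, 7, 10, 11, 12, 12], [1, 0, 1, 1, 0, 0]

chi2, p = log_rank(brown_years, brown_bald, blond_years, blond_bald)
print(f"log-rank chi-squared = {chi2:.3f}, p = {p:.4f}")
```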

The most complex case arises when we want to assess the effect of several variables on a time-to-event variable. For this multivariate analysis we have to use a **proportional hazards regression model (Cox’s regression)**. This model is more complex than the previous ones but, once again, any statistical software will carry it out without difficulty if we feed it the appropriate data.

## We’re leaving…

And we are going to leave the bald alone for once. We could talk more about time-to-event variables. The Kaplan-Meier curve gives us an idea of who is presenting the event over time, but it tells us nothing about the risk of presenting it at any given time. For that we need another indicator, the hazard ratio. But that’s another story…