Sample Size

When planning research it is important to consider how many experimental units you will need. Including too many is at best a waste of resources (and at worst unethical), while including too few may fail to produce a definitive answer. The choice of sample size combines practical considerations and mathematical calculation. We address the latter here.

There are many packages dedicated to performing sample size/power calculations. It is often possible to perform such calculations within standard statistical software (see, for example, the sampsi command in the Stata package). For other situations, other calculations may be required, and several stand-alone packages are available for this purpose (details of some such software are contained here).

Comparison of proportions

For 1:1 randomisation to two groups, the total sample size is:

[formula not available]

Example:

A standard treatment produces a response rate of 40%. It is hoped that a new treatment would increase the response rate to 50%. What sample size would be required for a study with 80% power to find such a difference significant in a 2-sided test at the 5% level?

The required sample size is:

[calculation not available]
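As a rough illustration of this example, the sketch below assumes the widely used normal-approximation formula, total N = 2 x (z_{1-α/2} + z_{1-β})^2 x [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2, where p1 and p2 are the two response rates. This formula is an assumption rather than necessarily the one intended here, and other variants (for example, using a pooled proportion) give slightly different answers. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def total_n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Total sample size (both groups, 1:1) to compare two proportions.

    Assumes the unpooled normal-approximation formula; this is a sketch,
    not necessarily the formula intended by the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # required power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_group = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return 2 * ceil(n_per_group)  # round each group up, then double


# Response rates of 40% vs 50%, 80% power, 2-sided 5% level
print(total_n_two_proportions(0.40, 0.50))
```

Under these assumptions the example works out at about 770 patients in total (385 per group).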

Comparison of means

For 1:1 randomisation to two groups, the total sample size is:

[formula not available]

Example:

How many patients would need to be recruited to a trial where the primary endpoint is to detect a difference of 5 mm Hg in blood pressure between an intervention group and a control group? Assume the between-patient blood pressure standard deviation is 10 mm Hg, and 90% power is required at a significance level of 5%.

[calculation not available]
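This example can be sketched in code under the assumption of the standard normal-approximation formula for two means, total N = 4 x SD^2 x (z_{1-α/2} + z_{1-β})^2 / (difference)^2, with a 2-sided test. Both the formula and the function name are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def total_n_two_means(difference, sd, alpha=0.05, power=0.90):
    """Total sample size (both groups, 1:1) to compare two means.

    Assumes a common standard deviation and a normal approximation;
    a sketch, not necessarily the formula intended by the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # required power
    n_per_group = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / difference ** 2
    return 2 * ceil(n_per_group)  # round each group up, then double


# Difference of 5 mm Hg, SD of 10 mm Hg, 90% power, 2-sided 5% level
print(total_n_two_means(difference=5, sd=10))
```

With these inputs the sketch gives about 170 patients in total (85 per group).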

Comparison of two survival curves

Unlike the previous scenarios, for time-to-event (or survival) data, the sample size is given in terms of the number of events that occur, e.g. deaths or cancer relapses.

Based on the logrank test, with patients randomised to receive two treatments in a ratio 1:1, the total number of events required is

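Total number of events = { (HR + 1) x (z_{1-β} + z_{1-α/2}) / (HR - 1) }^2

where, as in the calculations above, z_{1-α/2} and z_{1-β} are the standard normal deviates for the significance level and the power (1.96 for a 2-sided 5% level and 0.842 for 80% power), and HR is the hazard ratio. The hazard ratio can be estimated in several ways. The simplest assumption is that the survival times follow an exponential distribution, and from this the HR may be approximated by: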

a) HR = Median Survival time (group 2) / Median Survival time (group 1)

b) HR = log (Survival proportion at a given time, group 1) / log (Survival proportion at the same time, group 2)

[note the reversal of groups in the numerator and denominator between (a) and (b)].
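The two approximations can be checked against each other in a short sketch. Under exponential survival the hazard in each group is log(2) divided by the median survival time, so the hazard ratio of group 1 relative to group 2 is the inverse of the ratio of medians, while the ratio of log survival proportions keeps the groups in the same order. The hazards of 0.2 and 0.1 per year below are hypothetical values chosen only for illustration, as are the function names.

```python
from math import exp, log


def hr_from_medians(median_1, median_2):
    """Hazard ratio (group 1 vs group 2) from median survival times."""
    return median_2 / median_1  # groups reversed: shorter median, higher hazard


def hr_from_survival(surv_1, surv_2):
    """Hazard ratio (group 1 vs group 2) from survival proportions at one time."""
    return log(surv_1) / log(surv_2)


# Hypothetical exponential hazards (per year), so the true HR is 2
hazard_1, hazard_2 = 0.2, 0.1
median_1, median_2 = log(2) / hazard_1, log(2) / hazard_2
surv_1, surv_2 = exp(-hazard_1 * 5), exp(-hazard_2 * 5)  # survival at 5 years

print(hr_from_medians(median_1, median_2))
print(hr_from_survival(surv_1, surv_2))
```

Both functions recover the same hazard ratio when the survival times really are exponential; with real data the two estimates will generally differ.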

However, a more realistic estimate of the hazard ratio may well emanate from previous patient series, which do not impose the assumption of exponential survival.

Example:

The five year survival of patients on standard therapy is approximately 50%. It is hoped that a new treatment will improve the absolute survival by 10%. How many deaths would be required in a trial to evaluate this treatment, with 80% power at the 5% significance level?

Survival proportions at five years are 50% and 60%; hence approximate HR = log (0.5) / log (0.6) = 1.36 and then:

Total number of events = { (1.36 + 1) x (0.842 + 1.96) / (1.36 - 1)}^2 = 343 (approx., using the unrounded HR).
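The worked example can be reproduced with a minimal sketch of the events calculation, total events = { (HR + 1) x (z_{1-β} + z_{1-α/2}) / (HR - 1) }^2. The function name is hypothetical, and the z-values are taken from the standard normal distribution rather than rounded to three figures.

```python
from math import ceil, log
from statistics import NormalDist


def total_events_logrank(hr, alpha=0.05, power=0.80):
    """Total number of events for a 1:1 logrank comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # required power
    events = ((hr + 1) * (z_beta + z_alpha) / (hr - 1)) ** 2
    return ceil(events)  # round up to a whole number of events


# Five-year survival of 50% vs 60%, assuming exponential survival
hr = log(0.5) / log(0.6)
print(round(hr, 2))              # 1.36
print(total_events_logrank(hr))  # 343
```

Note that entering the rounded value HR = 1.36 instead gives about 338 events, which shows how sensitive the result is to small changes in the hazard ratio.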