BIOSTATISTICS
Statistics is the science of collecting, analysing, presenting and interpreting data. Many disciplines make use of
statistics, for example in medical research:
1. Ask question
o Extended antibiotic treatment (9x24h) is better than short antibiotic treatment (3x24h) for
the treatment of haematological patients with iatrogenic neutropenia and fever of
unknown origin.
2. Formulate hypothesis
o Percentage of patients with fever-recurrence within 28 days does not differ between short
and extended antibiotic treatment.
3. Collect data
o Randomize 200 patients to receive either short or extended antibiotic treatment and count
the number of patients in each group with fever-recurrence within 28 days.
4. Analyse data
o 12 patients receiving short and 9 patients receiving extended antibiotic treatment had fever-
recurrence within 28 days; the 95% confidence interval for the difference of -3% is (-11.5%; 5.5%) (see the sketch after this list).
5. Formulate answer
o No statistical evidence for benefit of extended antibiotic treatment.
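The confidence interval quoted in step 4 can be reproduced with a short computational sketch. It assumes 100 patients per treatment arm (consistent with the 200 randomized patients) and uses the normal-approximation (Wald) interval for a difference of two proportions; the variable names are illustrative.

```python
from math import sqrt

# Sketch: 95% CI for a difference of two proportions (Wald interval),
# assuming 100 patients per treatment arm (200 randomized in total).
n_short, n_ext = 100, 100
recur_short, recur_ext = 12, 9                    # fever recurrence within 28 days

p_short = recur_short / n_short                   # 0.12
p_ext = recur_ext / n_ext                         # 0.09
diff = p_ext - p_short                            # -0.03, i.e. -3%

se_diff = sqrt(p_short * (1 - p_short) / n_short + p_ext * (1 - p_ext) / n_ext)
ci_low, ci_high = diff - 1.96 * se_diff, diff + 1.96 * se_diff

print(f"difference = {diff:.1%}, 95% CI = ({ci_low:.1%}; {ci_high:.1%})")
# difference = -3.0%, 95% CI = (-11.5%; 5.5%)
```

Because the interval contains 0, the data provide no statistical evidence of a difference between the two treatments, which is the conclusion formulated in step 5.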
DESCRIPTIVE STATISTICS
There are several common types of study:
Cross-sectional: data collected at one point in time
Prospective: subjects included ‘at baseline’, outcome assessed in future/over time
o randomized controlled trial (RCT)
o longitudinal/observational study
Retrospective: outcome has been assessed, looking back in time
With these types of studies, different kinds of data can be collected:
Binary data: gender, HPV status (infected/not infected), myocardial infarction (yes/no)
Categorical data: alcohol consumption (none/moderate/heavy), clinical T-stage (1/2/3/4), water
source (river/pond/spring)
Continuous data: cholesterol, triglyceride concentration, quality of life
Time-to-event data (survival): time to death, time to recurrence after treatment, time to employment
o Difficult to analyse, because some subjects drop out of the study before the event occurs
Descriptive statistics summarize and describe important features of the data and concern the sample. This can
be shown with graphics, such as histogram, boxplot, scatter plot, etc., or with numerical summary measures,
such as the mean, median, standard deviation (SD), percentage, etc. Inferential statistics are used to draw a
conclusion beyond the data sample, using effect size, confidence interval, hypothesis testing, etc. The
distribution of the data can have different shapes.
It can be symmetrical and bell-shaped, positively skewed (to the right), or negatively skewed (to the left).
There are several measures of centre: the mean (x̄ = (1/n) Σ xᵢ), the median (middle value), and the mode (most frequent value). If the distribution is right-skewed, the mean is larger than the median.
There are also measures of spread: the standard deviation (SD = √[(1/n) Σ (xᵢ − x̄)²]), the variance (SD²), the
range (max-min), or interquartile range (IQR = Q3 – Q1). The common practice in medical articles for
symmetric distributions is to report the mean and SD. For skewed distributions, the median and IQR are
reported, and for proportions, the n and % are reported. A scatter plot can be made to look at the association between two continuous variables. A Pearson correlation of r = +1 shows a perfectly positive linear association and r = -1 shows a
perfectly negative linear association.
For example, three scatter plots could show cor = 0.98 (strong positive), cor = -0.02 (essentially no association), and cor = -0.96 (strong negative).
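As a minimal sketch of these summary measures (Python ≥ 3.10 for statistics.correlation; the data below are simulated purely for illustration):

```python
import random
import statistics

random.seed(1)
# Hypothetical sample of 50 cholesterol values (mmol/L), for illustration only.
x = [random.gauss(5.2, 1.0) for _ in range(50)]

mean = statistics.mean(x)
median = statistics.median(x)
sd = statistics.stdev(x)                   # sample SD (divides by n - 1)
q1, q2, q3 = statistics.quantiles(x, n=4)  # quartiles
iqr = q3 - q1

# Pearson correlation with a second, noisily related variable.
y = [xi * 0.8 + random.gauss(0, 0.5) for xi in x]
r = statistics.correlation(x, y)

print(f"mean={mean:.2f}, median={median:.2f}, SD={sd:.2f}, IQR={iqr:.2f}, r={r:.2f}")
```

Note that statistics.stdev divides by n − 1 (the usual sample SD), whereas the formula above divides by n; for large n the difference is negligible.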
CONFIDENCE INTERVALS
Inferential statistics allow conclusions to be drawn about a population based on a sample. For estimation, the effect size
is used, for uncertainty, the confidence interval is used, and for
hypothesis testing, the p-value is used. The central limit
theorem states that under certain conditions, the distribution
of the average of a large number of independent, identically
distributed random variables tends to be approximately
normally distributed, regardless of the original distribution of
the variables. This result is particularly important because it
allows statisticians to make inferences about population
parameters using sample data.
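The theorem can be illustrated with a small simulation: sample means of a clearly right-skewed (exponential) distribution are themselves approximately normally distributed, with standard deviation σ/√n. The distribution, sample size and number of repetitions below are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(7)

# Draw 10,000 sample means, each based on n = 50 values from a right-skewed
# exponential distribution (mean 1, SD 1).
n = 50
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(10_000)]

# CLT prediction: the means are approximately normal with mean 1 and SD 1/sqrt(n).
print(f"mean of sample means: {statistics.mean(means):.3f}  (theory: 1.000)")
print(f"SD of sample means:   {statistics.stdev(means):.3f}  (theory: {1 / n**0.5:.3f})")
```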
What is the mean FEV1 in a population of children aged 7–10 years?
Sample: N = 636 children, mean = 1.59 L, SD = 0.30. Uncertainty is quantified by the standard error (SE): SEmean = SD / √n = 0.30 / √636 = 0.012. 95% confidence interval (CI): (mean − 1.96 × SEmean; mean + 1.96 × SEmean) = mean ± 1.96 × SEmean. 95% CI: 1.59 ± 1.96 × 0.012 = [1.57; 1.61].
The standard error tells how certain you are of an estimated mean and is used to calculate confidence intervals. If the sampling were repeated many times, 95% of the resulting confidence intervals would contain the true population mean. These CI formulas depend on the assumption that the distribution of the mean/proportion is approximately normal. This is often reasonable, especially as n grows, but it does not hold for many other statistics.
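The FEV1 interval above can be reproduced directly from the reported summary statistics:

```python
from math import sqrt

# 95% CI for a mean using the normal approximation (the FEV1 example).
n, mean, sd = 636, 1.59, 0.30
se_mean = sd / sqrt(n)                  # standard error of the mean
ci = (mean - 1.96 * se_mean, mean + 1.96 * se_mean)

print(f"SE = {se_mean:.3f}, 95% CI = [{ci[0]:.2f}; {ci[1]:.2f}]")
# SE = 0.012, 95% CI = [1.57; 1.61]
```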
DIAGNOSTIC TESTING
The sensitivity shows how many relevant items are selected, e.g. how many sick people are correctly identified as having the condition. It is calculated as TP / (TP + FN). The specificity shows how many of the negatives are truly negative, e.g. how many healthy people are correctly identified as not having the condition. It is calculated as TN / (TN + FP). The sensitivity and specificity do not directly inform on the predictive value of positive or negative tests. This is done with the positive and negative predictive value. The PPV is the probability that a person who has a positive test result truly has the disease. It is calculated as TP / (TP + FP). A high PPV means that if the test result is positive, there's a high chance the person actually has the condition. The NPV is the probability that a person who has a negative test result truly does not have the disease. It is calculated as TN / (TN + FN). A high NPV means that if the test result is negative, there's a high chance the person does not have the condition. The prevalence has a large impact on the PPV and NPV. The prevalence is the percentage of cases in the population.
Example: sensitivity = 90%, specificity = 95%, and N = 1000.
Case 1: prevalence = 10%.
PPV = positive cases / number of positives = 90/135 = 2/3 = 66.7%
NPV = negative controls / number of negatives = 855/865 = 98.8%
Case 2: prevalence = 30%.
PPV = positive cases / number of positives = 270/305 = 88.5% (>> 66.7%)
NPV = negative controls / number of negatives = 665/695 = 95.7% (< 98.8%)
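Both cases can be reproduced from the sensitivity, specificity and prevalence alone, as in the sketch below (the function name is illustrative):

```python
def predictive_values(sens, spec, prevalence, n=1000):
    """Fill the 2x2 table for a population of size n and return (PPV, NPV)."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sens * diseased          # true positives
    fn = diseased - tp            # false negatives
    tn = spec * healthy           # true negatives
    fp = healthy - tn             # false positives
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.10, 0.30):
    ppv, npv = predictive_values(sens=0.90, spec=0.95, prevalence=prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
# prevalence 10%: PPV = 66.7%, NPV = 98.8%
# prevalence 30%: PPV = 88.5%, NPV = 95.7%
```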
The 95% confidence interval for a proportion p is p ± 1.96 × SEp, with SEp = √(p(1 − p)/n). For the PPV of case 1 (p = 90/135 = 0.667, n = 135 positives), SEp = √(0.667 × 0.333 / 135) = 0.041, so the 95% CI is 0.667 ± 1.96 × 0.041 = [0.587; 0.746]. The same approach applies to the sensitivity, specificity and NPV.
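A sketch of this interval for the case 1 PPV (90 true positives out of 135 positive tests):

```python
from math import sqrt

# 95% Wald CI for a proportion, applied to the case 1 PPV (90 of 135 positive tests).
successes, n = 90, 135
p = successes / n                 # 0.667
se_p = sqrt(p * (1 - p) / n)      # standard error of the proportion
ci = (p - 1.96 * se_p, p + 1.96 * se_p)

print(f"PPV = {p:.3f}, SE = {se_p:.3f}, 95% CI = [{ci[0]:.3f}; {ci[1]:.3f}]")
# PPV = 0.667, SE = 0.041, 95% CI = [0.587; 0.746]
```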
HYPOTHESIS TESTING
The null hypothesis states that there is no difference between the means of the groups that are compared. The alternative hypothesis states that there is a difference between the means of the two groups:
H0: mean FEV1 girls = mean FEV1 boys
Ha: mean FEV1 girls ≠ mean FEV1 boys
CONFIDENCE INTERVALS AND P-VALUE
The effect size is the mean difference between the two groups, e.g. 1.66 − 1.54 = 0.12 (boys vs. girls). However, this does not immediately mean that the null hypothesis can be rejected. There are two approaches to decide between the hypotheses: confidence intervals and p-values. To calculate the confidence interval, the standard error of the mean difference is needed:
SEdiff = √[((n1 − 1)·SD1² + (n2 − 1)·SD2²) / (n1 + n2 − 2)] × √(1/n1 + 1/n2).
With this standard error, the confidence interval is calculated as meandiff ± 1.96 × SEdiff. If 0 does not fall in the confidence interval, the null hypothesis must be rejected and the alternative hypothesis accepted.
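A sketch of this calculation for the FEV1 comparison. Only the group means (1.66 and 1.54) appear in the text; the group sizes and standard deviations below are assumed values for illustration, so the resulting interval and p-value are illustrative as well.

```python
from math import erf, sqrt

# Normal-approximation CI and p-value for a difference of two means.
n1, mean1, sd1 = 320, 1.66, 0.30   # boys  (n and SD assumed)
n2, mean2, sd2 = 316, 1.54, 0.28   # girls (n and SD assumed)

diff = mean1 - mean2                                   # effect size: 0.12
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_diff = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)     # SE of the difference
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)

z = diff / se_diff                                     # test statistic under H0: diff = 0
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value

print(f"diff = {diff:.2f}, 95% CI = [{ci[0]:.2f}; {ci[1]:.2f}], p = {p_value:.4f}")
```

With these assumed inputs the interval excludes 0 and the p-value is far below 0.05, so H0 would be rejected; with other group sizes or standard deviations the conclusion could differ.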
Assume that in the population the mean difference is 0, i.e. the null hypothesis is true. There is then a certain probability of observing a sample mean difference of 0.12 or more extreme (> 0.12 or <
-0.12). This chance is the p-value. This chance is larger when the standard error is larger (more fluctuations), so
the p-value also depends on the SE. If the p-value is lower than 0.05, the null hypothesis must be rejected. The
probability of falsely rejecting a true H0 (type I error) is 0.05. An independent samples t-test or a paired