x <- 71:100
y <- dbinom(x,100,0.9)
plot(x, y, lwd=2, type="h")
lines(x[90-70], y[90-70], lwd=2, type="h", col="red")Introduction to Medical Statistics 2026
Exercise class III
Statistical Analysis: Main Concepts and Principles; Binomial Distribution
Exercise class III
Statistical Analysis: Main Concepts and Principles; Binomial Distribution
I. Calculation of binomial probabilities
The US CDC estimates that 90% of Americans have had chickenpox by the time they reach adulthood.
- Suppose we take a random sample of 100 American adults. Is the use of the binomial distribution appropriate for calculating the probability that exactly 97 out of 100 randomly sampled American adults had chickenpox during childhood? Explain your answer.
- How many American adults that had chickenpox would you expect to observe among the 100? Wat is the variance of the number of observed cases among 100 sampled individuals? Use the
dbinomfunction to calculate the probability that exactly 97 out of 100 randomly sampled American adults had chickenpox during childhood, i.e. compute P(X=97) if X \(\sim\) B(0.9,100).
- What is the probability that exactly 3 out of a new sample of 100 American adults have not had chickenpox in their childhood?
- Plot the probability function of this binomial distribution. See whether the answer from b. corresponds with the value in the plot. Which number has the highest probability and how large is that probability?
- What’s the probability that we observe a number of adults that is at least 5 lower or higher than 90? Hint: use the
pbinomfunction to compute \(P(X \leq 85)\) and \(P(X \geq 95)\). How do you read this number from the plot that you made in b?
pbinom(85, size=100, prob=0.9) + (1-pbinom(94, size=100, prob=0.9) )[1] 0.1301499
- We actually calculate the p-value of a test of the null hypothesis that p is 0.9 against the two-sided alternative hypothesis that there is a difference if the observed number of infected individuals is 95. Why?
II: Inference for a single proportion
A new chemotherapy has entered phase 2 in drug development. A tumour response is defined as a decrease in tumour size by at least 50% within 6 months after therapy initiation. According to experts, the tumour response probability of the drug should be more than 20% in order to proceed to a phase 3 study.
The statistical test of interest is the test for the tumour response probability \(\pi\) with the null hypothesis \(H_0: \pi=0.2\) (or \(H_0: \pi \leq 0.2\)) versus the alternative \(H_A: \pi >0.2\). The phase 2 trial consists of 50 patients. We observe the tumour response for each patient and then perform the statistical test.
- Assume that 16/50 patients show a tumour response. Calculate the corresponding one-sided p-value and describe the result and your conclusion in words. Hint: You can either calculate the p-value yourself using the binomial distribution (as in Exercise I.b or I.e) or you can use the
prop.testorbinom.testfunction. (Note that these two functions use different methods; onlybinom.testgives the same answer as the direct calculation.)
# version 1: using the binomial distribution
1-pbinom(15, size=50, prob=0.2)[1] 0.03080342
# equivalently:
sum(dbinom(16:50, size=50, prob=0.2)) [1] 0.03080342
# version 2: using binom.test
binom.test(16, 50, p = 0.2, alternative = "greater")
Exact binomial test
data: 16 and 50
number of successes = 16, number of trials = 50, p-value = 0.0308
alternative hypothesis: true probability of success is greater than 0.2
95 percent confidence interval:
0.2121041 1.0000000
sample estimates:
probability of success
0.32
# version 3: using prop.test
prop.test(16, 50, p = 0.2, alternative = "greater")
1-sample proportions test with continuity correction
data: 16 out of 50, null probability 0.2
X-squared = 3.7812, df = 1, p-value = 0.02591
alternative hypothesis: true p is greater than 0.2
95 percent confidence interval:
0.2145141 1.0000000
sample estimates:
p
0.32
- The experts expected a tumour response of 40%. Suppose that the true tumour response probability is indeed 0.4. We reject the null hypothesis if we get a one-sided p-value smaller than 0.025. What is the probability that we reject the null hypothesis if we have a sample of 50 patients (not the current sample, but an arbitrary new sample)? This is the basis of the so-called power or sample size calculation. Hint: First show that for an observed number of tumour responses of 16 the test does not reject the null hypothesis, while for 17 it does. Then calculate the probability of \(P(X\geq 17)\) if the alternative is true, i.e. if p=0.4.
1-pbinom(15, size=50, prob=0.2)[1] 0.03080342
# p-value > 0.025
1-pbinom(16, size=50, prob=0.2)[1] 0.01444166
# p-value < 0.025
# hence we reject the null hypothesis if X>16
# calculate probability of X>=17 if the alternative is true (p=0.4)
1-pbinom(16,size=50,prob=0.4) [1] 0.8439094
III: Analysis of a diagnostic test
A diagnostic study evaluated the Platelia NS1 ELISA assay for diagnosis of dengue. The study participants consisted of 853 children admitted to Children’s Hospital #1, Children’s Hospital #2, or HTD from August 2006 to March 2007. Participants were eligible for entry to the study if they had a history of fever of less than seven days and there was a clinical suspicion of dengue. A patient was classified as having dengue by the reference test (gold standard) if there was RT-PCR detection of DENV RNA in plasma, and/or viral culture and/or serological changes in DENV reactive IgM or IgG levels in paired plasma specimens. The data is stored in dengueNS1.csv, the file dengueNS1_description.txt describes the variables in the data set.
Import the dataset and use the summary function for a data summary.
ns <- read.csv("Data/dengueNS1.csv", as.is=FALSE)
summary(ns) studyNo dayOfIllness dengue serotype ns1
Min. : 1.0 Min. :1.000 no :122 DENV-1 :345 negative:354
1st Qu.: 81.0 1st Qu.:3.000 yes:731 DENV-2 :230 positive:499
Median :156.0 Median :3.000 DENV-3 : 66
Mean :157.2 Mean :3.476 DENV-4 : 7
3rd Qu.:235.0 3rd Qu.:4.000 undetermined:205
Max. :325.0 Max. :6.000
- Use the
tablefunction to create a cross-table of NS1 result and “true” dengue status; you can add column and row totals via theaddmarginsfunction. Calculate the prevalence (proportion) of children with dengue.
addmargins(table(ns$ns1,ns$dengue))
no yes Sum
negative 122 232 354
positive 0 499 499
Sum 122 731 853
- Calculate the proportion of children with dengue that test positive in the NS1 assay. This quantity is called the sensitivity “SENS” of the NS1 test. Also calculate the proportion of children without dengue that test negative (the specificity “SPEC” of the test). What do you think of the performance of the test?
- Calculate the proportion of children that have dengue amongst the ones that test positive in the NS1 assay. This is called the positive predictive value (PPV). Also calculate the proportion of children that do not have dengue amongst the ones that test negative in the NS1 assay. Which characteristic of the test is more clinically relevant, SENS and SPEC or PPV and NPV?
- Is there evidence for a difference in sensitivity of the NS1 ELISA assay between DENV serotypes? Use the
tbl_summaryfunction from the gtsummary package (No statistical test is required, just an exploratory data analysis.)
library(gtsummary)
ns.truepos <- subset(ns,dengue=="yes") # select only subjects with dengue
tbl_summary(data=ns,by=serotype,include=c(serotype,ns1))| Characteristic | DENV-1 N = 3451 |
DENV-2 N = 2301 |
DENV-3 N = 661 |
DENV-4 N = 71 |
undetermined N = 2051 |
|---|---|---|---|---|---|
| ns1 | |||||
| negative | 72 (21%) | 96 (42%) | 13 (20%) | 3 (43%) | 170 (83%) |
| positive | 273 (79%) | 134 (58%) | 53 (80%) | 4 (57%) | 35 (17%) |
| 1 n (%) | |||||
## a plain table can be obtained via
table(ns$ns1,ns$serotype)
DENV-1 DENV-2 DENV-3 DENV-4 undetermined
negative 72 96 13 3 170
positive 273 134 53 4 35
IV: difference in proportions
The dataset bmData.csv contains selected variables from 300 patients with confirmed bacterial meningitis. They were randomized to either adjunctive dexamethasone therapy or placebo. Import the dataset and use the summary function for a data summary.
bmData <- read.csv("Data/bmData.csv", as.is=FALSE)
summary(bmData) patId group age sex gcs
BM 10 : 1 dexamethasone:143 Min. :15.00 F: 80 Min. : 3.00
BM 100 : 1 placebo :157 1st Qu.:31.00 M:220 1st Qu.: 9.00
BM 101 : 1 Median :43.00 Median :13.00
BM 108 : 1 Mean :43.22 Mean :11.76
BM 109 : 1 3rd Qu.:54.00 3rd Qu.:15.00
BM 110 : 1 Max. :91.00 Max. :15.00
(Other):294
paresis op.csf wc.csf glu.csf
Min. :0.00000 Min. : 0.0 Min. : 1 Min. : 0.10
1st Qu.:0.00000 1st Qu.:15.0 1st Qu.: 1085 1st Qu.: 10.00
Median :0.00000 Median :20.0 Median : 3160 Median : 20.00
Mean :0.08333 Mean :22.9 Mean : 6252 Mean : 25.03
3rd Qu.:0.00000 3rd Qu.:30.0 3rd Qu.: 7830 3rd Qu.: 36.00
Max. :1.00000 Max. :55.0 Max. :64000 Max. :100.00
NA's :63 NA's :1
wc.csf.fup death.6mo
Min. : 16.0 Min. :0.0000
1st Qu.: 352.5 1st Qu.:0.0000
Median : 902.5 Median :0.0000
Mean : 2883.4 Mean :0.1033
3rd Qu.: 2375.0 3rd Qu.:0.0000
Max. :84000.0 Max. :1.0000
NA's :22
- Create a 2x2 table that compares the number of deaths at 6 months between the two treatment arms, using the function
xtabsortable. Is there a difference in the survival status at 6 months of follow-up between the two randomized groups?
xtabs(~group+death.6mo, data=bmData) death.6mo
group 0 1
dexamethasone 134 9
placebo 135 22
- Perform a formal test for a difference in survival status. Formulate the null hypothesis and the alternative hypothesis. Use the
chisq.testfunction.
chisq.test(matrix(c(9,22,134,135), nrow=2))
Pearson's Chi-squared test with Yates' continuity correction
data: matrix(c(9, 22, 134, 135), nrow = 2)
X-squared = 4.0154, df = 1, p-value = 0.04509
chisq.test(matrix(c(9,22,134,135), nrow=2), correct=FALSE)
Pearson's Chi-squared test
data: matrix(c(9, 22, 134, 135), nrow = 2)
X-squared = 4.8125, df = 1, p-value = 0.02825
- Use the
prop.testfunction as alternative. Look at the difference in the way the results are shown compared to the chi-squared test.
prop.test(c(22,9), c(157,143))
2-sample test for equality of proportions with continuity correction
data: c(22, 9) out of c(157, 143)
X-squared = 4.0154, df = 1, p-value = 0.04509
alternative hypothesis: two.sided
95 percent confidence interval:
0.003185485 0.151195167
sample estimates:
prop 1 prop 2
0.14012739 0.06293706
prop.test(c(22,9), c(157,143), correct=FALSE)
2-sample test for equality of proportions without continuity correction
data: c(22, 9) out of c(157, 143)
X-squared = 4.8125, df = 1, p-value = 0.02825
alternative hypothesis: two.sided
95 percent confidence interval:
0.009866702 0.144513950
sample estimates:
prop 1 prop 2
0.14012739 0.06293706