Power Tools for Epidemiologists

Acknowledgements

Gerald van Belle “Statistical Rules of Thumb”
Andrew Gelman and Jennifer Hill’s “Data Analysis Using Regression and Multilevel/Hierarchical Models” (Chapter 20)
Brad Carlin
JoAnn Alvarez

COOKING A SAMPLE SIZE

Difference between two groroups on continuous measurement

Two Key Ingredients

The Recipe

The Critical Value

“Classic” Formulation

More Bang for your Buck

CONTINUOUS MEASUREMENTS

Lehr’s Equation (Difference Between Two Means)

 # Lehr's equation: n per group to detect a difference between
 # means x0 and x1 with common standard deviation sd,
 # at 80% power and two-sided alpha = 0.05
nMeans <- function(x0, x1, sd){
    d <- (x1 - x0)/sd    # standardized effect size
    n <- 16/d^2
    n
    }
nMeans(100, 90, 20)

Evaluate range of values

unexp <- 100
exposed <- seq(88, 98, 2)   # candidate means for the exposed group
nMeans(unexp, exposed, 20)

dev <- seq(10, 20, 2)       # candidate standard deviations
nMeans(100, 90, dev)

for(i in seq(10, 20, 2)){   # loop over candidate standard deviations
    size <- nMeans(unexp, exposed, i)
    print(size)
}

Detectable Difference Between Two Means

 # smallest detectable difference given standard deviation sd
 # and n observations per group
nDD <- function(sd, n){
    dd <- 4*sd/sqrt(n)
    dd
    }
nDD(20, 50)

Percentage Change in Means

 # n per group to detect a proportional change pc in means,
 # given a coefficient of variation cv
nPC <- function(cv, pc){
    x <- 16*(cv)^2/((log(1 - pc))^2)
    x
}

a <- c(.05, .10, .15, .20, .30, .40, .50, .75, 1)
nPC(a, .15)

plot(a, nPC(a, .15), ylab="Number in Each Group",
    xlab="By Varying Coefficient of Variation",
    main="Sample Size Estimate for a 15% Difference")

Or look at how the desired effect size changes the sample size calculation

b <- seq(.05, .5, .05)   # an illustrative range of proportional differences
plot(b, nPC(.3, b), ylab="Number in Each Group",
    xlab="By Varying Difference Between Two Groups",
    main="Sample Size Estimate for a Coefficient of Variation of 0.3")

POISSON DISTRIBUTED OR COUNT DATA

The square root of Poisson-distributed data is approximately normally distributed, with a variance of about 1/4 regardless of the mean, so normal-theory sample size formulas can be applied to count data on the square-root scale.
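A quick simulation illustrates both points; the mean of 30 here is arbitrary, chosen only for the demonstration:

```r
set.seed(1)
x <- rpois(100000, 30)   # Poisson counts with an arbitrary mean of 30
y <- sqrt(x)             # square-root transformed counts

var(y)                   # close to 0.25, whatever the mean
plot(density(y))         # roughly normal, centered near sqrt(30)
```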

R functions for Poisson-distributed data

 # n per group for count data; x0 and x1 are the population means
nPois <- function(x0, x1){
    n <- 4/(sqrt(x0) - sqrt(x1))^2
    n
    }
nPois(1, 2)

 # takes a background rate x into account
nbrPois <- function(x, x0, x1){
    n <- 4/(sqrt(x + x0) - sqrt(x + x1))^2
    n
    }
nbrPois(1.5, 1, 2)

 # number of additional observations beyond a background rate x needed
nbbrPois <- function(x){
    n <- 4*sqrt(x)
    n
    }
nbbrPois(50000)

Upper Bound for Count Data When No Events Have Occurred (Rule of 3)
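The rule of three: if none of n subjects experiences the event, an approximate 95% upper confidence bound on the event probability is 3/n. A minimal sketch (the function name is ours):

```r
 # rule of three: approximate 95% upper bound on an event
 # probability when 0 events are observed in n subjects
ruleOf3 <- function(n){
    3/n
}
ruleOf3(300)   # 0 events in 300 subjects: upper bound = 0.01
```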

Relative Risks and Odds Ratios

Number of Outcomes Needed to Detect a Relative Risk

Log-Based Formulas

BINOMIAL DATA OR PROPORTIONS

The Rule of 50

MORE RULES OF THUMB FROM GERALD VAN BELLE

BAYESIAN APPROACH

A (Bayesian) Twist to Lehr’s Equation

 # Lehr's equation with the difference and the standard deviation
 # passed in directly, so both can be allowed to vary
nMeans.vary <- function(diff, sd){
    d <- diff/sd
    n <- 16/d^2
    n
    }
install.packages("truncnorm")   # run once
library(truncnorm)

 # draw plausible values for the standard deviation and the
 # difference from normal distributions truncated at zero
sd.r <- rtruncnorm(n=1000, a=0, mean=20, sd=1)
diff.r <- rtruncnorm(n=1000, a=0, mean=10, sd=1)

n1 <- nMeans.vary(diff.r, sd.r)
summary(n1)
plot(density(n1))
 # approximate power at n = 64 per group for each simulated
 # effect size: power = pnorm(d*sqrt(n/2) - 1.96)
p1 <- sqrt(64/2) * diff.r/sd.r - 1.96

summary(pnorm(p1))
plot(density(pnorm(p1)))

“ADAPTIVE” BAYESIAN DESIGNS

traditional approach

 # Lehr's equation for two proportions p0 and p1
nBin <- function(p0, p1){
    p <- (p0 + p1)/2                 # average of the two proportions
    n <- 16*p*(1 - p)/(p0 - p1)^2
    n
    }
nBin(.25, .1)

take limited resources into account

 # solve for the alpha and beta parameters of a beta
 # distribution with a given mean and variance
estBetaParams <- function(mu, var) {
  alpha <- ((1 - mu)/var - 1/mu) * mu^2
  beta <- alpha * (1/mu - 1)
  list(alpha = alpha, beta = beta)
}

 # a prior guess of mean 0.4 and sd 0.1 gives Beta(9.2, 13.8)
estBetaParams(.4, (.1)^2)
N <- 1000
theta <- rbeta(N, 9.2, 13.8)       # prior on the response rate
x <- rbinom(N, 20, theta)          # simulated responses among 20 subjects
accept <- ifelse(x > 14.5, 1, 0)   # "accept" if 15 or more respond
prob <- sum(accept)/N
prob

library(lattice)                   # densityplot() is a lattice function
densityplot(~accept)

take what’s known into account

N <- 1000
theta <- rbeta(N, 24.2, 18.8)      # prior updated with what's known
x <- rbinom(N, 20, theta)
accept <- ifelse(x > 14.5, 1, 0)
prob <- sum(accept)/N
prob

from updating to adapting

N <- 1000
theta <- rbeta(N, 38.2, 24.8)      # prior updated again
x <- rbinom(N, 40, theta)          # now 40 subjects
accept <- ifelse(x > 25, 1, 0)     # "accept" if more than 25 respond
prob <- sum(accept)/N
prob

Bayesian Sample Sizes

Nrep <- 1000
s <- 40:100                         # candidate sample sizes
theta <- rbeta(Nrep, 38.2, 24.8)

prob <- numeric(length(s))          # acceptance probability at each sample size
results <- matrix(0, nrow=length(s), ncol=1)
for(i in 1:length(s)){
    x <- rbinom(Nrep, s[i], theta)
    accept <- ifelse(x > 25, 1, 0)
    prob[i] <- sum(accept)/Nrep
    results[i,] <- prob[i]
}

tab <- cbind(s, results)

tab
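Scanning the table works, but a plot makes the choice of sample size easier to see. A self-contained sketch, carrying over the Beta(38.2, 24.8) prior and the cutoff of 25 from the loop above:

```r
Nrep <- 1000
s <- 40:100
theta <- rbeta(Nrep, 38.2, 24.8)

 # acceptance probability at each candidate sample size
prob <- sapply(s, function(n) mean(rbinom(Nrep, n, theta) > 25))

plot(s, prob, type="l",
    xlab="Sample Size", ylab="Probability of Acceptance")
```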

weighting evidence

Nrep <- 1000
s <- 40:100
theta <- rbeta(Nrep, 26.2, 20.4)    # a differently weighted prior

prob <- numeric(length(s))          # acceptance probability at each sample size
results <- matrix(0, nrow=length(s), ncol=1)
for(i in 1:length(s)){
    x <- rbinom(Nrep, s[i], theta)
    accept <- ifelse(x > 25, 1, 0)
    prob[i] <- sum(accept)/Nrep
    results[i,] <- prob[i]
}

tab <- cbind(s, results)

tab
Nrep <- 1000
s <- 40:100
theta <- rbeta(Nrep, 31.2, 31.8)    # another weighting of the prior evidence

prob <- numeric(length(s))          # acceptance probability at each sample size
results <- matrix(0, nrow=length(s), ncol=1)
for(i in 1:length(s)){
    x <- rbinom(Nrep, s[i], theta)
    accept <- ifelse(x > 25, 1, 0)
    prob[i] <- sum(accept)/Nrep
    results[i,] <- prob[i]
}

tab <- cbind(s, results)

tab