load the required packages
library('paramtest')
library('pwr')
library('tidyverse')
the function creates two normally distributed samples of data, namely x1 and x2
x1 has mean = 0 and SD = 1
x2 has mean = d and SD = 1
the function requires you to specify the standardised effect size, d
and the sample size for each group, N
t_func <- function(simNum, N, d) {
x1 <- rnorm(N, 0, 1)
x2 <- rnorm(N, d, 1)
t <- t.test(x1, x2, var.equal = TRUE) # run t-test
stat <- t$statistic
p <- t$p.value
return(c(t = stat, p = p, sig = (p < .05)))
# return a named vector with the results we want to keep
# value of sig will be TRUE or FALSE based on the p value
}
We will begin with using values of d and N from the G*Power example
Sample size N = 64
cohen’s d = 0.5
number of iterations = 1000
Use paramtest::run_test() to run our required number of iterations of the function results go in the object “power_ttest”
power_ttest <-
run_test(
t_func,
n.iter = 1000,
output = 'data.frame',
N = 64,
d = .5
)
## Running 1,000 tests...
results(power_ttest) %>%
summarise(power = mean(sig))
## power
## 1 0.793
Compare the reported value with the value from G*Power (power = 0.8015)
Note that it will be slightly different, but accuracy can be improved by increasing the number of iterations.
give the params argument a list of parameters we want to vary.
lets try N=50, N=100 and N=150 (per group)
power_ttest_vary <-
grid_search(
t_func,
params = list(N = c(50, 100, 150)),
n.iter = 1000,
output = 'data.frame',
d = .5
)
## Running 3,000 tests...
results(power_ttest_vary) %>%
group_by(N.test) %>%
summarise(power = mean(sig))
## # A tibble: 3 x 2
## N.test power
## <dbl> <dbl>
## 1 50 0.68
## 2 100 0.925
## 3 150 0.986
vary N and Cohen’s d
We will use values to cover the range shown in the G*Power example.
power_ttest_vary2 <-
grid_search(
t_func,
params = list(N = c(25, 50, 100, 200), d = c(.33, .5, .67)),
n.iter = 1000,
output = 'data.frame'
)
## Running 12,000 tests...
power <- results(power_ttest_vary2) %>%
group_by(N.test, d.test) %>%
summarise(power = mean(sig))
## `summarise()` has grouped output by 'N.test'. You can override using the `.groups` argument.
print(power)
## # A tibble: 12 x 3
## # Groups: N.test [4]
## N.test d.test power
## <dbl> <dbl> <dbl>
## 1 25 0.33 0.223
## 2 25 0.5 0.413
## 3 25 0.67 0.654
## 4 50 0.33 0.361
## 5 50 0.5 0.696
## 6 50 0.67 0.902
## 7 100 0.33 0.636
## 8 100 0.5 0.943
## 9 100 0.67 0.998
## 10 200 0.33 0.897
## 11 200 0.5 0.997
## 12 200 0.67 1
ggplot(power, aes(
x = N.test,
y = power,
group = factor(d.test),
colour = factor(d.test)
)) +
geom_point() +
geom_line() +
ylim(c(0, 1)) +
xlim(c(0, 200)) +
labs(x = 'Sample Size', y = 'Power', colour = "Cohen's d") +
theme_minimal()
Compare the plot with the G*Power plot. They are roughly similar.
Note: I have set n.iter = 1000 for each simulation to keep the run time very short. Consider increasing n.iter to improve accuracy of results.
This simulation method can be generalised to a wide variety of statistical models. Other packages may be preferred for specific situations.
Reference: https://cran.r-project.org/web/packages/paramtest/vignettes/Simulating-Power.html