Functions for parametric effect sizes and power, for tests along the lines of balanced ANOVAs (e.g., certain t-tests, correlations, and regressions).
Usage
parapower(effect = NULL, type = "f", N = NULL, k = 2, alpha = 0.05,
power = 0.8, range)
convert_effect(effect, type = "r", N = 3, k = 2, adjust_f = TRUE,
display_f = FALSE, print = TRUE)
critical_effect(N, k = 2, alpha = 0.05)
Arguments
- effect
Effect size of the specified type.
- type
Type of effect entered. Available types are Pearson's r (r), Cohen's d (d), Area Under the Curve (AUC; a), Odds Ratio (o), η² (eta-squared; R²; e), or Cohen's f (f).
- N
Total sample size.
- k
Number of groups or cells, with 2 being the default and most generally appropriate.
- alpha
Alpha level (p-value threshold; theoretical Type I error / false-positive rate).
- power
Power (1 − β; complement of the theoretical Type II error / false-negative rate).
- range
Interval of the estimated component for the optimizer to try. Defaults to c(2, 10000) for N, c(0, 100) for effect, and c(0, 1) for power or alpha. If you get an error from uniroot (like "f() values at end points not of opposite sign"), try widening the range (e.g., a smaller low end for large effects, and a larger high end for small effects).
- adjust_f
Logical indicating whether the entered effect of type f should be converted from F to Cohen's f based on N and k: sqrt((k - 1) * F / (N - k)).
- display_f
Logical; if TRUE, calculates F and an associated p-value based on N and k.
- print
Logical; if FALSE, results are only invisibly returned.
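As a sketch of the adjust_f conversion, here are the N, k, and F values from the convert_effect() call in the examples below, run through the formula by hand (the f–η² identity shown alongside is the standard Cohen relation, not something specific to this package):

```r
# Convert an F statistic to Cohen's f, as adjust_f does, using the values
# from the convert_effect() example below (N = 20, k = 2, F = 13.9)
N <- 20
k <- 2
F_stat <- 13.9
f <- sqrt((k - 1) * F_stat / (N - k))
f  # ~0.879, the Cohen's f reported by convert_effect (up to rounding of F)

# Cohen's f also relates to eta^2 by the standard identity f^2 = eta^2 / (1 - eta^2)
eta2 <- 0.4358  # the eta^2 from the same example
sqrt(eta2 / (1 - eta2))  # ~0.8789
```

Both routes land on the same Cohen's f, which is why convert_effect can report all effect types from any single one.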
Details
This type of power analysis becomes increasingly difficult with more complex models. In those cases, simulation is very handy (though more difficult to abstract in terms of specification): if you can reasonably simulate the data you have in mind, you can reasonably estimate power for any model. See the Monte Carlo section of the examples.
Examples
# a priori power analysis (looking for n given an effect size, alpha, and power)
# with no arguments, this defaults to effect = c(.3, .5, .7), type = 'd'
parapower()
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2
#> 176 2 352 0.05 0.8 0.1483 0.3 0.5840 1.723 0.02200
#> 64 2 128 0.05 0.8 0.2425 0.5 0.6382 2.477 0.05882
#> 34 2 68 0.05 0.8 0.3304 0.7 0.6897 3.560 0.10913
#> Cohen's f F num_df den_df p-value
#> 0.15 3.901 1 173.38 0.04984
#> 0.25 3.860 1 61.77 0.05394
#> 0.35 3.801 1 31.02 0.06033
# sensitivity analysis (looking for an effect size given n, alpha, and power)
parapower(N = 50)
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2 Cohen's f
#> 50 2 100 0.05 0.8 0.2723 0.5659 0.6555 2.791 0.07413 0.283
#> F num_df den_df p-value
#> 3.843 1 48 0.05577
# post-hoc power analysis (looking for power given an effect size, n, and alpha)
# since 'f' is the default type, this is power given Cohen's f = .3
parapower(.3, N = 50)
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2
#> 50 2 100 0.05 0.8439 0.2873 0.6 0.6643 2.969 0.08257
#> Cohen's f F num_df den_df p-value
#> 0.3 4.32 1 48 0.04304
# criterion analysis (looking for alpha given an effect size, n, and power)
parapower(.3, N = 50, power = .8)
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2
#> 50 2 100 0.03369 0.8 0.2873 0.6 0.6643 2.969 0.08257
#> Cohen's f F num_df den_df p-value
#> 0.3 4.32 1 48 0.04304
#
# Monte Carlo power analysis
#
# simulate a simple effect: continuous normal x and related y
N <- 20
x <- rnorm(N)
y <- x * .4 + rnorm(N)
summary(lm(y ~ x))
#>
#> Call:
#> lm(formula = y ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.23994 -0.57602 0.08468 0.39595 1.75405
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.6189 0.1931 3.205 0.00491 **
#> x 0.7918 0.2123 3.729 0.00154 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.8536 on 18 degrees of freedom
#> Multiple R-squared: 0.4358, Adjusted R-squared: 0.4045
#> F-statistic: 13.9 on 1 and 18 DF, p-value: 0.001536
#>
## here, we can input the correlation to get other effect sizes
## k defaults to 2, which is appropriate in this case
convert_effect(cor(x, y), N = N)
#> Pearson's r Cohen's d AUC Odds Ratio eta^2 Cohen's f F num_df den_df
#> 0.6602 1.758 0.8931 24.25 0.4358 0.8789 13.9 1 18
#> p-value
#> 0.001536
## we want to know what N gives us .8 power to detect this effect at .05 alpha
## we can set up a function to easily try different Ns
sim1 <- function(N, beta = .4, batches = 100, alpha = .05, power = .8) {
# first, set up a vector of 0s, which we'll fill with results from each run
op <- numeric(batches)
# now we repeat the simulation batches times
# (with i tracking the index of op)
for (i in seq_len(batches)) {
# reroll the simulated variables
x <- rnorm(N)
y <- x * beta + rnorm(N)
# set the current op entry to the p-value of the test
op[i] <- summary(lm(y ~ x))$coef[2, 4]
}
# observed power is percent of p-values under alpha
# output of the function is observed power - set power for optimization
# (where the target is 0; closest observed to set power)
mean(op < alpha) - power
}
## uniroot will try different values of N (between 20 and 100 here),
## and output the one that gets closest to the set power
uniroot(sim1, c(20, 100))$root
#> [1] 56.62202
if (FALSE) { # \dontrun{
## setting power to 0 will give the observed power
## increase batches for more stable estimates
sim1(55, batches = 5000, power = 0)
## compare with a binary x
sim1b <- function(N, beta = .4, batches = 100, alpha = .05, power = .8) {
op <- numeric(batches)
for (i in seq_len(batches)) {
x <- rep_len(c(0, 1), N)
y <- x * beta + rnorm(N)
op[i] <- summary(lm(y ~ x))$coef[2, 4]
}
mean(op < alpha) - power
}
uniroot(sim1b, c(100, 300))$root
sim1b(200, batches = 5000, power = 0)
# this is what the most basic parametric power analysis is getting at
parapower(.4, "d")
# simulate a more complicated effect: two binary variables with an interaction effect
c1 <- sample(c(0, 1), N, TRUE)
c2 <- sample(c(0, 1), N, TRUE)
y <- c2 * c1 * .8 + rnorm(N)
(m <- summary(lm(y ~ c1 * c2)))
## R-squared is eta^2, and k is 4 (2x2)
convert_effect(m$r.squared, "eta^2", N, 4)
## same kind of function as before, only difference is the simulated data and test
## also have to recalculate the omnibus p-value since it's not returned
sim2 <- function(N, beta = .8, batches = 100, alpha = .05, power = .8) {
op <- numeric(batches)
for (i in seq_len(batches)) {
c1 <- sample(c(0, 1), N, TRUE)
c2 <- sample(c(0, 1), N, TRUE)
y <- c2 * c1 * beta + rnorm(N)
f <- summary(lm(y ~ c1 * c2))$fstatistic
op[i] <- pf(f[1], f[2], f[3], lower.tail = FALSE)
}
mean(op < alpha) - power
}
uniroot(sim2, c(50, 300))$root
sim2(100, batches = 5000, power = 0)
## note here that we're looking at the omnibus test
## if you're actually interested in, say, the interaction effect, that may affect power
sim2i <- function(N, beta = .8, batches = 100, alpha = .05, power = .8) {
op <- numeric(batches)
for (i in seq_len(batches)) {
c1 <- sample(c(0, 1), N, TRUE)
c2 <- sample(c(0, 1), N, TRUE)
y <- c2 * c1 * beta + rnorm(N)
op[i] <- summary(lm(y ~ c1 * c2))$coef[4, 4]
}
mean(op < alpha) - power
}
uniroot(sim2i, c(100, 300))$root
sim2i(200, batches = 5000, power = 0)
} # }