Functions for parametric effect sizes and power, for tests along the lines of balanced ANOVAs (e.g., certain t-tests, correlations, and regressions).
Usage
parapower(effect = NULL, type = "f", N = NULL, k = 2, alpha = 0.05,
power = 0.8, range)
convert_effect(effect, type = "r", N = 3, k = 2, adjust_f = TRUE,
display_f = FALSE, print = TRUE)
critical_effect(N, k = 2, alpha = 0.05)
Arguments
- effect
Effect size of the specified type.
- type
Type of effect entered. Available types are Pearson's r (r), Cohen's d (d), Area Under the Curve (AUC; a), Odds Ratio (o), η² (eta-squared; R²; e), or Cohen's f (f).
- N
Total sample size.
- k
Number of groups or cells, with 2 being the default and most generally appropriate.
- alpha
Alpha level (p-value threshold; theoretical Type I error / false-positive rate).
- power
Power (1 − β; complement of the theoretical Type II error / false-negative rate).
- range
Interval of the estimated component for the optimizer to try. Defaults to c(2, 10000) for N, c(0, 100) for effect, and c(0, 1) for power or alpha. If you get an error from uniroot (like "f() values at end points not of opposite sign"), try widening the range (e.g., a smaller low end for large effects, and a larger high end for small effects).
- adjust_f
Logical indicating whether the entered effect of type f should be converted from F to Cohen's f based on N and k: sqrt((k - 1) * F / (N - k)).
- display_f
Logical; if TRUE, calculates F and an associated p-value based on N and k.
- print
Logical; if FALSE, results are only invisibly returned.
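As a sketch of the adjust_f conversion, here are the N, k, and F values from the convert_effect() call in the examples below, run through the formula by hand (the f–η² identity shown alongside is the standard Cohen relation, not something specific to this package):

```r
# Convert an F statistic to Cohen's f, as adjust_f does, using the values
# from the convert_effect() example below (N = 20, k = 2, F = 13.9)
N <- 20
k <- 2
F_stat <- 13.9
f <- sqrt((k - 1) * F_stat / (N - k))
f  # ~0.879, the Cohen's f reported by convert_effect (up to rounding of F)

# Cohen's f also relates to eta^2 by the standard identity f^2 = eta^2 / (1 - eta^2)
eta2 <- 0.4358  # the eta^2 from the same example
sqrt(eta2 / (1 - eta2))  # ~0.8789
```

Both routes land on the same Cohen's f, which is why convert_effect can report all effect types from any single one.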
Details
This type of power analysis becomes increasingly difficult with more complex models. In those cases, simulation is very handy (though more difficult to abstract in terms of specification): if you can reasonably simulate the data you have in mind, you can reasonably estimate power for any model. See the Monte Carlo section of the examples.
Examples
# a priori power analysis (looking for n given an effect size, alpha, and power)
# with no arguments, this defaults to effect = c(.3, .5, .7), type = 'd'
parapower()
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2
#> 176 2 352 0.05 0.8 0.1483 0.3 0.5840 1.723 0.02200
#> 64 2 128 0.05 0.8 0.2425 0.5 0.6382 2.477 0.05882
#> 34 2 68 0.05 0.8 0.3304 0.7 0.6897 3.560 0.10913
#> Cohen's f F num_df den_df p-value
#> 0.15 3.901 1 173.38 0.04984
#> 0.25 3.860 1 61.77 0.05394
#> 0.35 3.801 1 31.02 0.06033
# sensitivity analysis (looking for an effect size given n, alpha, and power)
parapower(N = 50)
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2 Cohen's f
#> 50 2 100 0.05 0.8 0.2723 0.5659 0.6555 2.791 0.07413 0.283
#> F num_df den_df p-value
#> 3.843 1 48 0.05577
# post-hoc power analysis (looking for power given an effect size, n, and alpha)
# since 'f' is the default type, this is power given Cohen's f = .3
parapower(.3, N = 50)
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2
#> 50 2 100 0.05 0.8439 0.2873 0.6 0.6643 2.969 0.08257
#> Cohen's f F num_df den_df p-value
#> 0.3 4.32 1 48 0.04304
# criterion analysis (looking for alpha given an effect size, n, and power)
parapower(.3, N = 50, power = .8)
#> n k N alpha power Pearson's r Cohen's d AUC Odds Ratio eta^2
#> 50 2 100 0.03369 0.8 0.2873 0.6 0.6643 2.969 0.08257
#> Cohen's f F num_df den_df p-value
#> 0.3 4.32 1 48 0.04304
#
# Monte Carlo power analysis
#
# simulate a simple effect: continuous normal x and related y
N <- 20
x <- rnorm(N)
y <- x * .4 + rnorm(N)
summary(lm(y ~ x))
#>
#> Call:
#> lm(formula = y ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.23994 -0.57602 0.08468 0.39595 1.75405
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.6189 0.1931 3.205 0.00491 **
#> x 0.7918 0.2123 3.729 0.00154 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.8536 on 18 degrees of freedom
#> Multiple R-squared: 0.4358, Adjusted R-squared: 0.4045
#> F-statistic: 13.9 on 1 and 18 DF, p-value: 0.001536
#>
## here, we can input the correlation to get other effect sizes
## k defaults to 2, which is appropriate in this case
convert_effect(cor(x, y), N = N)
#> Pearson's r Cohen's d AUC Odds Ratio eta^2 Cohen's f F num_df den_df
#> 0.6602 1.758 0.8931 24.25 0.4358 0.8789 13.9 1 18
#> p-value
#> 0.001536
## we want to know what N gives us .8 power to detect this effect at .05 alpha
## we can set up a function to easily try different Ns
sim1 <- function(N, beta = .4, batches = 100, alpha = .05, power = .8) {
# first, set up a vector of 0s, which we'll fill with results from each run
op <- numeric(batches)
# now we repeat the simulation batches times
# (with i tracking the index of op)
for (i in seq_len(batches)) {
# reroll the simulated variables
x <- rnorm(N)
y <- x * beta + rnorm(N)
# set the current op entry to the p-value of the test
op[i] <- summary(lm(y ~ x))$coef[2, 4]
}
# observed power is percent of p-values under alpha
# output of the function is observed power - set power for optimization
# (where the target is 0; closest observed to set power)
mean(op < alpha) - power
}
## uniroot will try different values of N (between 20 and 100 here),
## and output the one that gets closest to the set power
uniroot(sim1, c(20, 100))$root
#> [1] 56.62202
if (FALSE) { # \dontrun{
## setting power to 0 will give the observed power
## increase batches for more stable estimates
sim1(55, batches = 5000, power = 0)
## compare with a binary x
sim1b <- function(N, beta = .4, batches = 100, alpha = .05, power = .8) {
op <- numeric(batches)
for (i in seq_len(batches)) {
x <- rep_len(c(0, 1), N)
y <- x * beta + rnorm(N)
op[i] <- summary(lm(y ~ x))$coef[2, 4]
}
mean(op < alpha) - power
}
uniroot(sim1b, c(100, 300))$root
sim1b(200, batches = 5000, power = 0)
# this is what the most basic parametric power analysis is getting at
parapower(.4, "d")
# simulate a more complicated effect: two binary variables with an interaction effect
c1 <- sample(c(0, 1), N, TRUE)
c2 <- sample(c(0, 1), N, TRUE)
y <- c2 * c1 * .8 + rnorm(N)
(m <- summary(lm(y ~ c1 * c2)))
## R-squared is eta^2, and k is 4 (2x2)
convert_effect(m$r.squared, "eta^2", N, 4)
## same kind of function as before, only difference is the simulated data and test
## also have to recalculate the omnibus p-value since it's not returned
sim2 <- function(N, beta = .8, batches = 100, alpha = .05, power = .8) {
op <- numeric(batches)
for (i in seq_len(batches)) {
c1 <- sample(c(0, 1), N, TRUE)
c2 <- sample(c(0, 1), N, TRUE)
y <- c2 * c1 * beta + rnorm(N)
f <- summary(lm(y ~ c1 * c2))$fstatistic
op[i] <- pf(f[1], f[2], f[3], lower.tail = FALSE)
}
mean(op < alpha) - power
}
uniroot(sim2, c(50, 300))$root
sim2(100, batches = 5000, power = 0)
## note here that we're looking at the omnibus test
## if you're actually interested in, say, the interaction effect, that may affect power
sim2i <- function(N, beta = .8, batches = 100, alpha = .05, power = .8) {
op <- numeric(batches)
for (i in seq_len(batches)) {
c1 <- sample(c(0, 1), N, TRUE)
c2 <- sample(c(0, 1), N, TRUE)
y <- c2 * c1 * beta + rnorm(N)
op[i] <- summary(lm(y ~ c1 * c2))$coef[4, 4]
}
mean(op < alpha) - power
}
uniroot(sim2i, c(100, 300))$root
sim2i(200, batches = 5000, power = 0)
} # }