General measures of effect size to calculate power and sample size for Wald tests with generalized linear models

Amy L Cochran, Shijie Yuan, Paul J Rathouz

TL;DR

This work addresses power and sample size (PSS) calculations for Wald tests in GLMs with multiple predictors and adjusters, where specifying the full joint distribution is often impractical. It introduces two general effect-size measures, $\phi_{x|z}$ and $R^2_{x|z}$, built from first- and second-moment information, to approximate the noncentrality parameter needed for power calculations. The authors derive bounds on the approximation error, examine asymptotic behavior under local alternatives, and validate the approach with simulations across common GLMs and a case study on education and mental health treatment. The framework extends linear-regression concepts (partial $R^2$, Cohen's d) to GLMs, making PSS planning more flexible and interpretable across models, although its accuracy depends on the variance of the predictors and adjusters and on distributional features of the model. For study design, it offers a tractable route to power analysis in complex GLM settings without requiring the full joint distribution.

Abstract

Power and sample size calculations for Wald tests in generalized linear models (GLMs) are often limited to specific cases like logistic regression. More general methods typically require detailed study parameters that are difficult to obtain during planning. We introduce two new effect size measures for estimating power and sample size in studies using Wald tests across any GLM. These measures accommodate any number of predictors or adjusters and require only basic study information. We provide practical guidance for interpreting and applying these measures to approximate a key parameter in power calculations. We also derive asymptotic bounds on the relative error of these approximations, showing that accuracy depends on features of the GLM such as the nonlinearity of the link function. To complement this analysis, we conduct simulation studies across common model specifications, identifying best use cases and opportunities for improvement. Finally, we test the methods in finite samples to confirm their practical utility, using a case study on the relationship between education and receipt of mental health treatment.
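The "key parameter" mentioned above is the noncentrality parameter of the Wald statistic. As a minimal sketch of how such a parameter feeds a power calculation, the Python below handles only the simplest case: a two-sided Wald test of a single coefficient (one degree of freedom), where the noncentral chi-square tail reduces to two normal tails. The function names and the input `ncp_per_obs` (noncentrality contributed per observation) are hypothetical and chosen for illustration; how the paper maps $\phi_{x|z}$ or $R^2_{x|z}$ to this quantity is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, stdlib only


def wald_power_1df(ncp: float, alpha: float = 0.05) -> float:
    """Asymptotic power of a two-sided, 1-df Wald test.

    Under the alternative the statistic is treated as noncentral
    chi-square with 1 df and noncentrality `ncp`, i.e. (Z + sqrt(ncp))^2.
    """
    if ncp < 0:
        raise ValueError("ncp must be non-negative")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)
    # P(|Z + shift| > z_crit) split into upper and lower tails
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def sample_size_1df(ncp_per_obs: float, power: float = 0.8,
                    alpha: float = 0.05, n_max: int = 10**9) -> int:
    """Smallest n with power >= target, ASSUMING ncp = n * ncp_per_obs."""
    if ncp_per_obs <= 0:
        raise ValueError("ncp_per_obs must be positive")
    if not alpha < power < 1:
        raise ValueError("power must lie strictly between alpha and 1")
    hi = 1
    while wald_power_1df(hi * ncp_per_obs, alpha) < power:
        hi *= 2  # power is increasing in n, so doubling brackets the answer
        if hi > n_max:
            raise ValueError("required sample size exceeds n_max")
    lo = hi // 2  # power at lo is below target (or lo == 0)
    while hi - lo > 1:  # bisection on the integer sample size
        mid = (lo + hi) // 2
        if wald_power_1df(mid * ncp_per_obs, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n = sample_size_1df(ncp_per_obs=0.02, power=0.8)
    print(n, round(wald_power_1df(n * 0.02), 4))
```

For tests of several coefficients at once, the same logic would apply with a noncentral chi-square tail probability in place of the normal tails (for example via `scipy.stats.ncx2`), which this stdlib-only sketch does not cover.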

Paper Structure

This paper contains 32 sections, 7 theorems, 138 equations, 16 figures, and 10 tables.

Key Result

Theorem 1

Fix $\mu_{*}$, $\boldsymbol{\kappa}_*$, and $\boldsymbol{\beta}_*$. Under Assumptions assm:smooth_w, assm:inverse_fcn, and assm:regular with $M$ defined therein,

Figures (16)

  • Figure 1: Simulated distributions of $\eta - \eta_z$ for logistic regression, as we vary parameters affecting $\boldsymbol{\beta}'\mathbf{X}$ ($=c_2 B_x$): standard deviation $s_x$ and shape parameters $a_x$ and $b_x$. Unless noted, we fix $a_x = b_x = a_z = b_z = 1$, $s_x = s_z = 0.2$, $\rho = 0$, and $g^{-1}(\iota) = 0.25$.
  • Figure 2: Relative error for logistic regression, plotted against $\phi_{x|z}$ (top panels) and $R^2_{x|z}$ (bottom panels). Left and right panels correspond to two levels of $s_z^2$. Within each panel, we vary $a_x$ and $b_x$ over all combinations of values in $\{0.5, 1, 1.5\}$. Each point reflects a value of $s_x^2$, evenly spaced from $0.01$ to $0.09$. Other parameters are fixed: $a_z = b_z = 1$, $\rho = 0$, and $g^{-1}(\iota) = 0.25$.
  • Figure 3: Relative error $\text{re}_\phi$ for a Bernoulli distribution and identity link, plotted against $\phi_{x|z}$; $\text{re}_R$ is not shown because it is zero. Left and right panels correspond to two levels of $s_z^2$. Within each panel, we vary $a_x$ and $b_x$ over all combinations of values in $\{0.5, 1, 1.5\}$. Each point reflects a value of $s_x^2$, evenly spaced from $0.0002$ to $0.0018$. Other parameters are fixed: $a_z = b_z = 1$, $\rho = 0$, and $g^{-1}(\iota) = 0.25$.
  • Figure 4: Relative error for a Poisson distribution with a log link, plotted against $\phi_{x|z}$ (top panels) and $R^2_{x|z}$ (bottom panels). Left and right panels correspond to two levels of $s_z^2$. Within each panel, we vary $a_x$ and $b_x$ over all combinations of values in $\{0.5, 1, 1.5\}$. Each point reflects a value of $s_x^2$, evenly spaced from $0.002$ to $0.018$. Other parameters are fixed: $a_z = b_z = 1$, $\rho = 0$, and $g^{-1}(\iota) = 1$.
  • Figure 5: Relative error $\text{re}_R$ for a Gamma model with a log link, plotted against $R^2_{x|z}$; $\text{re}_\phi$ is not shown because it is zero. Left and right panels correspond to two levels of $s_z^2$. Within each panel, we vary $a_x$ and $b_x$ over all combinations of values in $\{0.5, 1, 1.5\}$. Each point reflects a value of $s_x^2$, evenly spaced from $0.001$ to $0.009$. Other parameters are fixed: $a_z = b_z = 1$, $\rho = 0$, $g^{-1}(\iota) = 4$; shape parameter is $2$.
  • ...and 11 more figures

Theorems & Definitions (7)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Lemma S1
  • Lemma S2
  • Lemma S3