Statistical Inference in High-dimensional Poisson Regression with Applications to Mediation Analysis

Prabrisha Rakshit, Zijian Guo

Abstract

Large-scale datasets with count outcome variables are widely present in various applications, and the Poisson regression model is among the most popular models for handling count outcomes. This paper considers the high-dimensional sparse Poisson regression model and proposes bias-corrected estimators for both linear and quadratic transformations of high-dimensional regression vectors. We establish the asymptotic normality of the estimators, construct asymptotically valid confidence intervals, and conduct related hypothesis testing. We apply the devised methodology to high-dimensional mediation analysis with count outcome, with particular application of testing for the existence of interaction between the treatment variable and high-dimensional mediators. We demonstrate the proposed methods through extensive simulation studies and application to real-world epigenetic data.

Paper Structure

This paper contains 25 sections, 18 theorems, 120 equations, 4 tables.

Key Result

Proposition 1

Suppose that Condition (C1) holds. If we choose $\lambda_0 := \left\|\frac{1}{n}\sum_{i=1}^{n}\epsilon_iX_{i\cdot}\right\|_{\infty} \asymp \sqrt{\frac{\log p}{n}}$ and assume that $\max_{i,j}\left|X_{ij}\right|k \lambda_0 \leq c$ for some constant $c>0$, then with probability greater than $1-\dots$ [...] where $S$ denotes the support of $\beta$ and $C>0$ is a constant.
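As a rough illustration of the tuning quantity $\lambda_0$ in Proposition 1, the sketch below simulates a sparse Poisson regression and compares $\left\|\frac{1}{n}\sum_{i}\epsilon_iX_{i\cdot}\right\|_{\infty}$ with the rate $\sqrt{\log p / n}$. Several things here are assumptions rather than details from the paper: the residual is taken to be $\epsilon_i = y_i - \exp(X_{i\cdot}^{\top}\beta)$, the design is bounded uniform noise, and the function name `lambda0` and all simulation settings are hypothetical.

```python
import numpy as np

def lambda0(X, y, beta):
    """Sup-norm of the averaged score (1/n) * sum_i eps_i * X_i.

    Assumes eps_i = y_i - exp(X_i^T beta), i.e. the Poisson residual
    under a log link (an assumption; the excerpt does not define eps_i).
    """
    eps = y - np.exp(X @ beta)
    return np.max(np.abs(X.T @ eps / len(y)))

# Hypothetical simulation settings, chosen only for illustration.
rng = np.random.default_rng(0)
n, p, k = 500, 1000, 5               # samples, dimension, sparsity
X = rng.uniform(-1.0, 1.0, (n, p))   # bounded design entries
beta = np.zeros(p)
beta[:k] = 0.3                       # sparse true coefficient vector
y = rng.poisson(np.exp(X @ beta))    # count outcomes

lam0 = lambda0(X, y, beta)
rate = np.sqrt(np.log(p) / n)
print(f"lambda_0 = {lam0:.4f}, sqrt(log p / n) = {rate:.4f}")
```

In practice $\beta$ is unknown, so $\lambda_0$ cannot be computed this way; the sketch only shows that the two quantities are of the same order in a simulated setting.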

Theorems & Definitions (18)

  • Proposition 1
  • Lemma 1
  • Theorem 1
  • Proposition 2
  • Theorem 2
  • Proposition 3
  • Proposition 4
  • Theorem 3
  • Lemma 2
  • Lemma 3
  • ...and 8 more