Poisson Regression in one Covariate on Massive Data

Torsten Reuter, Rainer Schwabe

TL;DR

To show the advantage of the optimal subsampling designs, the efficiency of uniform random subsampling and of two heuristic designs is examined, and the efficiency of locally $D$-optimal subsampling designs is studied when the parameter is misspecified.

Abstract

The goal of subsampling is to select an informative subset of all observations when using the full data for statistical analysis is not viable. We construct locally $D$-optimal subsampling designs under a Poisson regression model with a log link in one covariate. A representation of the support of locally $D$-optimal subsampling designs is established. We derive results on scale-location transformations of the covariate, which require a simultaneous transformation of the regression parameter. The performance of the methods is demonstrated by illustrative examples. To show the advantage of the optimal subsampling designs, we examine the efficiency of uniform random subsampling as well as of two heuristic designs. Further, the efficiency of locally $D$-optimal subsampling designs is studied when the parameter is misspecified.
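The abstract does not state the information matrix explicitly. As a minimal sketch, assuming the standard generalized linear model form, each observation with covariate value $x$ contributes $e^{\beta_0 + \beta_1 x}\,(1, x)^\top (1, x)$ under the Poisson model with log link, because the GLM weight equals the mean. For a subsampling design $\xi$ with density $f_\xi \le f_X$ and total mass $\alpha$ (our reading of the setting behind Theorem 3.1 and the figure captions), this gives $M(\xi, \boldsymbol{\beta}) = \int e^{\beta_0 + \beta_1 x} (1, x)^\top (1, x) f_\xi(x)\,dx$, and local $D$-optimality maximizes $\det M(\xi, \boldsymbol{\beta})$. The Python below evaluates an empirical version on a finite sample; the function names `info_matrix` and `log_det` are hypothetical.

```python
import math
import random

def info_matrix(xs, selected, beta0, beta1):
    """Empirical information matrix of a subsample under the Poisson model
    with log link, E[Y | x] = exp(beta0 + beta1 * x).

    Each selected observation contributes exp(beta0 + beta1 * x) * (1, x)^T (1, x),
    since the GLM weight of a Poisson response with log link equals its mean.
    Dividing by the full sample size n treats the subsample as a design whose
    mass is the subsampling proportion alpha = len(selected) / n.
    Returns the distinct entries (m00, m01, m11) of the symmetric 2x2 matrix.
    """
    m00 = m01 = m11 = 0.0
    for i in selected:
        x = xs[i]
        w = math.exp(beta0 + beta1 * x)
        m00 += w
        m01 += w * x
        m11 += w * x * x
    n = len(xs)
    return m00 / n, m01 / n, m11 / n

def log_det(m):
    """log det of [[m00, m01], [m01, m11]], i.e. the D-criterion on the log scale."""
    m00, m01, m11 = m
    return math.log(m00 * m11 - m01 * m01)

# Usage: D-criterion of a uniform random subsample with proportion alpha = 0.1
rng = random.Random(0)
xs = [rng.expovariate(1.0) for _ in range(10_000)]   # standard exponential covariate
alpha, beta0, beta1 = 0.1, 0.0, -1.0
subsample = rng.sample(range(len(xs)), int(alpha * len(xs)))
print(log_det(info_matrix(xs, subsample, beta0, beta1)))
```

Because $\beta_0$ only multiplies $M$ by $e^{\beta_0}$, it shifts $\log\det M$ by $2\beta_0$ for every design and therefore does not change which subsample is $D$-optimal; only the slope $\beta_1$ matters for the optimal design.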

Paper Structure (6 sections, 5 theorems, 11 equations, 6 figures, 3 tables)

Key Result

Theorem 3.1

Let assumptions (A1) and (A2) be satisfied and let $\beta_{1} < 0$. Then the subsampling design $\xi^*$ is locally $D$-optimal at $\boldsymbol{\beta}$ if and only if $\xi^*$ has density $f_{\xi^*}(x) = f_X(x) \mathds{1}_{\mathcal{X}^*}(x)$ and either
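The statement above is cut off in this summary, so the exact form of $\mathcal{X}^*$ is not reproduced here. The structure $f_{\xi^*} = f_X \mathds{1}_{\mathcal{X}^*}$ means the optimal design keeps every observation in a region and none outside it. A common way to describe such a region for designs with bounded density (an assumption on our part, not quoted from the paper) is as the set where the sensitivity function $\psi(x) = e^{\beta_0 + \beta_1 x}(1, x)\, M(\xi^*, \boldsymbol{\beta})^{-1} (1, x)^\top$ is at least some threshold; the lower panels of Figures 1 to 3 plot sensitivity functions. The sketch below is a simple heuristic on a finite sample, not the authors' algorithm: it repeatedly keeps the $k = \alpha n$ points with the largest $\psi$, accepting a step only if $\log\det M$ increases. All function names are hypothetical.

```python
import math
import random

def info_matrix(xs, idx, beta0, beta1):
    """Entries (m00, m01, m11) of the empirical Poisson/log-link information matrix."""
    m00 = m01 = m11 = 0.0
    for i in idx:
        x = xs[i]
        w = math.exp(beta0 + beta1 * x)
        m00, m01, m11 = m00 + w, m01 + w * x, m11 + w * x * x
    n = len(xs)
    return m00 / n, m01 / n, m11 / n

def log_det(m):
    m00, m01, m11 = m
    return math.log(m00 * m11 - m01 * m01)

def sensitivity(x, m, beta0, beta1):
    """psi(x) = exp(beta0 + beta1*x) * (1, x) M^{-1} (1, x)^T."""
    m00, m01, m11 = m
    det = m00 * m11 - m01 * m01
    # M^{-1} = [[m11, -m01], [-m01, m00]] / det
    quad = (m11 - 2.0 * m01 * x + m00 * x * x) / det
    return math.exp(beta0 + beta1 * x) * quad

def d_optimal_subsample(xs, alpha, beta0, beta1, seed=0, max_iter=200):
    """Heuristic search (hypothetical) for a locally D-optimal subsample.

    Starts from a uniform random subsample of size k = round(alpha * n) and
    replaces it by the k points with the largest sensitivity, accepting a step
    only if log det increases. Returns (indices, is_fixed_point).
    """
    n = len(xs)
    k = round(alpha * n)
    idx = sorted(random.Random(seed).sample(range(n), k))
    m = info_matrix(xs, idx, beta0, beta1)
    for _ in range(max_iter):
        psi = [sensitivity(x, m, beta0, beta1) for x in xs]
        new_idx = sorted(sorted(range(n), key=psi.__getitem__, reverse=True)[:k])
        if new_idx == idx:
            return idx, True            # selected set = top k of its own psi
        new_m = info_matrix(xs, new_idx, beta0, beta1)
        if log_det(new_m) <= log_det(m):
            return idx, False           # no improvement: stop, no fixed point certified
        idx, m = new_idx, new_m
    return idx, False

# Usage: standard exponential covariate, beta1 = -1, alpha = 0.3
rng = random.Random(1)
xs = [rng.expovariate(1.0) for _ in range(2000)]
idx, is_fixed_point = d_optimal_subsample(xs, alpha=0.3, beta0=0.0, beta1=-1.0)
print(is_fixed_point, min(xs[i] for i in idx), max(xs[i] for i in idx))
```

When the function reports a fixed point, the selected points are exactly the top $k$ by their own sensitivity function, which is the discrete analogue of the threshold condition; otherwise it returns the best subsample found.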

Figures (6)

  • Figure 1: Density of the locally optimal design (solid) at $\beta_{1}$ and the standard exponential distribution (dashed, upper panels), and corresponding sensitivity functions (lower panels) for $\beta_{1} = - 4$, $\alpha = 0.75$ (left) and $\beta_{1} = - 1$, $\alpha = 0.3$ (right)
  • Figure 2: Density of the locally optimal design (solid) at $\beta_{1}$ for a uniformly distributed covariate on $[0,1]$ (dashed, upper panels), and sensitivity functions (lower panels) for $\beta_{1} = - 8$, $\alpha = 0.5$ (left) and $\beta_{1} = - 4$, $\alpha = 0.1$ (right)
  • Figure 3: Density of the locally optimal design (solid) at $\beta_{1}$ for a uniformly distributed covariate on $[0,1]$ (dashed, upper panel), and sensitivity functions (lower panel) for $\beta_{1} = - 2$, $\alpha = 0.3$
  • Figure 4: $D$-efficiency of uniform random subsampling (solid), one-sided (dashed), and two-sided (dot-dashed) subsampling designs as a function of the subsampling proportion $\alpha$ for slope-rate ratio $\beta_{1} / \lambda = - 1$ (left) and $- 4$ (right) for an exponentially distributed covariate
  • Figure 5: $D$-efficiency of uniform random subsampling (solid), one-sided (dashed), and two-sided (dot-dashed) subsampling designs as a function of the slope-rate ratio $\beta_{1} / \lambda$ for subsampling proportion $\alpha = 0.1$ and an exponentially distributed covariate
  • ...and 1 more figure
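Figures 4 and 5 parametrize an exponential covariate by the slope-rate ratio $\beta_1/\lambda$ rather than by $\beta_1$ and $\lambda$ separately. This fits the abstract's point about scale-location transformations. If $z = a + b x$ with $b \ne 0$ and the parameter is transformed to $\gamma_0 = \beta_0 - \beta_1 a / b$, $\gamma_1 = \beta_1 / b$, then $\gamma_0 + \gamma_1 z = \beta_0 + \beta_1 x$, the information matrix becomes $A M A^\top$ with $A = \begin{pmatrix} 1 & 0 \\ a & b \end{pmatrix}$, every determinant is multiplied by the same factor $b^2$, and the sensitivity function is unchanged. Taking $a = 0$, $b = \lambda$ maps an $\mathrm{Exp}(\lambda)$ covariate to a standard exponential one with slope $\beta_1/\lambda$. The sketch below checks these identities numerically; it is our own illustration rather than the paper's statement, and the helper names are hypothetical.

```python
import math
import random

def info_matrix(xs, idx, beta0, beta1):
    """Entries (m00, m01, m11) of the empirical Poisson/log-link information matrix."""
    m00 = m01 = m11 = 0.0
    for i in idx:
        x = xs[i]
        w = math.exp(beta0 + beta1 * x)
        m00, m01, m11 = m00 + w, m01 + w * x, m11 + w * x * x
    n = len(xs)
    return m00 / n, m01 / n, m11 / n

def det2(m):
    m00, m01, m11 = m
    return m00 * m11 - m01 * m01

def sensitivity(x, m, beta0, beta1):
    """psi(x) = exp(beta0 + beta1*x) * (1, x) M^{-1} (1, x)^T."""
    m00, m01, m11 = m
    return math.exp(beta0 + beta1 * x) * (m11 - 2 * m01 * x + m00 * x * x) / det2(m)

def transformed_params(beta0, beta1, a, b):
    """Parameters for z = a + b*x that leave the linear predictor unchanged."""
    return beta0 - beta1 * a / b, beta1 / b

rng = random.Random(2)
lam = 2.5                                     # rate of the exponential covariate
xs = [rng.expovariate(lam) for _ in range(500)]
idx = rng.sample(range(len(xs)), 200)         # any fixed subsample
beta0, beta1 = 0.3, -4.0

a, b = 0.0, lam                               # z = lam * x is standard exponential
zs = [a + b * x for x in xs]
g0, g1 = transformed_params(beta0, beta1, a, b)   # g1 = beta1 / lam

mx = info_matrix(xs, idx, beta0, beta1)
mz = info_matrix(zs, idx, g0, g1)
print(det2(mz) / det2(mx))                    # approximately b**2 = 6.25
psi_x = [sensitivity(x, mx, beta0, beta1) for x in xs]
psi_z = [sensitivity(z, mz, g0, g1) for z in zs]
print(max(abs(p - q) / p for p, q in zip(psi_x, psi_z)))   # rounding error only
```

Since the sensitivity values coincide point by point, any rule that selects observations by $\psi$ picks the same subsample in either parametrization, and $D$-efficiencies (ratios of determinants) are unaffected by the common factor $b^2$.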

Theorems & Definitions (16)

  • Remark 3.1
  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Theorem 3.2
  • Corollary 3.3
  • Example 3.1: exponential distribution
  • Example 3.2: uniform distribution
  • Example 4.1: exponential distribution
  • ...and 6 more
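Figures 4 and 5 compare designs by $D$-efficiency, and the abstract also studies misspecified parameters. As an assumption about the normalization, a standard choice for the two parameters here is $\mathrm{eff}_D(\xi) = \left(\det M(\xi, \boldsymbol{\beta}) / \det M(\xi^*_{\boldsymbol{\beta}}, \boldsymbol{\beta})\right)^{1/2}$. Uniform random subsampling with proportion $\alpha$ has design density $\alpha f_X$, hence information $\alpha M(f_X, \boldsymbol{\beta})$. Under misspecification, the design that is locally optimal at a guessed slope $\beta_1'$ is evaluated at the true $\beta_1$. The sketch below computes both on a simulated exponential sample, using a compact top-$k$ sensitivity heuristic as the reference design (our own illustration, not the authors' method); all function names are hypothetical.

```python
import math
import random

def info_matrix(xs, idx, beta0, beta1):
    """Entries (m00, m01, m11) of the empirical Poisson/log-link information matrix."""
    m00 = m01 = m11 = 0.0
    for i in idx:
        x = xs[i]
        w = math.exp(beta0 + beta1 * x)
        m00, m01, m11 = m00 + w, m01 + w * x, m11 + w * x * x
    n = len(xs)
    return m00 / n, m01 / n, m11 / n

def log_det(m):
    m00, m01, m11 = m
    return math.log(m00 * m11 - m01 * m01)

def d_optimal_subsample(xs, alpha, beta0, beta1, seed=0, max_iter=100):
    """Heuristic: keep the k largest-sensitivity points while log det increases."""
    n, k = len(xs), round(alpha * len(xs))
    idx = sorted(random.Random(seed).sample(range(n), k))
    m = info_matrix(xs, idx, beta0, beta1)
    for _ in range(max_iter):
        m00, m01, m11 = m
        det = m00 * m11 - m01 * m01
        # sensitivity psi(x) = exp(beta0 + beta1*x) * (1, x) M^{-1} (1, x)^T
        psi = [math.exp(beta0 + beta1 * x) * (m11 - 2 * m01 * x + m00 * x * x) / det
               for x in xs]
        new_idx = sorted(sorted(range(n), key=psi.__getitem__, reverse=True)[:k])
        if new_idx == idx:
            break
        new_m = info_matrix(xs, new_idx, beta0, beta1)
        if log_det(new_m) <= log_det(m):
            break
        idx, m = new_idx, new_m
    return idx

def d_efficiency(log_det_design, log_det_reference, p=2):
    """(det M(xi) / det M(xi_ref))^(1/p), computed from log determinants."""
    return math.exp((log_det_design - log_det_reference) / p)

rng = random.Random(3)
xs = [rng.expovariate(1.0) for _ in range(2000)]   # standard exponential covariate
alpha, beta1_true = 0.1, -4.0                      # beta0 does not affect the design
opt_idx = d_optimal_subsample(xs, alpha, 0.0, beta1_true)
ld_opt = log_det(info_matrix(xs, opt_idx, 0.0, beta1_true))

# Uniform random subsampling: design density alpha * f_X, information alpha * M(f_X)
ld_unif = 2 * math.log(alpha) + log_det(info_matrix(xs, range(len(xs)), 0.0, beta1_true))
print("uniform random subsampling:", d_efficiency(ld_unif, ld_opt))

# Misspecification: optimise at a guessed slope, evaluate at the true slope
for guess in (-1.0, -2.0, -8.0):
    guess_idx = d_optimal_subsample(xs, alpha, 0.0, guess)
    ld_guess = log_det(info_matrix(xs, guess_idx, 0.0, beta1_true))
    print(f"design for beta1 = {guess}:", d_efficiency(ld_guess, ld_opt))
```

The printed values depend on the simulated sample and on how close the heuristic gets to the true optimum; they are not results from the paper.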