Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families

Benedikt Lütke Schwienhorst, Lucas Kock, Nadja Klein, David J. Nott

TL;DR

This paper studies dropout regularization in extended GLMs based on double exponential families, where both mean and dispersion can vary with covariates. It shows that dropout induces an L2-type penalty whose strength is shaped by a Fisher-information-like structure, favoring rare but informative features and introducing an asymmetry between overdispersion and underdispersion. The authors extend the theory to nonparametric estimation with B-splines and compare dropout to penalized maximum likelihood estimation (PMLE), finding that dropout is particularly advantageous when important features are rare. Empirically, dropout improves adaptive smoothing with B-splines and provides improved fit on Berlin traffic data, while the results also indicate conditions under which PMLE may be preferable.

Abstract

Even though dropout is a popular regularization technique, its theoretical properties are not fully understood. In this paper we study dropout regularization in extended generalized linear models based on double exponential families, for which the dispersion parameter can vary with the features. A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and dispersion, generalizing an earlier result for conventional generalized linear models. To illustrate, we apply dropout to adaptive smoothing with B-splines, where both the mean and dispersion parameters are modeled flexibly. The important B-spline basis functions can be thought of as rare features, and we confirm in experiments that dropout is an effective form of regularization for mean and dispersion parameters that improves on a penalized maximum likelihood approach with an explicit smoothness penalty. An application to traffic detection data from Berlin further illustrates the benefits of our method.
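
As a rough illustration of the L2-type penalty mentioned above, the following sketch covers only the simplest special case: a conventional Gaussian linear model with identity link and fixed dispersion, not the double exponential family studied in the paper. In that case the expected squared-error loss under Bernoulli dropout equals the plain loss plus a weighted L2 penalty, and the weight on each coefficient is the sum of squares of its feature, so a feature that is rarely nonzero is penalized less. The function names, the example numbers, and the symbol `delta` for the dropout probability are assumptions made for this sketch.

```python
from itertools import product


def dropout_expected_loss(x, y, beta, delta):
    """Exact expected loss 0.5 * (y - eta)^2 under Bernoulli dropout.

    Each feature is kept with probability 1 - delta and rescaled by
    1 / (1 - delta); the expectation is computed by enumerating all masks.
    """
    keep = 1.0 - delta
    total = 0.0
    for mask in product([0, 1], repeat=len(x)):
        prob = 1.0
        eta = 0.0
        for m, xj, bj in zip(mask, x, beta):
            prob *= keep if m else delta
            eta += (m / keep) * xj * bj
        total += prob * 0.5 * (y - eta) ** 2
    return total


def plain_loss_plus_penalty(x, y, beta, delta):
    """Plain loss plus the L2-type penalty implied by dropout (Gaussian case)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    penalty = 0.5 * delta / (1.0 - delta) * sum(
        (xj * bj) ** 2 for xj, bj in zip(x, beta)
    )
    return 0.5 * (y - eta) ** 2 + penalty


def penalty_weights(X):
    """Per-coefficient penalty weight sum_i x_ij^2 in the Gaussian case."""
    return [sum(row[j] ** 2 for row in X) for j in range(len(X[0]))]


# Hypothetical single observation and coefficients.
x, y, beta, delta = [1.0, 0.0, 2.0], 1.5, [0.5, -1.0, 0.25], 0.3
lhs = dropout_expected_loss(x, y, beta, delta)
rhs = plain_loss_plus_penalty(x, y, beta, delta)

# Hypothetical design: column 0 is always active, column 1 is rare.
weights = penalty_weights([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
```

Here `lhs` and `rhs` agree up to floating-point error, and the rare second column receives a smaller weight than the always-active first column. For non-Gaussian responses and covariate-dependent dispersion the identity is no longer exact, and the paper's analysis replaces these simple weights with a Fisher-information-like structure.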

Paper Structure

This paper contains 26 sections, 47 equations, and 11 figures.

Figures (11)

  • Figure 1: Test functions for the mean and dispersion.
  • Figure 2: Boxplots of RMSEs for dispersion parameters of Gaussian data (left column), Poisson data (middle column) and binomial data (right column) across Scenarios 1 (top row), 2 (middle row) and 3 (bottom row). Outliers in the dispersion were cut at the 95th percentile.
  • Figure 3: Estimated dispersion effects in the (a) Gaussian model and (b) binomial model for Bernoulli dropout (left), Gaussian dropout (middle) and PMLE (right) in Scenario 1 (upper row), Scenario 2 (middle row) and Scenario 3 (bottom row) for $R=100$ replicates and $n=1,000$. The true effects are given by the black lines.
  • Figure 4: Traffic detection data with the (a) positioning of the four sensors in the (b) West, (c) South, (d) East and (e) North of the Berlin city center. For (b)--(e) the upper panels depict the counts ($y$-axis) for inbound traffic for each hour from 0am (=0) to 11pm (=23) ($x$-axis). The bottom panels show the corresponding outbound traffic.
  • Figure 5: Cross-validated estimates for the traffic detection data of the four sensors in the West (red), South (magenta), East (blue) and North (green) of the Berlin city center.
  • ...and 6 more figures