
Variational Deep Learning via Implicit Regularization

Jonathan Wenger, Beau Coker, Juraj Marusic, John P. Cunningham

Abstract

Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters, and optimization procedure. However, deep neural networks can be surprisingly non-robust, resulting in overconfident predictions and poor out-of-distribution generalization. Bayesian deep learning addresses this via model averaging, but typically requires significant computational resources as well as carefully elicited priors to avoid overriding the benefits of implicit regularization. In this work, we instead propose to regularize variational neural networks solely by relying on the implicit bias of (stochastic) gradient descent. We theoretically characterize this inductive bias in overparametrized linear models as generalized variational inference and demonstrate the importance of the choice of parametrization. Empirically, our approach demonstrates strong in- and out-of-distribution performance without additional hyperparameter tuning and with minimal computational overhead.
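The claim about the implicit bias of gradient descent in overparametrized linear models can be made tangible with a small simulation. This is a minimal sketch, not the paper's implementation or parametrization. It assumes squared loss, a Gaussian variational family q(w) = N(μ, SSᵀ) with a full square factor S, initialization at a prior (μ0, S0), and full-batch gradient descent on the closed-form expected loss ||y − Xμ||² + ||XS||_F². The function names are hypothetical. Because every gradient lies in the row space of X, GD moves the mean and the covariance factor only within that subspace. The mean therefore converges to the interpolating solution closest to μ0, and the covariance factor converges to the projection of S0 onto the null space of X, so the prior is left untouched in directions the data cannot see.

```python
import numpy as np

def expected_sq_loss_grads(X, y, mu, S):
    """Gradients of E_{w ~ N(mu, S S^T)} ||y - X w||^2 w.r.t. mu and S.

    Uses the closed form ||y - X mu||^2 + ||X S||_F^2 of the expected loss.
    """
    resid = y - X @ mu
    grad_mu = -2.0 * X.T @ resid
    grad_S = 2.0 * X.T @ (X @ S)
    return grad_mu, grad_S

def train_variational_linear(X, y, mu0, S0, lr, steps):
    """Hypothetical helper: full-batch GD starting from the prior (mu0, S0)."""
    mu, S = mu0.copy(), S0.copy()
    for _ in range(steps):
        g_mu, g_S = expected_sq_loss_grads(X, y, mu, S)
        mu -= lr * g_mu
        S -= lr * g_S
    return mu, S

rng = np.random.default_rng(0)
n, d = 5, 20                                     # fewer observations than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
mu0 = rng.standard_normal(d)                     # prior mean, used as initialization
S0 = rng.standard_normal((d, d)) / np.sqrt(d)    # prior covariance factor

# GD on this quadratic is stable for lr < 1 / lambda_max(X^T X).
lr = 0.25 / np.linalg.norm(X, 2) ** 2
mu, S = train_variational_linear(X, y, mu0, S0, lr, steps=20000)

# Limits predicted by the linear GD dynamics.
X_pinv = np.linalg.pinv(X)
P_null = np.eye(d) - X_pinv @ X                  # projector onto the null space of X
mu_pred = mu0 + X_pinv @ (y - X @ mu0)           # interpolant closest to mu0
S_pred = P_null @ S0                             # row-space part of S0 removed
```

For a nonlinear network the expected loss has no closed form and would typically be estimated with reparametrized Monte Carlo samples of w. The linear case is used here only because its GD dynamics can be solved exactly.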


Paper Structure

This paper contains 8 sections, 4 equations, 1 figure.

Figures (1)

  • Figure 1: Variational deep learning via implicit regularization. Neural networks generalize well without explicit regularization due to implicit regularization from the architecture and optimization. We can exploit this implicit bias for variational deep learning, removing the computational overhead of explicit regularization and narrowing the gap to deep learning practice. As illustrated for a two-hidden-layer MLP and proven rigorously for overparametrized linear models in Theorems \ref{thm:implicit-bias-vi-overparametrized-regression} and \ref{thm:classification-implicit-bias}, the implicit bias of (S)GD in variational networks (see \ref{subfig:fig1-implicit-regularization}) can be understood as generalized variational inference with a 2-Wasserstein regularizer (see \ref{subfig:fig1-explicit-regularization}). This differs from the standard ELBO objective with a KL divergence to the prior, as used, for example, in mean-field VI (see \ref{subfig:fig1-explicit-regularization}).
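The caption contrasts a 2-Wasserstein regularizer with the KL divergence used in the ELBO. A minimal sketch of both quantities for Gaussians, using their standard closed forms, makes one general difference between the two divergences concrete. If a variational Gaussian has a singular covariance (all its mass on a lower-dimensional subspace), the KL divergence to a full-rank prior is infinite, whereas the squared 2-Wasserstein distance stays finite. The helper names are hypothetical, and the code only evaluates the two divergences; it does not implement the paper's objective.

```python
import numpy as np

def psd_sqrt(A):
    # Square root of a symmetric positive semidefinite matrix via eigh;
    # tiny negative eigenvalues from round-off are clipped to zero.
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def w2_squared_gaussian(m1, C1, m2, C2):
    """Squared 2-Wasserstein distance between N(m1, C1) and N(m2, C2)."""
    C2_half = psd_sqrt(C2)
    cross = psd_sqrt(C2_half @ C1 @ C2_half)
    return float(np.sum((m1 - m2) ** 2) + np.trace(C1 + C2 - 2.0 * cross))

def kl_gaussian(mq, Cq, mp, Cp):
    """KL(N(mq, Cq) || N(mp, Cp)) for a full-rank prior covariance Cp."""
    d = mq.shape[0]
    sign_q, logdet_q = np.linalg.slogdet(Cq)
    if sign_q <= 0:
        # Singular Cq: q has no density w.r.t. p, so the divergence is infinite.
        return np.inf
    _, logdet_p = np.linalg.slogdet(Cp)
    Cp_inv = np.linalg.inv(Cp)
    diff = mp - mq
    return 0.5 * float(np.trace(Cp_inv @ Cq) + diff @ Cp_inv @ diff
                       - d + logdet_p - logdet_q)

m_prior, C_prior = np.zeros(2), np.eye(2)
# Degenerate variational Gaussian: shifted mean, no variance in the 2nd coordinate.
m_q, C_q = np.array([1.0, 0.0]), np.diag([1.0, 0.0])
w2 = w2_squared_gaussian(m_q, C_q, m_prior, C_prior)
kl = kl_gaussian(m_q, C_q, m_prior, C_prior)
```

In this example W2² equals 2: a unit shift of the mean in the first coordinate plus the unit prior variance removed in the second coordinate, while the KL divergence is infinite.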