Implicit Bias of Gradient Descent on Linear Convolutional Networks

Suriya Gunasekar; Jason Lee; Daniel Soudry; Nathan Srebro

TL;DR

This paper analyzes the implicit bias induced by gradient descent when training over-parameterized linear networks. It shows a sharp contrast: fully connected networks of any depth converge to the hard-margin SVM direction, while linear convolutional networks are biased toward sparsity in the frequency domain, with the depth $L$ setting the exponent of the induced bridge penalty $\|\widehat{\boldsymbol{\beta}}\|_{2/L}$. The authors provide a unified framework linking parameter-space homogeneity to predictor-space regularizers, deriving explicit forms for the induced penalties in both the time and Fourier domains. The work highlights an inductive bias that arises solely from the convolutional parameterization, with possible implications for generalization and for the design of optimization strategies in deep linear models.

Abstract

We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard margin linear support vector machine solution, regardless of depth.
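To make the frequency-domain bridge penalty concrete, the following is a minimal sketch (not code from the paper) that evaluates $\sum_k |\widehat{\boldsymbol{\beta}}[k]|^{2/L}$ for a given predictor. The function name `bridge_penalty_freq`, the use of the unitary DFT normalization, and the two toy predictors are assumptions chosen for illustration; the paper's own normalization convention may differ.

```python
import numpy as np


def bridge_penalty_freq(beta, depth):
    """Sum over frequencies of |DFT(beta)[k]| ** (2 / depth).

    Hypothetical helper for illustration. The unitary ("ortho") DFT is an
    assumption, chosen so that depth == 1 reduces to the squared l2 norm.
    """
    beta_hat = np.fft.fft(np.asarray(beta, dtype=float), norm="ortho")
    return float(np.sum(np.abs(beta_hat) ** (2.0 / depth)))


D = 16

# A unit-norm spike in the signal domain has a flat (dense) spectrum.
spike = np.zeros(D)
spike[0] = 1.0

# A unit-norm constant vector has a single nonzero frequency (sparse spectrum).
constant = np.ones(D) / np.sqrt(D)

for depth in (1, 2, 3):
    print(depth,
          bridge_penalty_freq(spike, depth),
          bridge_penalty_freq(constant, depth))
```

Both toy predictors have the same $\ell_2$ norm, so the depth-1 penalty cannot tell them apart. For depth 2 and above, the predictor with a sparse spectrum gets the smaller penalty, which matches the kind of frequency-domain sparsity bias the abstract describes for deeper convolutional networks.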
