Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro
TL;DR
This paper analyzes the implicit bias induced by gradient descent when training over-parameterized linear networks. It shows a sharp contrast: fully connected networks of any depth converge to the hard-margin SVM direction, while linear convolutional networks are biased toward frequency-domain sparsity, with the depth $L$ determining the bridge penalty $\|\widehat{\boldsymbol{\beta}}\|_{2/L}$. The authors provide a unified framework linking parameter-space homogeneity to predictor-space regularizers, deriving explicit forms for the induced penalties in both the time and Fourier domains. The work highlights an inductive bias that arises solely from the convolutional parameterization, with implications for generalization and for the design of optimization strategies in deep linear models.
Abstract
We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard margin linear support vector machine solution, regardless of depth.
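To make the frequency-domain penalty concrete, here is a minimal Python sketch that evaluates $\|\widehat{\boldsymbol{\beta}}\|_{2/L}$ for a given predictor. It is an illustration, not the authors' code: the function names `unitary_dft` and `fourier_bridge_penalty` are hypothetical, the unitary ($1/\sqrt{D}$) DFT normalization is an assumption, and the two test vectors are chosen only to contrast a dense spectrum with a sparse one.

```python
import cmath
import math


def unitary_dft(beta):
    """Unitary DFT (assumed normalization):
    beta_hat[k] = (1/sqrt(D)) * sum_d beta[d] * exp(-2*pi*i*k*d/D)."""
    D = len(beta)
    scale = 1.0 / math.sqrt(D)
    return [
        scale * sum(b * cmath.exp(-2j * cmath.pi * k * d / D) for d, b in enumerate(beta))
        for k in range(D)
    ]


def fourier_bridge_penalty(beta, depth):
    """Return (sum_k |beta_hat[k]|^p)^(1/p) with p = 2/depth."""
    p = 2.0 / depth
    return sum(abs(c) ** p for c in unitary_dft(beta)) ** (1.0 / p)


D = 8
spike = [1.0] + [0.0] * (D - 1)      # dense spectrum: every |beta_hat[k]| equals 1/sqrt(D)
constant = [1.0 / math.sqrt(D)] * D  # sparse spectrum: only the k = 0 coefficient is nonzero

# Both vectors have unit Euclidean norm, so they tie at depth 1 (p = 2).
for depth in (1, 2, 3):
    print(depth, fourier_bridge_penalty(spike, depth), fourier_bridge_penalty(constant, depth))
```

Both vectors have the same $\ell_2$ norm, so the depth-1 penalty cannot tell them apart. For $L \ge 2$ the penalty of the spike grows as $D^{(L-1)/2}$ while that of the constant vector stays near 1, which is the sense in which deeper convolutional parameterizations favor predictors that are sparse in the frequency domain. With NumPy available, `np.fft.fft(beta, norm="ortho")` computes the same unitary transform.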
