The Singular Values of Convolutional Layers

Hanie Sedghi, Vineet Gupta, Philip M. Long

TL;DR

This work gives an exact, efficient characterization of the singular values of 2D multi-channel convolutional layers by exploiting the Fourier structure of convolution and circulant matrices. For an $n \times n$ input with $m$ input and output channels, the full spectrum can be computed in $O(n^2 m^2 (m + \log n))$ time via FFTs and SVDs, which also enables projecting a layer onto an operator-norm ball as a regularizer. The authors show that operator-norm regularization lowers the CIFAR-10 test error of a deep residual network with batch normalization from 6.2\% to 5.3\%, so it complements batch normalization rather than replacing it. The approach outperforms prior heuristic reshaping methods in both accuracy and computational efficiency, making spectrum-aware regularization practical for deep networks.
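The FFT-then-SVD recipe summarized above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not necessarily the authors' exact code: the function name `singular_values`, the kernel layout `(k, k, c_in, c_out)`, and the use of circular (wrap-around) convolution are choices made here for illustration.

```python
import numpy as np

def singular_values(kernel, input_shape):
    """Singular values of a circular 2D multi-channel convolution.

    kernel:      array of shape (k, k, c_in, c_out); this layout is an assumption.
    input_shape: (n, n), the spatial size of the input feature map.
    Returns an array of shape (n, n, min(c_in, c_out)).
    """
    # 2D FFT over the two spatial axes; the k x k kernel is zero-padded to n x n.
    transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
    # One SVD per frequency pair: n * n small matrices of size c_in x c_out.
    return np.linalg.svd(transforms, compute_uv=False)

# Usage: a 3 x 3 kernel with 4 input and 4 output channels on a 16 x 16 input.
kernel = np.random.randn(3, 3, 4, 4)
sv = singular_values(kernel, (16, 16))
print(sv.shape)   # (16, 16, 4)
print(sv.max())   # operator norm of the layer under the assumptions above
```

The cost matches the stated bound: the FFT step handles $m^2$ channel pairs at $O(n^2 \log n)$ each, and the SVD step handles $n^2$ matrices of size $m \times m$ at $O(m^3)$ each, giving $O(n^2 m^2 (m + \log n))$ overall.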

Abstract

We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2\% to 5.3\%.
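The projection mentioned in the abstract can be illustrated by clipping singular values in the Fourier domain. The sketch below is hedged: `clip_operator_norm` and its parameters are names chosen here, the kernel layout `(k, k, c_in, c_out)` and circular convolution are assumptions, and the final truncation back to the original $k \times k$ support is a simple heuristic step rather than an exact projection.

```python
import numpy as np

def clip_operator_norm(kernel, input_shape, clip_to):
    """Clip the singular values of a circular 2D convolution at `clip_to`.

    kernel:      array of shape (k, k, c_in, c_out); this layout is an assumption.
    input_shape: (n, n), the spatial size of the input feature map.
    Returns a real kernel with the same shape as `kernel`.
    """
    k0, k1 = kernel.shape[:2]
    # Per-frequency c_in x c_out matrices, as in the spectrum computation.
    transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
    U, D, Vh = np.linalg.svd(transforms, full_matrices=False)
    # Cap every singular value at clip_to, then rebuild each matrix.
    D_clipped = np.minimum(D, clip_to)
    clipped = (U * D_clipped[..., None, :]) @ Vh
    # Back to the spatial domain; the imaginary part is numerical noise
    # because the input kernel is real.
    full_kernel = np.fft.ifft2(clipped, axes=[0, 1]).real
    # The result generally has n x n support; keep only the original k x k part.
    return full_kernel[:k0, :k1]

# Usage: cap the operator norm of a 3 x 3, 4-channel layer at 1.0.
kernel = np.random.randn(3, 3, 4, 4)
clipped_kernel = clip_operator_norm(kernel, (16, 16), 1.0)
print(clipped_kernel.shape)  # (3, 3, 4, 4)
```

Because of the truncation in the last step, the returned kernel may slightly exceed the target norm when $k < n$; repeating the clip a few times during training is one way to handle this.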

Paper Structure

This paper contains 17 sections, 9 theorems, 17 equations, 7 figures.

Key Result

Lemma 1

For any filter coefficients $K$, the linear transform for the convolution by $K$ is represented by a doubly block circulant matrix $A$. That is, if $X$ is an $n \times n$ matrix and $Y$ is the result of a 2-d convolution of $X$ with $K$, then ${\mathrm{vec}}(Y) = A ~{\mathrm{vec}}(X)$.
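A small numerical check can make Lemma 1 concrete for the single-channel case. The displayed formulas are not reproduced in this summary, so the conventions below are assumptions: $K$ is zero-padded to $n \times n$, the convolution is taken as $Y_{ij} = \sum_{p,q} X_{i+p,\,j+q} K_{p,q}$ with indices modulo $n$, and ${\mathrm{vec}}$ stacks rows. The helper names `circ`, `doubly_block_circulant`, and `circular_conv` are hypothetical.

```python
import numpy as np

def circ(a):
    """Circulant matrix whose (i, j) entry is a[(j - i) mod n]."""
    n = len(a)
    idx = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n
    return a[idx]

def doubly_block_circulant(K):
    """n^2 x n^2 matrix whose (r, c) block is circ(K[(c - r) mod n, :])."""
    n = K.shape[0]
    return np.block([[circ(K[(c - r) % n]) for c in range(n)]
                     for r in range(n)])

def circular_conv(X, K):
    """Y[i, j] = sum over p, q of X[(i + p) mod n, (j + q) mod n] * K[p, q]."""
    n = X.shape[0]
    Y = np.zeros((n, n))
    for p in range(n):
        for q in range(n):
            # np.roll with a negative shift brings X[i + p, j + q] to position (i, j).
            Y += K[p, q] * np.roll(X, (-p, -q), axis=(0, 1))
    return Y

# Usage: compare A vec(X) with vec(Y) on random data (row-major vec).
n = 5
X = np.random.randn(n, n)
K = np.random.randn(n, n)
A = doubly_block_circulant(K)
print(np.allclose(A @ X.ravel(), circular_conv(X, K).ravel()))
```

Under these conventions the comparison should print `True`; a different indexing convention for the convolution would change the block layout of $A$ but not its singular values.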

Figures (7)

  • Figure 1: Time used to compute singular values. The left graph is for a $3 \times 3$ convolution on a $16 \times 16$ image, with the number of input/output channels on the $x$-axis. The right graph is for an $11 \times 11$ convolution on a $64 \times 64$ image (no curve is shown for the full matrix method, as it could not complete in a reasonable time for these inputs).
  • Figure 2: Training loss and test error for the ResNet model (He et al., 2016) on CIFAR-10.
  • Figure 3: A scatterplot of the test errors obtained with different hyperparameter combinations, and different operator-norm regularizers.
  • Figure 4: Test error vs. training time for the ResNet model (He et al., 2016) on CIFAR-10.
  • Figure 5: Plot of the singular values of the linear operators associated with the convolutional layers of the pretrained "ResNet V2" from the TensorFlow website.
  • ...and 2 more figures

Theorems & Definitions (9)

  • Lemma 1: see Jain (1989), Section 5.5; Goodfellow et al. (2016), page 329
  • Theorem 2: Jain (1989), Section 5.5
  • Lemma 3: Jain (1989), Section 5.5
  • Lemma 4
  • Theorem 5
  • Theorem 6
  • Lemma 7
  • Theorem 8
  • Proposition 9: Lefkimmiatis et al. (2013), Proposition 1