Spectral Norm of Convolutional Layers with Circular and Zero Paddings

Blaise Delattre, Quentin Barthélemy, Alexandre Allauzen

TL;DR

The paper develops a generalized Gram iteration framework to compute guaranteed upper bounds on the spectral norm of convolutional layers, extending from circular padding to zero padding using Toeplitz structures and proving quadratic convergence. By exploiting block-diagonalization in the Fourier domain for circular convolutions and structured Gram iterates for Toeplitz forms, it provides tight bounds with practical computational cost. It introduces Spectral Rescaling (SR), a differentiable procedure that yields true 1-Lipschitz layers and bridges the gap between AOL and spectral normalization, improving robustness while maintaining training stability. Empirical results show improved accuracy and scalability of spectral-norm estimation and demonstrate certified robustness gains on CIFAR datasets, with code available for reproducibility.

Abstract

This paper leverages \emph{Gram iteration}, an efficient, deterministic, and differentiable method for computing the spectral norm with an upper-bound guarantee. Gram iteration was originally designed for circular convolutional layers; we generalize it to zero-padding convolutional layers and prove its quadratic convergence. We also provide theorems that bridge the gap between the spectral norms of circular and zero-padding convolutions. We design a \emph{spectral rescaling} that can be used as a competitive $1$-Lipschitz layer that enhances network robustness. Experiments show that our method outperforms state-of-the-art techniques in precision, computational cost, and scalability. The code of experiments is available at https://github.com/blaisedelattre/lip4conv.
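To make the idea of Gram iteration concrete, here is a minimal NumPy sketch for a dense real matrix. It relies on the standard fact that, with $G^{(0)} = G$ and $G^{(t+1)} = G^{(t)\top} G^{(t)}$, the spectral norm satisfies $\sigma_1(G) \le \|G^{(t)}\|_F^{2^{-t}}$. The function name `gram_iteration_bound`, the Frobenius-norm rescaling used to avoid overflow, and the default iteration count are illustrative assumptions, not the authors' implementation (which targets convolutional layers and lives in the repository linked above).

```python
import numpy as np


def gram_iteration_bound(G, n_iter=6):
    """Upper bound on the spectral norm of a dense real matrix G.

    Hypothetical helper for illustration only: it repeatedly forms the Gram
    matrix G <- G^T G and rescales by the Frobenius norm to avoid overflow.
    """
    G = np.asarray(G, dtype=np.float64)
    log_scale = 0.0  # log of the factor separating the scaled and unscaled iterates
    for _ in range(n_iter):
        r = np.linalg.norm(G)  # Frobenius norm of the current iterate
        if r == 0.0:
            return 0.0  # the zero matrix has spectral norm 0
        G = G / r
        # Squaring the iterate squares the accumulated scale as well.
        log_scale = 2.0 * (log_scale + np.log(r))
        G = G.T @ G
    # Undo the rescaling and take the 2^t-th root of the Frobenius norm.
    root = 2.0 ** (-n_iter)
    return np.linalg.norm(G) ** root * np.exp(log_scale * root)


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
print("Gram iteration bound:", gram_iteration_bound(A))
print("Exact spectral norm :", np.linalg.norm(A, 2))
```

The returned value never falls below the exact spectral norm (up to floating-point error), and the gap shrinks rapidly with `n_iter` because each iteration squares the ratio between the leading singular value and the others.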

Paper Structure

This paper contains 28 sections, 11 theorems, 41 equations, 2 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

Let $K \in \mathbb{R}^{n \times n}$ be a convolutional kernel and let $C \in \mathbb{R}^{n^2 \times n^2}$ be the doubly-block circulant matrix such that $C = \mathrm{blkcirc}\left(K_1, \dots, K_n\right)$. Then $C$ can be diagonalized as $C = F \, \mathrm{diag}(\lambda) \, F^{-1}$, where $F$ is the two-dimensional discrete Fourier transform matrix and $\lambda = F \, \mathrm{vec}(K)$ are the eigenvalues of $C$.
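A small numerical check can illustrate this statement in the single-channel case. The sketch below builds the dense matrix of a 2D circular convolution by explicit index shifts and compares its singular values with the moduli of the 2D FFT of the kernel. The helper names `circular_conv_matrix` and `spectral_norm_circular`, the row-major vectorization, and the toy sizes are assumptions made for illustration; only the moduli of the eigenvalues are compared, since these do not depend on the indexing convention chosen for $\mathrm{blkcirc}$.

```python
import numpy as np


def circular_conv_matrix(K):
    """Dense n^2 x n^2 matrix of 2D circular convolution with an n x n kernel K.

    Column (j1, j2) is the response to a unit impulse at (j1, j2), i.e. the
    kernel circularly shifted by (j1, j2), flattened in row-major order.
    """
    n = K.shape[0]
    cols = [
        np.roll(K, shift=(j1, j2), axis=(0, 1)).ravel()
        for j1 in range(n)
        for j2 in range(n)
    ]
    return np.stack(cols, axis=1)


def spectral_norm_circular(K):
    """Spectral norm of the circular convolution, read off the 2D FFT of K."""
    return np.abs(np.fft.fft2(K)).max()


rng = np.random.default_rng(0)
n = 6
K = np.zeros((n, n))
K[:3, :3] = rng.standard_normal((3, 3))  # 3 x 3 kernel zero-padded to n x n

C = circular_conv_matrix(K)
sv_dense = np.linalg.svd(C, compute_uv=False)            # descending order
sv_fft = np.sort(np.abs(np.fft.fft2(K)).ravel())[::-1]   # descending order

print("max |sv_dense - sv_fft|:", np.abs(sv_dense - sv_fft).max())
print("spectral norm via FFT  :", spectral_norm_circular(K))
```

Because $C$ is a normal matrix, its singular values are the moduli of its eigenvalues, so the spectral norm of a single-channel circular convolution can be obtained from one FFT instead of a decomposition of the $n^2 \times n^2$ matrix. The multi-channel case treated in the paper is not covered by this sketch.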

Figures (2)

  • Figure 1: Evolution of the bound factor $(1/(1 - \alpha))^{2^{-t}}$, described in the theorem on bound approximation for lower input size, for input size $n=224$, kernel size $k=3$, and number of Gram iterations $t \in \{1, 3, 5, 6\}$.
  • Figure 2: Estimation difference and computational time of spectral norm computation for zero-padded convolutional layers, comparing different methods for a varying number of channels $c_{\mathrm{in}}, c_{\mathrm{out}}$. Kernel size is $3$, input size is $32$.

Theorems & Definitions (19)

  • Theorem 1: Section 5.5 of Jain (1989)
  • Theorem 2: Corollary A.1.1 of Trockman and Kolter (2021)
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Corollary 1
  • Definition 1
  • Lemma 1
  • ...and 9 more