Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers

Ekaterina Grishina; Mikhail Gorbunov; Maxim Rakhuba

TL;DR

The tensor version of the spectral norm of a four-dimensional convolution kernel is, up to a constant factor, an upper bound on the spectral norm of the Jacobian matrix of the convolution operation.

Abstract

Controlling the spectral norm of the Jacobian matrix of the convolution operation has been shown to improve generalization, training stability and robustness in CNNs. Existing methods for computing this norm either tend to overestimate it, or their performance deteriorates quickly as the input and kernel sizes increase. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, is differentiable, and can be computed efficiently during training. Through experiments, we demonstrate how this new bound can be used to improve the performance of convolutional architectures.
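
To make the quantity being bounded concrete, here is a minimal NumPy sketch of the Jacobian spectral norm of a convolutional layer. It rests on assumptions not stated in the abstract: circular (periodic) padding, stride 1, the cross-correlation convention used by deep learning frameworks, and an input size n no smaller than the kernel size. The function names are hypothetical. Under these assumptions the Jacobian is block-diagonalized by the 2D Fourier transform, so its largest singular value is the largest singular value among the per-frequency c_out x c_in blocks. This is a standard FFT-based computation for circular convolutions, not the bound proposed in the paper; the explicit Jacobian is built only as a cross-check at small n.

```python
import numpy as np

def circular_conv(kernel, x):
    """Cross-correlation with periodic padding.

    kernel: (c_out, c_in, h, w), x: (c_in, n, n)
    y[o, a, b] = sum_{i,p,q} kernel[o, i, p, q] * x[i, (a+p) % n, (b+q) % n]
    """
    c_out, _, h, w = kernel.shape
    n = x.shape[1]
    y = np.zeros((c_out, n, n))
    for p in range(h):
        for q in range(w):
            # np.roll with a negative shift gives shifted[i, a, b] = x[i, a+p, b+q] (mod n)
            shifted = np.roll(x, shift=(-p, -q), axis=(1, 2))
            y += np.einsum("oi,iab->oab", kernel[:, :, p, q], shifted)
    return y

def jacobian_norm_fft(kernel, n):
    """Spectral norm of the Jacobian via per-frequency SVDs (requires n >= h, w)."""
    k_hat = np.fft.fft2(kernel, s=(n, n), axes=(2, 3))   # (c_out, c_in, n, n)
    blocks = np.transpose(k_hat, (2, 3, 0, 1))           # (n, n, c_out, c_in)
    return np.linalg.svd(blocks, compute_uv=False).max()

def jacobian_norm_explicit(kernel, n):
    """Builds the full Jacobian column by column; only feasible for tiny n."""
    c_in = kernel.shape[1]
    dim = c_in * n * n
    cols = [circular_conv(kernel, e.reshape(c_in, n, n)).ravel() for e in np.eye(dim)]
    return np.linalg.norm(np.stack(cols, axis=1), 2)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 2, 3, 3))
for n in (4, 8, 16):
    # the work grows with n: n*n small SVDs, one per frequency
    print(n, jacobian_norm_fft(kernel, n))
```

The FFT route needs one SVD per frequency, so its cost grows with the input resolution, whereas any quantity computed from the kernel alone does not change with n.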

Paper Structure (23 sections, 6 theorems, 55 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Related work
Preliminaries and notation
Background on convolutions
Matrix and tensor norms
Tensor unfoldings
Spectral density matrix
Main results
Computation of the spectral norm
Experiments
Spectral norm computation
Spectral norm regularization
Orthogonal regularization
Conclusion
Strided convolution
...and 8 more sections

Key Result

Lemma

$\|K\|_\sigma \leq \|R\|_2$ for any unfolding matrix $R$ of the tensor $K$.
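
The inequality can be checked numerically with a short NumPy sketch. Several things here are assumptions for illustration: the tensor spectral norm is taken as the maximum of $|\langle K, u \otimes v \otimes f \otimes g \rangle|$ over real unit vectors, the unfoldings are the four mode-k unfoldings, and the function names are hypothetical. The alternating power iteration below is a generic heuristic rather than the computation used in the paper. It evaluates $K$ at unit vectors, so it can only return a value at or below $\|K\|_\sigma$, which in turn must lie below the spectral norm of every unfolding.

```python
import numpy as np

def mode_unfolding(tensor, mode):
    """Mode-k unfolding: axis `mode` indexes rows, all other axes are flattened into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tensor_norm_estimate(K, iters=200, seed=0):
    """Alternating power iteration for a 4D tensor (lower estimate of ||K||_sigma)."""
    rng = np.random.default_rng(seed)
    u, v, f, g = (rng.standard_normal(d) for d in K.shape)
    u, v, f, g = (x / np.linalg.norm(x) for x in (u, v, f, g))
    for _ in range(iters):
        # update one factor at a time, contracting K with the other three
        u = np.einsum("oipq,i,p,q->o", K, v, f, g)
        u /= np.linalg.norm(u)
        v = np.einsum("oipq,o,p,q->i", K, u, f, g)
        v /= np.linalg.norm(v)
        f = np.einsum("oipq,o,i,q->p", K, u, v, g)
        f /= np.linalg.norm(f)
        g = np.einsum("oipq,o,i,p->q", K, u, v, f)
        g /= np.linalg.norm(g)
    return abs(np.einsum("oipq,o,i,p,q->", K, u, v, f, g))

rng = np.random.default_rng(0)
K = rng.standard_normal((4, 3, 3, 3))    # (c_out, c_in, h, w), sizes chosen arbitrarily
estimate = tensor_norm_estimate(K)
unfolding_norms = [np.linalg.norm(mode_unfolding(K, m), 2) for m in range(K.ndim)]
print("tensor norm estimate:", estimate)
print("unfolding spectral norms:", unfolding_norms)
```

Because the iteration may stop at a local maximum, the printed estimate should be read as a lower bound on $\|K\|_\sigma$, not as its exact value.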

Figures (5)

Figure 1: The behaviour of spectral norm bounds for the third layer of ResNet-18 during training on CIFAR100. The left plot compares the tightness of the $TN$ and $F4$ bounds when training without regularization. The middle plot shows the effect of training with and without $TN$ regularization. The right plot demonstrates the influence of regularization on the spectral norm of the composition of the convolution and the subsequent BatchNorm layer. Similar plots for all layers are presented in Figures \ref{sup:fig:1}, \ref{sup:fig:2}, \ref{sup:fig:3} in the Appendix.
Figure 2: Comparison of our $TN$ bound with the $F4$ bound for the convolutional layers of ResNet-18 trained on CIFAR100. We do not use any regularization or weight decay in this experiment.
Figure 3: Effect of regularization with the $TN$ bound on the spectral norm of the convolutional layers of ResNet-18 trained on CIFAR100.
Figure 4: The behaviour of the spectral norm of the composition of convolution and subsequent BatchNorm layers for ResNet-18 trained on CIFAR100 with and without $TN$ regularization.
Figure 5: Comparison of existing methods in terms of memory consumption, time efficiency and precision for convolution with zero padding and kernels with entries sampled from $\mathcal{N}(0, 1)$. We measure the precision as $|\sigma_{method}-\sigma_{ref}| / \sigma_{ref}$, where $\sigma_{ref}$ is a highly accurate reference value obtained using the power method. We do not plot the precision of PowerQR [ebrahimpour2023spectrum] as it gives the exact value. The power method and PowerQR [ryu2019plug, ebrahimpour2023spectrum] are accurate, but their time complexity noticeably depends on $n$ and $c_{out}$. LipBound [araujo2021lipschitz] produces errors larger than the other methods. Gram iteration [delattre2023efficient] is fast, but consumes as much memory as the method by Sedghi et al. [sedghi2018singular] and is inapplicable for large $c_{out}, c_{in}$ and $n$. Our method is memory efficient and provides a trade-off between speed and accuracy, improving the Fantastic four bound [singla2019fantastic].

Theorems & Definitions (13)

Lemma (Prop. 4.1 [wang2017operator])
Lemma (Lemma 4 [yi2020asymptotic])
Theorem
Proof
Remark
Remark
Theorem
Proof
Theorem
Proof
...and 3 more
