Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers

Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba

TL;DR

The tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, upper-bounds the spectral norm of the Jacobian matrix of the convolution operation.

Abstract

Controlling the spectral norm of the Jacobian matrix of the convolution operation has been shown to improve generalization, training stability and robustness in CNNs. Existing methods for computing this norm either tend to overestimate it, or their performance deteriorates quickly as the input and kernel sizes increase. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, differentiable, and can be efficiently calculated during training. Through experiments, we demonstrate how this new bound can be used to improve the performance of convolutional architectures.

Paper Structure

This paper contains 23 sections, 6 theorems, 55 equations, 5 figures, 8 tables, 1 algorithm.

Key Result

Lemma

$\|K\|_\sigma \leq \|R\|_2$ for any unfolding matrix $R$ of the tensor $K$.
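
The inequality above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, estimates $\|K\|_\sigma$ with a plain alternating power iteration (which returns a lower bound on the true value, since it may stop at a local maximum), and takes "unfolding matrix" to mean any reshaping of $K$ in which a subset of modes indexes the rows and the remaining modes index the columns. The function names `tensor_spectral_norm` and `unfolding_norms` are hypothetical.

```python
import itertools
import numpy as np

def tensor_spectral_norm(K, iters=200, restarts=5, seed=0):
    """Estimate ||K||_sigma = max <K, u1 x ... x ud> over unit vectors u_k.

    Uses alternating power iteration with random restarts and returns the
    best value found, which is a lower bound on the true tensor norm.
    """
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(restarts):
        us = [rng.standard_normal(d) for d in K.shape]
        us = [u / np.linalg.norm(u) for u in us]
        val = 0.0
        for _ in range(iters):
            for k in range(K.ndim):
                # Contract K with every vector except the k-th one.
                # Going from the last axis down keeps axis indices valid.
                T = K
                for m in reversed(range(K.ndim)):
                    if m != k:
                        T = np.tensordot(T, us[m], axes=([m], [0]))
                val = np.linalg.norm(T)
                us[k] = T / val
        best = max(best, val)
    return best

def unfolding_norms(K):
    """Spectral norm of every unfolding: `rows` modes vs. the remaining modes."""
    norms = {}
    modes = range(K.ndim)
    for r in range(1, K.ndim):
        for rows in itertools.combinations(modes, r):
            cols = tuple(m for m in modes if m not in rows)
            n_rows = int(np.prod([K.shape[m] for m in rows]))
            R = np.transpose(K, rows + cols).reshape(n_rows, -1)
            norms[rows] = np.linalg.norm(R, 2)
    return norms

# Random kernel with assumed layout (c_out, c_in, h, w).
K = np.random.default_rng(1).standard_normal((8, 6, 3, 3))
sigma = tensor_spectral_norm(K)
norms = unfolding_norms(K)
print(f"tensor spectral norm (estimate): {sigma:.4f}")
print(f"smallest unfolding norm:         {min(norms.values()):.4f}")
```

Because the iteration only ever evaluates the multilinear form at unit vectors, its output can never exceed any unfolding norm, so the printed estimate should sit at or below the smallest unfolding norm.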

Figures (5)

  • Figure 1: The behaviour of spectral norm bounds for the third layer of ResNet-18 during training on CIFAR100. The left figure compares tightness of $TN$ and $F4$ bounds when training without regularization. The middle figure shows the effect of training with and without $TN$ regularization. The right one demonstrates the influence of regularization on the spectral norm of composition of convolution and the subsequent BatchNorm layer. Similar plots for all layers are presented in Figures \ref{sup:fig:1}, \ref{sup:fig:2}, \ref{sup:fig:3} in the Appendix.
  • Figure 2: The plot compares our $TN$ bound with the $F4$ bound for convolutional layers of ResNet18 trained on CIFAR100. We do not use any regularization or weight decay in this experiment.
  • Figure 3: Effect of regularization with $TN$ bound on the spectral norm of convolutional layers of ResNet18 trained on CIFAR100.
  • Figure 4: The behaviour of the spectral norm of composition of convolution and subsequent BatchNorm layers for ResNet18 trained on CIFAR100 with and without $TN$ regularization.
  • Figure 5: Comparison of existing methods in terms of memory consumption, time efficiency and precision for convolution with zero padding and kernels with entries sampled from $\mathcal{N}(0, 1)$. We measure the precision as $|\sigma_{\text{method}}-\sigma_{\text{ref}}| / \sigma_{\text{ref}}$, where $\sigma_{\text{ref}}$ is a highly accurate reference value obtained using the power method. We do not plot precision of PowerQR (ebrahimpour2023spectrum) as it gives the exact value. The power method and PowerQR (ryu2019plug, ebrahimpour2023spectrum) are accurate, but their time complexity noticeably depends on $n$ and $c_{out}$. LipBound (araujo2021lipschitz) produces errors larger than the other methods. Gram iteration (delattre2023efficient) is fast, but consumes as much memory as the method by Sedghi et al. (sedghi2018singular) and is inapplicable for large $c_{out}, c_{in}$ and $n$. Our method is memory efficient and provides a trade-off between speed and accuracy, improving the Fantastic four bound (singla2019fantastic).
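
The precision metric from the Figure 5 caption can be illustrated with a small NumPy sketch. This is not the authors' benchmark code; it rests on several assumptions. The convolution is a stride-1 cross-correlation with "same" zero padding and an odd kernel size. The reference value comes from a basic power method on the Jacobian. The estimate being scored is the Fantastic four bound in the form $\sqrt{hw}\,\min(\|R\|_2, \|S\|_2, \|T\|_2, \|U\|_2)$, which is recalled from singla2019fantastic and is not stated in this summary. All function names (`conv`, `conv_T`, `power_method`, `f4_bound`, `relative_error`) are hypothetical.

```python
import numpy as np

def conv(K, x):
    """Zero-padded 'same' cross-correlation. K: (c_out, c_in, h, w), x: (c_in, n1, n2)."""
    co, ci, h, w = K.shape
    n1, n2 = x.shape[1:]
    xp = np.pad(x, ((0, 0), (h // 2, h // 2), (w // 2, w // 2)))
    y = np.zeros((co, n1, n2))
    for a in range(h):
        for b in range(w):
            y += np.einsum('oc,cij->oij', K[:, :, a, b], xp[:, a:a + n1, b:b + n2])
    return y

def conv_T(K, y):
    """Adjoint (transpose) of `conv` with respect to x. y: (c_out, n1, n2)."""
    co, ci, h, w = K.shape
    n1, n2 = y.shape[1:]
    gp = np.zeros((ci, n1 + 2 * (h // 2), n2 + 2 * (w // 2)))
    for a in range(h):
        for b in range(w):
            gp[:, a:a + n1, b:b + n2] += np.einsum('oc,oij->cij', K[:, :, a, b], y)
    # The adjoint of zero padding is cropping.
    return gp[:, h // 2:h // 2 + n1, w // 2:w // 2 + n2]

def power_method(K, n, iters=300, seed=0):
    """Largest singular value of the Jacobian of `conv` on n x n inputs."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((K.shape[1], n, n))
    for _ in range(iters):
        x = conv_T(K, conv(K, x))
        x /= np.linalg.norm(x)
    return np.linalg.norm(conv(K, x))

def f4_bound(K):
    """sqrt(h*w) times the smallest spectral norm of four kernel unfoldings (assumed form)."""
    co, ci, h, w = K.shape
    mats = [
        K.transpose(0, 2, 1, 3).reshape(co * h, ci * w),
        K.transpose(0, 3, 1, 2).reshape(co * w, ci * h),
        K.reshape(co, ci * h * w),
        K.transpose(0, 2, 3, 1).reshape(co * h * w, ci),
    ]
    return np.sqrt(h * w) * min(np.linalg.norm(M, 2) for M in mats)

def relative_error(sigma_method, sigma_ref):
    """Precision metric from the caption: |sigma_method - sigma_ref| / sigma_ref."""
    return abs(sigma_method - sigma_ref) / sigma_ref

# Kernel entries sampled from N(0, 1), as in the caption; the sizes are arbitrary.
K = np.random.default_rng(0).standard_normal((4, 3, 3, 3))
sigma_ref = power_method(K, n=8)
sigma_f4 = f4_bound(K)
print(f"reference (power method): {sigma_ref:.4f}")
print(f"F4 bound:                 {sigma_f4:.4f}")
print(f"relative error of F4:     {relative_error(sigma_f4, sigma_ref):.4f}")
```

The bound depends only on the kernel, while the power-method reference has to be recomputed for each input size `n`, which is the resolution dependence the caption refers to.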

Theorems & Definitions (13)

  • Lemma: Prop. 4.1, wang2017operator
  • Lemma: Lemma 4, yi2020asymptotic
  • Theorem
  • Proof
  • Remark
  • Remark
  • Theorem
  • Proof
  • Theorem
  • Proof
  • ...and 3 more