Lipschitz constant estimation for general neural network architectures using control tools

Patricia Pauli; Dennis Gramlich; Frank Allgöwer

TL;DR

This paper interprets neural networks as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$, and uses this interpretation to exploit the series interconnection structure of feedforward neural networks with a dynamic programming recursion.

Abstract

This paper is devoted to the estimation of the Lipschitz constant of general neural network architectures using semidefinite programming. For this purpose, we interpret neural networks as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$. A key novelty with respect to prior work is that we use this interpretation to exploit the series interconnection structure of feedforward neural networks with a dynamic programming recursion. Nonlinearities, such as activation functions and nonlinear pooling layers, are handled with integral quadratic constraints. If the neural network contains signal processing layers (convolutional or state space model layers), we realize them as 1-D/2-D/N-D systems and exploit this structure as well. We distinguish ourselves from related work on Lipschitz constant estimation by more extensive structure exploitation (scalability) and a generalization to a large class of common neural network architectures. To show the versatility and computational advantages of our method, we apply it to different neural network architectures trained on MNIST and CIFAR-10.
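To make the layer-as-time-step view concrete, the following minimal sketch computes the matrix product bound (MP), which the paper uses as a baseline in its figures, by a simple layer-by-layer recursion. This is not the paper's semidefinite programming method: it only assumes 1-Lipschitz activations (here ReLU) and multiplies the spectral norms of the weight matrices. The layer sizes, random weights, and function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def matrix_product_bound(weights):
    """Upper bound on the l2 Lipschitz constant of a feedforward network
    with 1-Lipschitz activations: the product of the spectral norms."""
    gamma = 1.0
    for W in weights:  # layer k plays the role of time step k
        gamma *= np.linalg.norm(W, 2)  # largest singular value of W
    return gamma


def forward(weights, biases, x):
    """Fully connected ReLU network with a linear last layer."""
    for k, (W, b) in enumerate(zip(weights, biases)):
        x = W @ x + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
dims = [4, 32, 32, 3]  # hypothetical layer sizes, not taken from the paper
weights = [
    rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
    for i in range(len(dims) - 1)
]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]

gamma = matrix_product_bound(weights)

# Empirical lower bound on the Lipschitz constant from random input pairs.
empirical = 0.0
for _ in range(1000):
    x, y = rng.standard_normal(dims[0]), rng.standard_normal(dims[0])
    diff = forward(weights, biases, x) - forward(weights, biases, y)
    empirical = max(empirical, np.linalg.norm(diff) / np.linalg.norm(x - y))

print(f"matrix product bound: {gamma:.3f}, empirical lower bound: {empirical:.3f}")
```

The gap between the two printed numbers indicates how loose this baseline can be; tighter estimates, such as the SDP-based bounds discussed in the paper, are expected to lie between them.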

Paper Structure (28 sections, 16 theorems, 78 equations, 4 figures, 4 tables)

Introduction
Problem statement and deep neural networks
Layer definitions
State space representations for convolutions
Lipschitz constant estimation
The convolutional layer
The fully connected layer
The activation function layer
The pooling layer
Flattening operations
The state space model layer
Subnetworks
Residual layers and skip connections
Analysis of the conservatism
Experiments
...and 13 more sections

Key Result

Lemma 1

Consider a convolutional layer $\mathcal{C} : \ell_{2e}^{c_{-}}(\mathbb{N}_0^{2}) \to \ell_{2e}^{c}(\mathbb{N}_0^{2})$ with representation eq:conv_2D characterized by the convolution kernel $K$ and the bias $b$. This layer is realized in state space by the matrices where $K[i_1,i_2]\in\mathbb{R}^{c\times c_{-}},~i_1\in[ 0,r_1],~i_2\in[ 0,r_2]$. The state signals $(x_1[ i_1,i_2])_{i_1,i_2 \in \mat

Figures (4)

Figure 1: (a) No padding, (b) same padding, (c) full padding for a $3\times3$ kernel [dumoulin2016guide].
Figure 2: Lipschitz bounds $\gamma$ using LipSDP, GLipSDP, LipLT and the matrix product bound (MP) on fully connected NNs with depths $d=\{2,4,8\}$ and hidden layer size $c=32$ (LipSDP/GLipSDP bounds are close to zero for deeper NNs) (left). Computation times for fully connected NNs with depths $d=\{2,4,8,16,32,64\}$ and channel sizes $c=\{16,32,64\}$ for GLipSDP (solid) and LipSDP (dashed) (right).
Figure 3: Lipschitz bounds $\gamma$ using CLipSDP, GLipSDP, LipLT and the matrix product bound (MP) on fully convolutional NNs with depths $d=\{2,4,8,16\}$ and channel sizes $c=16$ (left). Computation times for fully convolutional networks with depths $d=\{2,4,8,16\}$ and channel sizes $c=\{8,16,32\}$ for GLipSDP (solid) and CLipSDP (dashed) (right).
Figure 4: Computation times for GLipSDP using 32, 16, 8, 4, 2, 1 subnetworks for a 32-layer fully connected network with 32 and 64 neurons (left). Computation times for GLipSDP using 16, 8, 4, 2, 1 subnetworks for a 16-layer fully convolutional network with 8 and 16 channels (right). The resulting Lipschitz bound is the same for all computations.
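
The three padding modes named in the caption of Figure 1 differ only in how many zeros are added around the input. As a small illustration, the sketch below computes the output size along one dimension for a stride-1 convolution; the function name and the mode labels are hypothetical, and "same" padding is assumed to use an odd kernel size.

```python
def conv_output_size(n, k, padding):
    """Output length along one dimension for a stride-1 convolution
    with input length n and kernel length k."""
    if padding == "none":
        pad = 0
    elif padding == "same":
        if k % 2 == 0:
            raise ValueError("this sketch assumes an odd kernel size for 'same'")
        pad = (k - 1) // 2  # zeros added on each side
    elif padding == "full":
        pad = k - 1
    else:
        raise ValueError(f"unknown padding mode: {padding}")
    return n + 2 * pad - k + 1


# A 3x3 kernel on a 5x5 input, per dimension.
sizes = {mode: conv_output_size(5, 3, mode) for mode in ("none", "same", "full")}
print(sizes)  # {'none': 3, 'same': 5, 'full': 7}
```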

Theorems & Definitions (34)

Remark 1
Definition 1: Roesser model
Lemma 1: Realization of 2-D convolutions [pauli2024state]
proof
Remark 2
Remark 3
Lemma 2
proof
Lemma 3
Lemma 4: Slope-restriction [fazlyab2023efficient, pauli2021training]
...and 24 more
