State space representations of the Roesser type for convolutional layers

Patricia Pauli; Dennis Gramlich; Frank Allgöwer

TL;DR

This work makes control-theoretic tools applicable to convolutional layers by providing a Roesser-type state space realization of 2-D convolutional layers with $n_1+n_2=c_\mathrm{out} r_1 + c_\mathrm{in} r_2$ states, and it proves that this realization is minimal for $c_\mathrm{in}=c_\mathrm{out}$. The authors construct the realization explicitly from the convolution kernel, show that its transfer function matches that of the original FIR convolution, and extend the approach to N-D, dilated, and strided convolutions. The main contributions are a 2-D FIR realization that is minimal for equal channel counts, its generalization to higher dimensions and convolution variants, and a framework that enables LMI-based analysis and synthesis for CNNs. The state space formulation is a compact alternative to fully-connected reformulations and improves scalability for the analysis and design of safety-critical CNNs. Future work targets broader channel sizes and Lipschitz-based guarantees.

Abstract

From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.
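To make the impulse-response view concrete, here is a minimal 1-D sketch in plain Python (not taken from the paper). It treats a single-channel kernel as the impulse response of an FIR system and reproduces the same input-output map with a shift-register state space model. The function names `conv1d_direct` and `conv1d_state_space`, the zero initial state, and the single-channel restriction are assumptions made for illustration.

```python
def conv1d_direct(kernel, u):
    """y[i] = sum_j kernel[j] * u[i - j], with u taken as zero before index 0."""
    return [
        sum(kernel[j] * u[i - j] for j in range(len(kernel)) if i - j >= 0)
        for i in range(len(u))
    ]


def conv1d_state_space(kernel, u):
    """Same map, computed by a state space model with r = len(kernel) - 1 states.

    Assumed realization (illustrative): the state x[i] = (u[i-1], ..., u[i-r])
    stores past inputs, A is a shift matrix, B inserts the newest input,
    C = kernel[1:], and D = kernel[0].
    """
    r = len(kernel) - 1
    x = [0.0] * r  # zero initial state
    y = []
    for u_i in u:
        # Output equation: y[i] = C x[i] + D u[i]
        y.append(sum(c * x_k for c, x_k in zip(kernel[1:], x)) + kernel[0] * u_i)
        # State update: x[i+1] = A x[i] + B u[i] (shift in the newest input)
        x = ([u_i] + x[:-1]) if r > 0 else []
    return y


kernel = [1.0, -2.0, 0.5]
impulse = [1.0, 0.0, 0.0, 0.0]
impulse_response = conv1d_state_space(kernel, impulse)
```

Feeding a unit impulse into either function returns the kernel followed by zeros, which is the sense in which the kernel is the impulse response of the layer.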

Paper Structure (11 sections, 1 theorem, 34 equations, 1 figure)


Introduction and main result
Related work and background
Problem statement
Proofs for the two main theorems
General convolutions and examples
1-D convolutions
2-D convolutions
N-D convolutions
Dilated convolutions
Strided convolutions
Conclusion

Key Result

Corollary 1

The mapping eq:conv can be represented as eq:RoesserSys, where the matrices $A_{11},\ldots,D$ are built from the kernel entries $K[\boldsymbol{i}]\in\mathbb{R}^{c_\mathrm{out}\times c_\mathrm{in}}$, $\boldsymbol{i}\in[0,\boldsymbol{r}]$, and the kernel size is $(r_1+1)\times (r_2+1)\times\cdots\times (r_d+1)$. The state, input, and output dimensions are $n_{1}=c_{\mathrm{out}}r_1$, $n_{2}=c_{\mathrm{in}}r_d\,\cdot$ …

Figures (1)

Figure 1: Visualization of the state space representations for a $3\times 3$ kernel with stride $\boldsymbol{s}=1$ (left) and $\boldsymbol{s}=2$ (right). At propagation step $i_1,i_2$, the blue pixel input enters through $D$, the purple pixel input enters through $C_2$, the red pixel input enters through $B_1$, and the yellow pixel input enters through $A_{12}$.
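The roles of $D$, $C_2$, $B_1$, and $A_{12}$ in the caption can be illustrated with a small single-channel sketch in plain Python. The matrices below form one possible Roesser-type realization with $r_1+r_2$ states that is consistent with the caption. They are an assumption for illustration and not necessarily the authors' exact construction. The helper names (`roesser_matrices`, `conv2d_roesser`, `conv2d_direct`), the zero boundary conditions, and stride 1 are likewise assumptions.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def roesser_matrices(K):
    """Illustrative single-channel realization for an (r1+1) x (r2+1) kernel K."""
    r1, r2 = len(K) - 1, len(K[0]) - 1
    A11 = [[1.0 if c == r + 1 else 0.0 for c in range(r1)] for r in range(r1)]
    A12 = [[K[k][j] for j in range(1, r2 + 1)] for k in range(1, r1 + 1)]
    A21 = [[0.0] * r1 for _ in range(r2)]
    A22 = [[1.0 if r == c + 1 else 0.0 for c in range(r2)] for r in range(r2)]
    B1 = [K[k][0] for k in range(1, r1 + 1)]
    B2 = [1.0 if k == 0 else 0.0 for k in range(r2)]
    C1 = [1.0 if k == 0 else 0.0 for k in range(r1)]
    C2 = [K[0][j] for j in range(1, r2 + 1)]
    D = K[0][0]
    return A11, A12, A21, A22, B1, B2, C1, C2, D


def conv2d_roesser(K, u):
    """Simulate the Roesser recursion (assumed convention):
    x1[i1+1, i2] = A11 x1 + A12 x2 + B1 u,  x2[i1, i2+1] = A21 x1 + A22 x2 + B2 u,
    y[i1, i2] = C1 x1 + C2 x2 + D u, with zero boundary states."""
    A11, A12, A21, A22, B1, B2, C1, C2, D = roesser_matrices(K)
    H, W = len(u), len(u[0])
    x1 = [[0.0] * len(B1) for _ in range(W)]  # boundary: x1[0, i2] = 0
    y = [[0.0] * W for _ in range(H)]
    for i1 in range(H):
        x2 = [0.0] * len(B2)  # boundary: x2[i1, 0] = 0
        for i2 in range(W):
            ui, x1c = u[i1][i2], x1[i2]
            y[i1][i2] = dot(C1, x1c) + dot(C2, x2) + D * ui
            x1_next = [a + b + c * ui
                       for a, b, c in zip(matvec(A11, x1c), matvec(A12, x2), B1)]
            x2_next = [a + b + c * ui
                       for a, b, c in zip(matvec(A21, x1c), matvec(A22, x2), B2)]
            x1[i2], x2 = x1_next, x2_next
    return y


def conv2d_direct(K, u):
    """y[i1][i2] = sum_{j1,j2} K[j1][j2] * u[i1-j1][i2-j2], zero outside the image."""
    H, W = len(u), len(u[0])
    y = [[0.0] * W for _ in range(H)]
    for i1 in range(H):
        for i2 in range(W):
            for j1, row in enumerate(K):
                for j2, k in enumerate(row):
                    if i1 - j1 >= 0 and i2 - j2 >= 0:
                        y[i1][i2] += k * u[i1 - j1][i2 - j2]
    return y
```

In this sketch, $x_2$ stores the $r_2$ previous inputs of the current row, and $x_1$ carries partial output sums from the $r_1$ previous rows. A $3\times 3$ single-channel kernel therefore needs $2+2=4$ states, and `conv2d_roesser(K, u)` should agree with `conv2d_direct(K, u)` for any image `u`.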

Theorems & Definitions (8)

Definition 1: Roesser model
Example 1: Standard convolution
Example 2: $2\times 3$ kernel
Example 3: $3\times 2$ kernel
Corollary 1
Example 4: 3-D convolution
Example 5: Dilated convolution
Example 6: Strided convolution
