State space representations of the Roesser type for convolutional layers
Patricia Pauli, Dennis Gramlich, Frank Allgöwer
TL;DR
This work makes control-theoretic tools applicable to convolutional layers by providing a Roesser-type state space realization of 2-D convolutional layers with $n_1+n_2=c_\mathrm{in} r_1 + c_\mathrm{out} r_2$ states, and proves that this realization is minimal in the important case $c_\mathrm{in}=c_\mathrm{out}$. The authors construct the realization explicitly from the convolution kernel, show that its transfer function matches that of the original FIR convolution, and extend the approach to N-D, dilated, and strided convolutions. The main contributions are a minimal 2-D FIR realization, its generalization to higher dimensions and convolution variants, and a framework that enables LMI-based analysis and synthesis for CNNs. The state space formulation is a compact alternative to reformulating convolutional layers as fully-connected layers, which improves scalability for the analysis and design of safety-critical CNNs. Future work is aimed at minimality for unequal channel sizes ($c_\mathrm{in} \neq c_\mathrm{out}$) and at Lipschitz-based guarantees.
Abstract
From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.
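The correspondence between a convolution kernel and an impulse response can be illustrated with a minimal sketch of the 1-D, single-channel analogue. This is not the Roesser-type construction of the paper, only the familiar shift-register realization of an FIR filter; the function names `fir_to_state_space`, `simulate`, and `direct_convolution` and the example kernel are assumptions made for illustration. A kernel with $r+1$ taps needs $r$ states, and stacking $c_\mathrm{in}$ input channels per delay would give $c_\mathrm{in} r$ states in this 1-D setting.

```python
def fir_to_state_space(kernel):
    """Return (A, B, C, D) realizing y[k] = sum_i kernel[i] * u[k - i].

    Illustrative 1-D analogue only (assumes at least two kernel taps).
    The state x[k] stores the last r inputs [u[k-1], ..., u[k-r]].
    """
    r = len(kernel) - 1  # number of delays = number of states
    # A shifts the stored inputs down by one slot (ones on the subdiagonal).
    A = [[1.0 if i == j + 1 else 0.0 for j in range(r)] for i in range(r)]
    # B writes the newest input into the first slot.
    B = [1.0 if i == 0 else 0.0 for i in range(r)]
    # C weights the stored inputs, D is the direct feedthrough tap.
    C = list(kernel[1:])
    D = kernel[0]
    return A, B, C, D


def simulate(A, B, C, D, u):
    """Run x[k+1] = A x[k] + B u[k], y[k] = C x[k] + D u[k] from x[0] = 0."""
    x = [0.0] * len(B)
    y = []
    for u_k in u:
        y.append(sum(c * x_i for c, x_i in zip(C, x)) + D * u_k)
        x = [sum(a * x_i for a, x_i in zip(row, x)) + b * u_k
             for row, b in zip(A, B)]
    return y


def direct_convolution(kernel, u):
    """Causal convolution of the kernel with u, assuming zero past inputs."""
    r = len(kernel) - 1
    return [sum(kernel[i] * u[k - i] for i in range(min(k, r) + 1))
            for k in range(len(u))]


kernel = [0.5, -1.0, 2.0, 0.25]          # hypothetical kernel, r = 3
u = [1.0, 0.0, -2.0, 3.0, 0.5, 1.5]      # hypothetical input signal
A, B, C, D = fir_to_state_space(kernel)
y_ss = simulate(A, B, C, D, u)
y_conv = direct_convolution(kernel, u)
impulse_response = simulate(A, B, C, D, [1.0, 0.0, 0.0, 0.0, 0.0])
```

Feeding a unit impulse into the state space model returns the kernel taps followed by zeros, which is the sense in which the kernel is the impulse response. The 2-D case treated in the paper uses two state vectors, one propagated along each spatial dimension, instead of the single shift register used here.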
