Representing Neural Network Layers as Linear Operations via Koopman Operator Theory
Nishant Suresh Aswani, Saif Eddin Jabari, Muhammad Shafique
TL;DR
This work reframes neural networks as dynamical systems and employs Koopman operator theory to linearize individual layers via delay-coordinate observables, using $\psi(\mathbf{x}_{k+1}) = \mathcal{K}\psi(\mathbf{x}_k)$ as a core relation. By inserting a layer-scaling block to generate trajectory data and applying Hankelization with dynamic mode decomposition, each nonlinear layer can be replaced by a finite-dimensional linear operator, creating Koopman hybrid networks. The study validates the approach on Yin-Yang and MNIST, showing substantial retention of performance with targeted layer replacements and providing insights into eigenstructure and observable design. This framework offers a data-driven, interpretable avenue to analyze and potentially edit trained network layers in the Koopman space, with practical implications for model simplification and control.
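As a rough illustration of how Hankelization connects to the core relation, the standard Hankel-DMD construction can be written as follows. The symbols $d$ (number of delays), $m$ (trajectory length), $\mathbf{H}$, $\mathbf{X}$, and $\mathbf{Y}$ are notation assumed here for illustration and may differ from the paper's.

$$
\mathbf{H} =
\begin{bmatrix}
\mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_{m-d+1} \\
\mathbf{x}_2 & \mathbf{x}_3 & \cdots & \mathbf{x}_{m-d+2} \\
\vdots & \vdots & & \vdots \\
\mathbf{x}_d & \mathbf{x}_{d+1} & \cdots & \mathbf{x}_m
\end{bmatrix},
\qquad
\mathbf{K} \approx \mathbf{Y}\mathbf{X}^{\dagger}
$$

Here each column of $\mathbf{H}$ plays the role of a delay-coordinate observable $\psi(\mathbf{x}_k)$, $\mathbf{X}$ is $\mathbf{H}$ without its last column, $\mathbf{Y}$ is $\mathbf{H}$ without its first column, and $\mathbf{X}^{\dagger}$ is the pseudoinverse, so that $\mathbf{K}$ is the least-squares finite-dimensional approximation of $\mathcal{K}$.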
Abstract
The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more approachable. We draw from a dynamical systems view of neural networks, offering a fresh perspective by using Koopman operator theory and its connections with dynamic mode decomposition (DMD). Together, they offer a framework for linearizing dynamical systems by embedding the system into an appropriate observable space. By reframing a neural network as a dynamical system, we demonstrate that we can replace the nonlinear layer in a pretrained multi-layer perceptron (MLP) with a finite-dimensional linear operator. In addition, we analyze the eigenvalues of DMD and the right singular vectors of SVD to present evidence that time-delayed coordinates provide a straightforward and highly effective observable space for Koopman theory to linearize a network layer. Consequently, we replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%. In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.
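The pipeline described above (time-delayed coordinates, then DMD) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `hankelize`, `fit_dmd`, and `predict_next` are hypothetical, the DMD variant shown is the standard SVD-truncated one, and a toy damped rotation stands in for the trajectory data that the paper obtains from a network layer.

```python
import numpy as np

def hankelize(traj, delays):
    """Stack `delays` consecutive states into each column.

    traj has shape (n_features, n_steps); the result has shape
    (n_features * delays, n_steps - delays + 1).
    """
    n_cols = traj.shape[1] - delays + 1
    return np.vstack([traj[:, i:i + n_cols] for i in range(delays)])

def fit_dmd(H, rank):
    """Fit a rank-truncated linear operator mapping each column of H to the next."""
    X, Y = H[:, :-1], H[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    # Reduced operator U^T Y V S^{-1}; dividing by s scales each column by 1/s_j.
    A_tilde = (U.T @ Y @ Vh.T) / s
    return U, A_tilde

def predict_next(U, A_tilde, h):
    """Advance one delay-embedded state by one step with the fitted operator."""
    return U @ (A_tilde @ (U.T @ h))

# Toy stand-in for layer trajectory data (assumption): a damped 2-D rotation.
theta = 0.3
A = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
traj = np.empty((2, 50))
traj[:, 0] = [1.0, 0.0]
for k in range(49):
    traj[:, k + 1] = A @ traj[:, k]

H = hankelize(traj, delays=2)
U, A_tilde = fit_dmd(H, rank=2)
eigvals = np.linalg.eigvals(A_tilde)       # DMD eigenvalues
pred = predict_next(U, A_tilde, H[:, 10])  # one-step prediction in delay space
```

For a nonlinear layer, the same calls would apply to its trajectory data, with the number of delays and the truncation rank chosen empirically; the toy system here is linear only so that the fit can be checked exactly.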
