Representing Neural Network Layers as Linear Operations via Koopman Operator Theory

Nishant Suresh Aswani, Saif Eddin Jabari, Muhammad Shafique

TL;DR

This work reframes neural networks as dynamical systems and employs Koopman operator theory to linearize individual layers via delay-coordinate observables, using $\psi(\mathbf{x}_{k+1}) = \mathcal{K}\psi(\mathbf{x}_k)$ as a core relation. By inserting a layer-scaling block to generate trajectory data and applying Hankelization with dynamic mode decomposition, each nonlinear layer can be replaced by a finite-dimensional linear operator, creating Koopman hybrid networks. The study validates the approach on Yin-Yang and MNIST, showing substantial retention of performance with targeted layer replacements and providing insights into eigenstructure and observable design. This framework offers a data-driven, interpretable avenue to analyze and potentially edit trained network layers in the Koopman space, with practical implications for model simplification and control.
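
To make the core relation concrete, here is one common way to write a delay-coordinate observable and its DMD estimate. This is a sketch of the standard Hankel-DMD setup, not necessarily the paper's exact construction. The map $F$ (one step of the underlying system), the snapshot matrices $\mathbf{X}$ and $\mathbf{Y}$, and the trajectory length $m$ are notation assumed here for illustration. The delay parameter $h$ follows the figure captions.

$$
\psi(\mathbf{x}_k) =
\begin{bmatrix}
\mathbf{x}_k \\ F(\mathbf{x}_k) \\ \vdots \\ F^{h-1}(\mathbf{x}_k)
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x}_k \\ \mathbf{x}_{k+1} \\ \vdots \\ \mathbf{x}_{k+h-1}
\end{bmatrix},
\qquad
\mathbf{X} = \begin{bmatrix} \psi(\mathbf{x}_1) & \cdots & \psi(\mathbf{x}_{m-1}) \end{bmatrix},
\quad
\mathbf{Y} = \begin{bmatrix} \psi(\mathbf{x}_2) & \cdots & \psi(\mathbf{x}_{m}) \end{bmatrix}
$$

$$
\psi(\mathbf{x}_{k+1}) = \mathcal{K}\psi(\mathbf{x}_k)
\quad\Longrightarrow\quad
\mathbf{Y} \approx \mathbf{K}\mathbf{X},
\qquad
\mathbf{K} = \mathbf{Y}\mathbf{X}^{\dagger}
$$

Here $\mathbf{X}^{\dagger}$ is the Moore-Penrose pseudoinverse. In practice it is usually computed from a truncated SVD of $\mathbf{X}$, which yields a finite-dimensional least-squares approximation of $\mathcal{K}$.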

Abstract

The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more approachable. We draw from a dynamical systems view of neural networks, offering a fresh perspective by using Koopman operator theory and its connections with dynamic mode decomposition (DMD). Together, they offer a framework for linearizing dynamical systems by embedding the system into an appropriate observable space. By reframing a neural network as a dynamical system, we demonstrate that we can replace the nonlinear layer in a pretrained multi-layer perceptron (MLP) with a finite-dimensional linear operator. In addition, we analyze the eigenvalues of DMD and the right singular vectors of SVD to present evidence that time-delayed coordinates provide a straightforward and highly effective observable space for Koopman theory to linearize a network layer. Consequently, we replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%. In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.
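
The role of time-delayed coordinates can be illustrated with a minimal NumPy sketch of Hankelization followed by rank-truncated DMD. It is a toy example under stated assumptions, not the authors' pipeline. The trajectory is a synthetic scalar cosine rather than layer activations, and the helper names `hankelize`, `fit_dmd`, and `one_step_error` are hypothetical. The parameters `h` (number of delays) and `r` (SVD rank) follow the notation of the figure captions.

```python
import numpy as np

def hankelize(traj, h):
    """Stack h consecutive states into each column (delay-coordinate lift).

    traj has shape (n_features, n_steps); the result has shape
    (n_features * h, n_steps - h + 1).
    """
    _, m = traj.shape
    return np.vstack([traj[:, i:m - h + 1 + i] for i in range(h)])

def fit_dmd(H, r):
    """Fit a rank-r operator so that H[:, 1:] is approximately A @ H[:, :-1].

    Returns the leading left singular vectors U (the projection basis) and
    the reduced r x r operator A_tilde = U^T Y V S^{-1}.
    """
    X, Y = H[:, :-1], H[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    A_tilde = (U.T @ Y @ Vh.T) / s  # dividing column j by s[j] applies S^{-1}
    return U, A_tilde

def one_step_error(traj, h, r):
    """Relative one-step prediction error of the DMD model in the lifted space."""
    H = hankelize(traj, h)
    U, A_tilde = fit_dmd(H, r)
    pred = U @ A_tilde @ (U.T @ H[:, :-1])
    return np.linalg.norm(pred - H[:, 1:]) / np.linalg.norm(H[:, 1:])

# Toy signal (an assumption for illustration): a scalar cosine. A single
# sample cannot linearly predict the next one, but two delayed samples can,
# because cos(w(k+2)) = 2 cos(w) cos(w(k+1)) - cos(wk).
k = np.arange(50)
traj = np.cos(0.5 * k)[None, :]

err_h1 = one_step_error(traj, h=1, r=1)  # no delays: poor linear fit
err_h2 = one_step_error(traj, h=2, r=2)  # two delays: essentially exact
eigs = np.linalg.eigvals(fit_dmd(hankelize(traj, 2), 2)[1])

print(f"relative error, h=1: {err_h1:.3f}")
print(f"relative error, h=2: {err_h2:.2e}")
print("DMD eigenvalue magnitudes:", np.abs(eigs))
```

In this toy setting, adding a single delay turns a signal that no scalar linear map can predict into one that a 2 x 2 linear operator advances almost exactly, and the DMD eigenvalues land on the unit circle, as expected for an undamped oscillation. Choosing `h` and `r` for real layer activations is a separate question that the sketch does not address.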

Paper Structure

This paper contains 25 sections, 9 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Comparing our Koopman hybrid approach to a standard model. (Left) A typical MLP with compositions of Linear (blue) + ReLU (green) layers; (Right) Our proposed layer linearization approach, which includes scaling the layer (hatched boxes) and "lifting" the activations into the "Koopman space" via delay coordinates embedding (yellow and orange), consequently replacing the original layer to obtain a Koopman hybrid model.
  • Figure 2: A sample trajectory with 8 states (S1-S8) from an original (orange) and scaled (green) $8 \times 6$ Linear + ReLU layer in an MLP trained on the Yin-Yang dataset. States S7 and S8 are augmented with $-1$ on the output to allow for a trajectory of system states with uniform dimensionality.
  • Figure 3: Decision boundaries of the original MLP and its hybrid variants, trained on the Yin-Yang dataset. We test the models on $1000$ samples of the dataset. (A) The original model achieves an accuracy of $98.4\%$. (B) The MLP with the first hidden layer of size $8 \times 6$ replaced by a DMD model achieves an accuracy of $91.3\%$. (C) The MLP with the second $6 \times 4$ hidden layer replaced achieves an accuracy of $75.8\%$. (D) The MLP with the final $4 \times 3$ hidden layer replaced achieves an accuracy of $97.3\%$.
  • Figure 4: Eigenvalue plots from varying DMD hyperparameters when replacing the first hidden layer in the MNIST network. (A-D) $r = 50$ with $h \in \{1, 10, 50, 100\}$, where (A) produces a model with $56.48\%$ accuracy and (B) $83.24\%$ accuracy; (E-H) $r = 500$ with $h \in \{1, 10, 50, 100\}$, where (E) produces a model with $63.53\%$ accuracy and (F) $90.21\%$ accuracy.
  • Figure 5: Right singular vector (RSV) plots of the Hankel matrix with various delay parameters. (Left) The Hankel matrix is generated from the first hidden layer with $h \in \{1, 10, 50\}$; $h=1$ achieves $66.66\%$ and $h=10$ achieves $92.04\%$ on the test set. (Right) The Hankel matrix is generated from the final hidden layer with $h \in \{1, 10, 50\}$; $h=1$ achieves $28.22\%$ and $h=10$ achieves $95.80\%$ on the test set.
  • ...and 1 more figure