Extraction of nonlinearity in neural networks with Koopman operator

Naoki Sugishita, Kayo Kinjo, Jun Ohkubo

TL;DR

The paper addresses whether nonlinear activations are indispensable in neural networks by replacing intermediate nonlinear layers with a Koopman operator learned via EDMD. It demonstrates that a finite-dimensional Koopman matrix, aided by tensor-train representations, can mimic internal layer dynamics and maintain competitive accuracy under substantial compression. Key findings include that a modest number of singular values (≈10) capture the essential behavior and that Gaussian RBF dictionaries can yield effective surrogates, with similar results on MNIST and Fashion MNIST. This work advances a physics-inspired, data-driven framework for neural network compression and interpretability through linear representations of nonlinear dynamics.

Abstract

Nonlinearity plays a crucial role in deep neural networks. In this paper, we investigate the degree to which the nonlinearity of a neural network is essential. For this purpose, we employ the Koopman operator, extended dynamic mode decomposition, and the tensor-train format. The Koopman operator approach has recently been developed in physics and nonlinear science; the Koopman operator describes time evolution in the observable space instead of the state space. Because nonlinearity in the state space can be replaced with linearity in the observable space, the approach is a promising candidate for understanding complex behavior in nonlinear systems. Here, we analyze trained neural networks for classification problems. In numerical experiments, replacing the nonlinear middle layers with the Koopman matrix yields sufficient accuracy. In addition, we confirm that pruning the Koopman matrix retains sufficient accuracy even at high compression ratios. These results indicate the possibility of extracting some features of neural networks with the Koopman operator approach.
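
The idea in the abstract, fitting a linear map between dictionary features of a layer's input and output, can be illustrated with a minimal extended dynamic mode decomposition (EDMD) sketch. This is not the authors' implementation: the toy `tanh` layer, the three-dimensional state, the dictionary (constant, raw state, and Gaussian RBFs), the choice of centers, and all function names are assumptions made for illustration, and the tensor-train format is omitted.

```python
import numpy as np

def rbf_dictionary(x, centers, sigma=1.0):
    """Map states to observables: [1, x, Gaussian RBFs] (an assumed dictionary)."""
    # x: (n_samples, dim), centers: (n_centers, dim)
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    rbf = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), x, rbf])

def fit_koopman(x, y, centers, sigma=1.0):
    """Least-squares Koopman matrix K such that psi(y) ~= psi(x) @ K."""
    psi_x = rbf_dictionary(x, centers, sigma)
    psi_y = rbf_dictionary(y, centers, sigma)
    K, *_ = np.linalg.lstsq(psi_x, psi_y, rcond=None)
    return K

def predict_state(x, K, centers, sigma=1.0):
    """Advance the observables linearly, then read the state back out."""
    dim = x.shape[1]
    psi_next = rbf_dictionary(x, centers, sigma) @ K
    # Columns 1..dim of the dictionary hold the raw state variables.
    return psi_next[:, 1:1 + dim]

# Toy stand-in for one nonlinear layer (hypothetical, not from the paper).
rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(3, 3))

def layer(x):
    return np.tanh(x @ W)

x_train = rng.normal(size=(500, 3))
y_train = layer(x_train)
centers = x_train[:50]  # assumed: reuse some training points as RBF centers

K = fit_koopman(x_train, y_train, centers)

x_test = rng.normal(size=(100, 3))
err = np.mean((predict_state(x_test, K, centers) - layer(x_test)) ** 2)
baseline = np.mean(layer(x_test) ** 2)  # error of always predicting zero
print(f"Koopman surrogate MSE: {err:.4f}, zero-prediction MSE: {baseline:.4f}")
```

Including the raw state in the dictionary is a convenience of this sketch: it lets the predicted state be read directly from the advanced observables without a separate decoding step.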

Paper Structure (14 sections, 14 equations, 14 figures, 3 tables)


Figures (14)

  • Figure 1: Neural network, discrete-time dynamical system, and the partial replacement with the Koopman matrix.
  • Figure 2: Prediction errors for $100$ test points. Error bars correspond to the standard deviation for $5$ different experiments. The horizontal axis is the maximum degree for each variable, $N_{\mathrm{max}}$, in the dictionary.
  • Figure 3: Examples of prediction for the intermediate layers. In each example, the left column shows the state variable after true time evolution, and the right column shows the prediction by the Koopman matrix. (a) and (b) correspond to cases with successful prediction, and (c) and (d) to less successful ones.
  • Figure 4: Discrimination accuracy of the test data (top) and prediction error of the time evolution of the internal state variables (bottom) as the number of dictionary functions is increased.
  • Figure 5: Accuracy of the compressed model by the proposed method for various settings. In the blank region, the number of matrix elements is greater than that of the case without singular value decomposition.
  • ...and 9 more figures
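
The blank region mentioned in the Figure 5 caption follows from simple counting. As a hedged sketch, assume a rank-r truncation of an m × n matrix is stored as two factors of sizes m × r and r × n; the factors then hold r(m + n) elements, which is fewer than the mn elements of the full matrix only when r < mn / (m + n). The storage layout, the matrix size, and the function names below are assumptions for illustration and may differ from the paper's setup.

```python
import numpy as np

def truncate_svd(K, rank):
    """Return factors A (m x rank) and B (rank x n) with A @ B ~= K."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def stored_elements(m, n, rank):
    """Element count when only the two factors are kept (assumed layout)."""
    return rank * (m + n)

rng = np.random.default_rng(0)
m, n = 60, 60
# Synthetic matrix of exact rank 5, so a rank-5 truncation is lossless.
K = rng.normal(size=(m, 5)) @ rng.normal(size=(5, n))

A, B = truncate_svd(K, rank=5)
print("reconstruction exact:", np.allclose(A @ B, K))
print("full matrix elements:", m * n)
print("rank-5 factor elements:", stored_elements(m, n, 5))
print("break-even rank:", m * n / (m + n))
```

For this 60 × 60 example the break-even rank is 30: any truncation keeping more than 30 singular values stores more numbers than the original matrix.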