Exploiting Chaotic Dynamics as Deep Neural Networks

Shuhong Liu; Nozomi Akashi; Qingyao Huang; Yasuo Kuniyoshi; Kohei Nakajima

TL;DR

Chaos is pervasive and is characterized by sensitive dependence on initial conditions. The paper investigates whether chaotic dynamics can be exploited for computation. It first analyzes the expansion property of state-of-the-art DNNs using Finite-Time Maximum Lyapunov Exponents (FTMLE), then proposes a framework that uses a chaotic system as the computation medium between trainable input and output layers. The framework is validated on FFESN, Lorenz 96, and coupled spin-torque oscillators. On MNIST and Fashion-MNIST, the chaotic implementations achieve competitive accuracy and faster convergence, suggesting a practical path toward energy-efficient neuromorphic and ML systems that integrate chaotic dynamics.
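
To make the FTMLE idea concrete, here is a minimal sketch of one common way to estimate a finite-time Lyapunov exponent: follow two nearby trajectories and measure how fast they separate. This is not necessarily the estimator used in the paper. The function name `ftmle`, the perturbation size `eps`, and the choice to perturb every coordinate equally are assumptions made for illustration, and a single perturbation direction only approximates the maximum exponent.

```python
import math


def ftmle(step, x0, n_steps, eps=1e-8):
    """Estimate a finite-time Lyapunov exponent from two nearby trajectories.

    step    : function mapping a state (list of floats) to the next state
    x0      : initial state
    n_steps : number of iterations (the finite horizon)
    eps     : size of the initial perturbation (assumed value)
    """
    x = list(x0)
    # Hypothetical choice: perturb every coordinate by the same small amount.
    y = [xi + eps for xi in x0]
    d0 = math.dist(x, y)  # initial separation
    for _ in range(n_steps):
        x = step(x)
        y = step(y)
    d_final = math.dist(x, y)  # separation after n_steps
    # Average exponential growth rate of the separation per step.
    return math.log(d_final / d0) / n_steps


# Toy linear maps whose exponents are known exactly: ln(2) and ln(0.5).
expanding = ftmle(lambda s: [2.0 * v for v in s], [0.3, -0.1], 10)
contracting = ftmle(lambda s: [0.5 * v for v in s], [0.3, -0.1], 10)
print(expanding, contracting)
```

A positive value indicates expansion (nearby inputs are pulled apart) and a negative value indicates contraction, which corresponds to the orange and blue zones described in the figure captions.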

Abstract

Chaos presents complex dynamics arising from nonlinearity and a sensitivity to initial states. These characteristics suggest a depth of expressivity that underscores their potential for advanced computational applications. However, strategies to effectively exploit chaotic dynamics for information processing have largely remained elusive. In this study, we reveal that the essence of chaos can be found in various state-of-the-art deep neural networks. Drawing inspiration from this revelation, we propose a novel method that directly leverages chaotic dynamics for deep learning architectures. Our approach is systematically evaluated across distinct chaotic systems. In all instances, our framework outperforms conventional deep neural networks in accuracy, convergence speed, and efficiency. Furthermore, we find that the formation of transient chaos plays an active role in our scheme. Collectively, this study offers a new path for the integration of chaos, which has long been overlooked in information processing, and provides insights into the prospective fusion of chaotic dynamics within the domains of machine learning and neuromorphic computation.

Paper Structure (33 sections, 21 equations, 12 figures, 4 tables)

Introduction
Results
Expansion Property in Deep Neural Networks
Multilayer Perceptron
Convolutional Neural Network
Auto-encoder
Bidirectional Encoder Representations from Transformers (BERT)
Exploiting Chaotic Systems as Computation Medium
Feed-Forward Echo-State Network (FFESN)
Lorenz 96
Coupled Spin-Torque Oscillators
Discussion
Materials and Methods
Finite-Time Maximum Lyapunov Exponent (FTMLE)
Feed-Forward Echo-State Network (FFESN)
...and 18 more sections

Figures (12)

Figure 1: A. The illustration of the expansion behavior within the dynamical system. B. Conceptual design of the proposed framework that utilizes chaotic dynamics for the purpose of image classification. The terms $W_{\rm in}$ and $W_{\rm out}$ denote the linear transformation of the state space, while $T$ signifies the duration of self-evolution in the employed dynamical system.
Figure 2: FTMLE analyses for various deep neural networks. The FTMLE distribution corresponds with overall or layer-wise FTMLE values with respect to the input samples. Within the diagrams, blue areas denote contraction zones, while orange areas signify expansion zones. A. Five-layer MLP trained for a three-class classification task in a two-dimensional state space. The FTMLE analysis is conducted on the testing set uniformly sampled from the state space. The FTMLE map reflects the corresponding FTMLE values of the sampled test data. B. FTMLE analysis on a pre-trained ResNet50 model using a subset of the ImageNet validation set, where one image is sampled for each of the 1,000 classes. C. Pretrained ResNet-based auto-encoder evaluated on CIFAR10 testing set. D. Pretrained BERT-mini evaluated on IMDb testing set.
Figure 3: The architecture and analysis of the FFESN. Training and evaluation are conducted on the MNIST dataset. A. Traditional ESN framework. B. The FFESN model takes a linearly transformed input as the initial state. The next state depends solely on the reservoir's present state. C. FTMLE analysis of the FFESN for a spectral radius $\rho$ varying from 0.3 to 2.0. The iteration steps $T$ are 5, 10, and 15, corresponding to the upper, middle, and lower graphs, respectively. Each peak on the ridge represents an FTMLE distribution for the reservoir at each $T$. Values of $\rho$ below 0.3, which exhibit more negative trends, are excluded. D. Comparison of classification capability between FFESNs with $\rho$ of 0.6 and 1.4 at $T=15$. This includes initial and final reservoir states and neuron state dynamics over time, with neuron numbers on the vertical axis, timestamps on the horizontal axis, and neuron values color-coded. FTMLE distributions are shown for $T=15$, with expansion behavior indicated in orange. PCA on final state images highlights differences in specific categories. E. The heat map shows testing accuracy across $\rho$ and $T$ values. Colors encode $\left| \log(\epsilon) \right|$, where $\epsilon$ is the prediction error, with higher values indicating better accuracy. The red line shows the Lyapunov time, denoting the system's predictability time frame. F. The heat map displays the FFESN convergence rate, based on the epochs needed to reach optimal accuracy within a $5 \times 10^{-4}$ error margin, averaged over five trials. The gray area indicates performance akin to random predictions, and the orange arrows point to overfitting areas.
Figure 4: Exploration of Lorenz 96 system performance on the MNIST task. A. Global bifurcation diagram depicting the system behavior with random input over an iteration time $T=20$. B. The MLE identifies global dynamics of fixed-point convergence, periodic orbits, and chaos. C--D. Illustration of the FTMLE spectra for the untrained Lorenz 96 models. E--F. Distribution of the FTMLE for the trained Lorenz 96 models, particularly highlighting the divergent transient dynamics observed at smaller $F$ values (e.g., less than 2) compared to their untrained counterparts. G. PCA projections of initial and final states with FTMLE indicators. 3D trajectory diagrams show state evolution in the first three dimensions of the Lorenz 96 system, spanning $T$ from 0 to 20. The PCA results use $T=2.5$. In the internal state dynamics plots, the horizontal axis represents iteration time and the vertical axis shows state values. The color gradient indicates neuron value. H--J. Heat maps illustrating classification accuracy, training epochs, and averaged FTMLE across varied $F$ and $T$. The accuracy map H has a color spectrum indicating $\left| \log(\epsilon) \right|$, where $\epsilon$ is the prediction error. The convergence metric I illustrates the number of training epochs necessary to achieve optimal accuracy, considering an error tolerance of $5 \times 10^{-5}$. The FTMLE map J shows averaged FTMLE over the testing set.
Figure 5: Dynamical analyses of coupled STOs and their use in solving the MNIST task. A--B. The schematic diagrams of STOs and coupled STOs. C. Typical long-term dynamics of periodic and chaotic coupled STOs with random initial states. D. MLEs of coupled STOs as a function of coupling magnitude $A_{\rm cp}$. E. FTMLE distributions of coupled STOs for MNIST input data as a function of coupling magnitude $A_{\rm cp}$. The left and right graphs show the FTMLE distributions of untrained and trained coupled STOs, respectively. F. The architecture using coupled STOs as an internal layer. G. The deep architecture using three coupled STOs as internal layers. H. The MNIST performance heat map in terms of iteration time and coupling magnitude. The color represents $\left| \log(\epsilon) \right|$. I. The MNIST training speed heat map in terms of iteration time and coupling magnitude. The color represents the convergence epoch. J. The architecture using three coupled STOs with a convolutional input layer. The scatter plots illustrate MNIST test data in the space of the first and second principal components. The time series are MNIST data transformations in the computational media.
...and 7 more figures
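
The captions of Figures 1 and 3 describe the forward pass of the framework: a linear map $W_{\rm in}$ sets the initial state, the dynamical system evolves on its own for $T$ steps, and a linear map $W_{\rm out}$ reads out the result. The sketch below illustrates that flow for an FFESN-style reservoir. It is only an illustration under stated assumptions: the `tanh` update, the layer sizes, the Gaussian weights, and the `gain` parameter (used here as a rough stand-in for the spectral radius $\rho$) are all hypothetical choices, and no training is performed.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows, cols, scale, rng):
    """Matrix with independent Gaussian entries of standard deviation `scale`."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def ffesn_features(u, W_in, W_res, T):
    """Use the input only to set the initial state, then evolve for T steps."""
    x = matvec(W_in, u)  # linearly transformed input becomes the initial state
    for _ in range(T):
        # Assumed update rule: the next state depends only on the present state.
        x = [math.tanh(v) for v in matvec(W_res, x)]
    return x


rng = random.Random(0)
n_in, n_res, n_out = 4, 16, 3  # hypothetical sizes
gain = 1.4                     # assumed proxy for the spectral radius
W_in = random_matrix(n_res, n_in, 1.0, rng)
# Entries with standard deviation gain / sqrt(n) give a spectral radius near
# `gain` for large matrices; for n_res = 16 this is only a rough approximation.
W_res = random_matrix(n_res, n_res, gain / math.sqrt(n_res), rng)
W_out = random_matrix(n_out, n_res, 0.1, rng)  # would be trained in practice

u = [0.2, -0.5, 0.1, 0.9]           # toy input vector
state = ffesn_features(u, W_in, W_res, T=15)
scores = matvec(W_out, state)       # linear readout, one score per class
print(len(state), len(scores))
```

In this sketch the reservoir weights stay fixed while `W_in` and `W_out` are the layers one would train. Swapping the `tanh` update for a numerical integrator of another dynamical system leaves the surrounding structure unchanged.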
