Understanding the role of autoencoders for stiff dynamical systems using information theory
Vijayamanikandan Vijayarangan, Harshavardhana A. Uranakara, Francisco E. Hernández-Pérez, Hong G. Im
TL;DR
This paper investigates how autoencoders (AEs) paired with neural ODEs (NODEs) can alleviate stiffness in stiff dynamical systems by forming a smooth, low-dimensional latent manifold. Using an information-theoretic framework based on mutual information and Rényi entropy, it analyzes encoder/decoder information flow and identifies two training phases (fitting and compression) that drive the latent representation toward minimal sufficient statistics. The results for an H$_2$-air batch reactor show that deeper latent spaces (e.g., $N_L>4$) satisfy data-processing inequalities and yield smoother latent dynamics, attributed to disentanglement and better mode mixing, which transforms rare events in physical space into more probable latent events. The findings provide a principled basis for selecting latent dimensions and explain why dynamics-informed training enhances ROM fidelity and computational efficiency for stiff chemical kinetics.
Abstract
Using information theory, this study provides insight into how constructing the latent space of an autoencoder (AE) through deep neural network (DNN) training finds a smooth low-dimensional manifold in a stiff dynamical system. Our recent study [1] reported that an AE combined with a neural ODE (NODE) as a surrogate reduced-order model (ROM) for the integration of stiff chemically reacting systems led to a significant reduction in temporal stiffness, and the behavior was attributed to the identification of a slow invariant manifold by the nonlinear projection of the AE. The present work offers a fundamental understanding of this mechanism by employing concepts from information theory and better mixing. The learning mechanisms of both the encoder and the decoder are explained by plotting the evolution of mutual information and identifying two different phases. Subsequently, the density distributions of the physical and latent variables are plotted, showing that the nonlinear autoencoder transforms a \emph{rare event} in the physical space into a \emph{highly likely} (more probable) event in the latent space. Finally, the nonlinear transformation leading to this density redistribution is explained using concepts from information theory and probability.
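As a rough illustration of the quantities named above, the following sketch computes a discrete Rényi entropy and a histogram-based estimate of mutual information, then checks a data-processing inequality on a toy chain $X \rightarrow Z \rightarrow Y$. Everything in it is an assumption made for illustration: the estimator used in this work is not specified in this section, the plug-in histogram estimator and the bin count are arbitrary choices, and the variables `x`, `z`, and `y` are synthetic stand-ins for a physical state, a latent variable, and a noisy reconstruction, not reactor data.

```python
import numpy as np

def renyi_entropy(p, alpha=2.0):
    """Renyi entropy (nats) of a discrete distribution p; alpha -> 1 gives Shannon."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    p = p / p.sum()
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def mutual_information(a, b, bins=16):
    """Plug-in (histogram) estimate of Shannon mutual information I(A;B) in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pab = joint / joint.sum()
    pa = pab.sum(axis=1, keepdims=True)   # marginal of A, shape (bins, 1)
    pb = pab.sum(axis=0, keepdims=True)   # marginal of B, shape (1, bins)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])))

# Toy Markov chain X -> Z -> Y (hypothetical stand-ins, not data from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=5000)                  # "physical" variable
z = np.tanh(x)                             # deterministic nonlinear "encoding"
y = z + 0.5 * rng.normal(size=x.size)      # noisy "decoding" of the latent

mi_xz = mutual_information(x, z)
mi_xy = mutual_information(x, y)

# Data-processing inequality: processing Z further cannot add information about X.
print(f"I(X;Z) = {mi_xz:.3f} nats, I(X;Y) = {mi_xy:.3f} nats")
print("Renyi-2 entropy of binned z:",
      renyi_entropy(np.histogram(z, bins=16)[0], alpha=2.0))
```

Histogram estimates are biased and sensitive to the bin count, so a sketch like this is suitable only for qualitative comparisons such as the ordering $I(X;Y) \le I(X;Z)$.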
