Utilizing Lyapunov Exponents in designing deep neural networks
Tirthankar Mittra
TL;DR
This work asks whether Lyapunov exponents, used as a physics-inspired guide, can make hyperparameter tuning for large deep neural networks more efficient. It models SGD dynamics as a nonlinear system and computes the largest Lyapunov exponent $\lambda_1$ with a Kantz–Wolf based approach, linking hyperparameters such as the learning rate $\alpha$ and the choice of activation function to training stability and convergence. The key findings are that changes in $\alpha$ can induce chaotic weight updates, and that activations with more negative exponents (notably ReLU in this study) tend to converge faster and reach a lower final loss. In addition, among initial weights drawn from a bounded range, those with more negative local exponents correlate with better outcomes. Overall, Lyapunov exponents offer a systematic, data-efficient metric to guide hyperparameter selection and weight initialization in DNNs, and the code is publicly available for practical adoption.
Abstract
Training large deep neural networks is resource intensive. This study investigates whether Lyapunov exponents can accelerate training by aiding the selection of hyperparameters. To study this, I formulate an optimization problem using neural networks with different activation functions in the hidden layers. I initialize the model weights with different random seeds and calculate the Lyapunov exponent while performing traditional gradient descent on these weights. The findings demonstrate that variations in the learning rate can induce chaotic changes in the model weights. I also show that activation functions with more negative Lyapunov exponents exhibit better convergence properties. Finally, the study demonstrates that Lyapunov exponents can be used to select effective initial model weights for deep neural networks, potentially enhancing the optimization process.
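
The idea of measuring a Lyapunov exponent along a gradient-descent trajectory can be illustrated on a toy problem. The following is a minimal sketch, not the code released with this study: it assumes a one-dimensional double-well loss $f(w) = w^4/4 - w^2/2$ in place of a neural network, and it uses a simple two-trajectory estimate with renormalization (in the spirit of Benettin and Wolf) rather than the exact Kantz–Wolf procedure. The names `grad`, `gd_step`, and `largest_lyapunov`, and the default values of `w0`, `d0`, `burn_in`, and `steps`, are hypothetical choices made for illustration.

```python
import math


def grad(w):
    # Gradient of the assumed toy loss f(w) = w**4 / 4 - w**2 / 2.
    return w * w * w - w


def gd_step(w, alpha):
    # One plain gradient-descent update with learning rate alpha.
    return w - alpha * grad(w)


def largest_lyapunov(alpha, w0=0.5, d0=1e-8, burn_in=500, steps=5000):
    """Estimate the largest Lyapunov exponent of the gradient-descent map.

    Two weight trajectories start a distance d0 apart. After every update
    the log growth of their separation is recorded and the perturbed copy
    is pulled back to distance d0 from the reference trajectory.
    """
    w = w0
    for _ in range(burn_in):  # discard the initial transient
        w = gd_step(w, alpha)

    v = w + d0  # perturbed copy of the weight
    total, used = 0.0, 0
    for _ in range(steps):
        w, v = gd_step(w, alpha), gd_step(v, alpha)
        d = abs(v - w)
        if d > 0.0:
            total += math.log(d / d0)
            used += 1
            # Renormalize, keeping the sign of the separation.
            v = w + d0 * (1.0 if v >= w else -1.0)
        else:
            # Trajectories collapsed to the same float: restart the perturbation.
            v = w + d0
    return total / used if used else float("-inf")


for alpha in (0.1, 0.5, 1.9):
    print(f"alpha={alpha}: lambda_1 ~ {largest_lyapunov(alpha):.4f}")
```

For a small learning rate such as `alpha=0.1` the weight settles into the minimum at $w = 1$, where each update contracts perturbations by $|1 - 2\alpha| = 0.8$, so the estimate approaches $\ln 0.8 < 0$. For larger learning rates the iterates may stop settling and the estimate can change sign, which is the kind of transition the study examines for real networks.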
