Utilizing Lyapunov Exponents in designing deep neural networks

Tirthankar Mittra

TL;DR

This work asks whether Lyapunov exponents, a tool borrowed from the physics of dynamical systems, can make hyperparameter tuning for large deep neural networks more efficient. By modeling SGD dynamics as a nonlinear system and computing the largest Lyapunov exponent $\lambda_1$ via a Kantz–Wolf based approach, it links hyperparameters such as the learning rate $\alpha$ and the choice of activation function to training stability and convergence. Key findings show that varying $\alpha$ can induce chaotic weight updates, and that activations with more negative exponents (notably ReLU in the study) tend to yield faster convergence and lower final loss. In addition, within bounded initial conditions, more negative local exponents computed from different initial weights correlate with better outcomes. Overall, Lyapunov exponents offer a systematic metric to guide hyperparameter selection and initialization in DNNs, and the code is public.

Abstract

Training large deep neural networks is resource-intensive. This study investigates whether Lyapunov exponents can accelerate this process by aiding in the selection of hyperparameters. To study this, I formulate an optimization problem using neural networks with different activation functions in the hidden layers. By initializing model weights with different random seeds, I calculate the Lyapunov exponent while performing traditional gradient descent on these model weights. The findings demonstrate that variations in the learning rate can induce chaotic changes in model weights. I also show that activation functions with more negative Lyapunov exponents exhibit better convergence properties. Finally, the study demonstrates that Lyapunov exponents can be used to select effective initial model weights for deep neural networks, potentially enhancing the optimization process.
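To make the idea of a Lyapunov exponent for gradient descent concrete, here is a minimal sketch on a toy problem. It is not the paper's setup: the one-dimensional double-well loss, the function names (`grad`, `hess`, `lyapunov_exponent`), and the step counts are all assumptions chosen for illustration, and the exponent is computed with the standard derivative-averaging formula for one-dimensional maps rather than the Kantz–Wolf based approach used in the study. The sketch treats the update $w \leftarrow w - \alpha f'(w)$ as a discrete map and averages $\ln|1 - \alpha f''(w)|$ along the trajectory.

```python
import math


def grad(w):
    """Gradient of the toy double-well loss f(w) = w**4 / 4 - w**2 / 2 (assumed loss)."""
    return w * w * w - w


def hess(w):
    """Second derivative of the same toy loss."""
    return 3.0 * w * w - 1.0


def lyapunov_exponent(alpha, w0, steps=5000, burn_in=500):
    """Estimate the Lyapunov exponent of the map w -> w - alpha * grad(w).

    For a one-dimensional map, the exponent is the long-run average of
    log|1 - alpha * hess(w)|, the log of the local stretching factor.
    Returns math.inf if the iterates diverge.
    """
    w = w0
    total = 0.0
    for step in range(burn_in + steps):
        if step >= burn_in:
            # Skip the transient, then accumulate the local stretching rate.
            stretch = abs(1.0 - alpha * hess(w))
            total += math.log(max(stretch, 1e-300))  # guard against log(0)
        w = w - alpha * grad(w)  # plain gradient-descent update
        if not math.isfinite(w):
            return math.inf
    return total / steps


if __name__ == "__main__":
    # Scan a few learning rates from the same (arbitrary) starting weight.
    for alpha in (0.1, 0.4, 1.5, 2.5):
        print(alpha, lyapunov_exponent(alpha, w0=0.5))
```

For a small learning rate the iterates settle at a minimum of the toy loss and the estimate is negative, meaning nearby trajectories contract toward each other. As the learning rate grows the estimate rises, and a positive value would indicate that nearby starting weights separate exponentially, which is the sense of "chaotic" used in the abstract.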

Paper Structure (5 sections, 7 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: The concept of Lyapunov Exponent.
  • Figure 2: Optimization landscape of a neural network's loss function.
  • Figure 3: The concept of Lyapunov Exponent.
  • Figure 4: Lyapunov Exponent as a function of the learning rate for different activation functions: Linear, Sigmoid, and ReLU (from left to right).
  • Figure 5: Lyapunov Exponent as a function of the final loss achieved for different starting points and different activation functions: Linear, Sigmoid, and ReLU (from left to right).