Deep neural networks from the perspective of ergodic theory

Fan Zhang

TL;DR

This work proposes an ergodic-theory lens for understanding deep neural networks, treating layers as discrete time steps of a dynamical system. It argues that effective networks should operate at the edge of chaos, balancing ergodicity ($C_1$) with limited mixing ($C_2$) to enable robust interpolation and controlled extrapolation. By introducing network spectroscopy and finite-time Lyapunov exponents, the author connects architectural choices (especially depth, width, and activation functions) to the spectral properties of the state evolution, offering heuristics for design and debugging. The framework suggests practical metrics and guidelines for tuning architectures toward stable, expressive learning, and hints at deeper connections to regularization and memory through high-dimensional path dependence.

Abstract

The design of deep neural networks remains somewhat of an art rather than a precise science. By tentatively adopting ergodic theory considerations on top of viewing the network as the time evolution of a dynamical system, with each layer corresponding to a temporal instance, we show that some rules of thumb, which might otherwise appear mysterious, can be justified heuristically.

Paper Structure (8 sections, 7 equations)

Introduction and motivation
As fitting functions to training data
As dynamical systems
Network spectroscopy
Effect of network architectural traits
Depth of network vs. activation function
Width of layers vs. connectivity
Conclusion
