Deep neural networks from the perspective of ergodic theory
Fan Zhang
TL;DR
This work proposes an ergodic-theory lens for understanding deep neural networks by treating layers as discrete time steps in a dynamical system. It argues that effective networks should operate on the edge of chaos, balancing ergodicity ($C_1$) with limited mixing ($C_2$) to enable robust interpolation and controlled extrapolation. By introducing network spectroscopy and finite-time Lyapunov exponents, the author connects architectural choices (especially depth, width, and activation functions) to the spectral properties of the state evolution, offering heuristics for design and debugging. The proposed framework suggests practical metrics and guidelines to tune architectures for stable, expressive learning and hints at deeper connections to regularization and memory through high-dimensional path dependence.
Abstract
The design of deep neural networks remains more of an art than a precise science. By tentatively adopting ergodic theory considerations on top of viewing the network as the time evolution of a dynamical system, with each layer corresponding to a temporal instance, we show that some rules of thumb, which might otherwise appear mysterious, can be given heuristic justifications.
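To make the layers-as-time-steps picture concrete, the following is a minimal sketch of how finite-time Lyapunov exponents could be estimated for a toy network. It is not taken from the paper: the random tanh layers without biases, the `gain` weight-scale parameter, the function name `finite_time_lyapunov`, and the QR-based accumulation of Jacobian products are all assumptions chosen for illustration.

```python
import numpy as np

def finite_time_lyapunov(depth=50, width=64, gain=1.0, seed=0):
    """Estimate finite-time Lyapunov exponents of a random tanh network.

    Assumed toy model (not from the paper): each layer is one time step
    x_{t+1} = tanh(W_t x_t), with W_t drawn i.i.d. N(0, gain^2 / width).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)      # initial state (network input)
    Q = np.eye(width)                   # orthonormal frame of perturbations
    log_stretch = np.zeros(width)       # accumulated log growth per direction
    tiny = np.finfo(float).tiny         # guards against log(0)

    for _ in range(depth):
        W = gain * rng.standard_normal((width, width)) / np.sqrt(width)
        x = np.tanh(W @ x)
        # Jacobian of this layer: diag(tanh'(W x)) @ W, with tanh' = 1 - tanh^2
        J = (1.0 - x**2)[:, None] * W
        # Push the frame through the layer and re-orthonormalize it;
        # |diag(R)| records how much each direction was stretched.
        Q, R = np.linalg.qr(J @ Q)
        log_stretch += np.log(np.abs(np.diag(R)) + tiny)

    # Average log growth per layer, sorted from largest to smallest
    return np.sort(log_stretch / depth)[::-1]

if __name__ == "__main__":
    for g in (0.5, 1.0, 3.0):
        lam = finite_time_lyapunov(gain=g)
        print(f"gain={g}: largest exponent {lam[0]:+.3f}")
```

In this toy setting, a largest exponent well below zero means perturbations of the input die out with depth, while a clearly positive one means they are amplified. A value near zero is one rough way to read the "edge of chaos" regime mentioned in the summary.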
