Predictability Enables Parallelization of Nonlinear State Space Models
Xavier Gonzalez, Leo Kozachkov, David M. Zoltowski, Kenneth L. Clarkson, Scott W. Linderman
TL;DR
The paper shows that whether a nonlinear state space model can be evaluated efficiently in parallel hinges on the conditioning of a residual-based merit function, which is governed by the system's predictability. By connecting the Polyak–Łojasiewicz constant of the merit function to the largest Lyapunov exponent $\lambda$, it proves that predictable dynamics ($\lambda < 0$) yield well-conditioned landscapes and enable global linear convergence of DEER, giving $O((\log T)^2)$ evaluation time, which is sublinear in the sequence length $T$. It also characterizes the basin of quadratic convergence and, in experiments across RNNs, Langevin dynamics, and chaotic observers, demonstrates a sharp threshold near $\lambda=0$. The results provide a design principle: ensuring predictability makes merit-function-based parallelization practical, and they indicate when to use parallel evaluation rather than sequential rollout. Overall, the work offers a theoretical framework and practical guidance for leveraging parallelism in nonlinear state space modeling by tying dynamical stability to optimization geometry and convergence.
Abstract
The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances such as DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) have shown that evaluating a state space model can be recast as solving a parallelizable optimization problem, and that this approach can sometimes yield dramatic speed-ups in evaluation time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting wider adoption of the technique. In this work, we establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, impacts the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in $O((\log T)^2)$ time, where $T$ is the sequence length, a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis demonstrates that for predictable systems, the optimization problem is always well-conditioned, whereas for unpredictable systems, the conditioning degrades exponentially as a function of the sequence length. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized, and highlighting predictability as a key design principle for parallelizable models.
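
To make the recasting of evaluation as an optimization problem concrete, the following is a minimal sketch in the spirit of DEER, not the authors' implementation. It assumes a hypothetical scalar model $x_t = \tanh(W x_{t-1} + u_t)$ with $|W| < 1$, so the dynamics are contractive and therefore predictable. Each Newton step linearizes the residuals $x_t - f(x_{t-1}, u_t)$ and solves the resulting linear recurrence with a scan over affine maps. The scan is simulated here with plain Python loops, whereas on parallel hardware each of its roughly $\log_2 T$ passes would run concurrently. All names (`step`, `affine_scan`, `newton_parallel_solve`, `finite_time_lyapunov`) and the chosen weight, inputs, and tolerance are illustrative assumptions.

```python
import math

W = 0.5  # hypothetical recurrent weight; |W| < 1 makes the map contractive


def step(x, u):
    """Toy scalar SSM (an assumption): x_t = tanh(W * x_{t-1} + u_t)."""
    return math.tanh(W * x + u)


def step_jac(x, u):
    """Derivative of step with respect to the previous state x."""
    return W * (1.0 - math.tanh(W * x + u) ** 2)


def sequential_rollout(x0, inputs):
    """Reference evaluation: T dependent steps, one after another."""
    xs, x = [], x0
    for u in inputs:
        x = step(x, u)
        xs.append(x)
    return xs


def affine_scan(a, b):
    """Inclusive scan over the affine maps x -> a[t] * x + b[t].

    Hillis-Steele style: each pass of the while loop is elementwise and could
    run in parallel, and there are about log2(T) passes. Here the passes are
    simulated with ordinary Python loops.
    """
    a, b = list(a), list(b)
    shift = 1
    while shift < len(a):
        new_a, new_b = list(a), list(b)
        for t in range(shift, len(a)):
            # Apply the earlier composite map first, then the later one.
            new_a[t] = a[t] * a[t - shift]
            new_b[t] = a[t] * b[t - shift] + b[t]
        a, b = new_a, new_b
        shift *= 2
    return a, b


def merit(x0, xs, inputs):
    """Residual-based merit: 0.5 * sum_t (x_t - f(x_{t-1}, u_t))^2."""
    prev = [x0] + xs[:-1]
    return 0.5 * sum((x - step(p, u)) ** 2 for x, p, u in zip(xs, prev, inputs))


def newton_parallel_solve(x0, inputs, tol=1e-24, max_iters=100):
    """Newton iteration on the residuals; each step is one affine scan."""
    xs = [0.0] * len(inputs)  # initial guess for the whole trajectory
    for iteration in range(1, max_iters + 1):
        prev = [x0] + xs[:-1]
        jac = [step_jac(p, u) for p, u in zip(prev, inputs)]
        # Linearize f around the current guess:
        # x_t = jac_t * x_{t-1} + (f(prev_t) - jac_t * prev_t)
        off = [step(p, u) - j * p for p, u, j in zip(prev, inputs, jac)]
        A, B = affine_scan(jac, off)
        xs = [A_t * x0 + B_t for A_t, B_t in zip(A, B)]
        if merit(x0, xs, inputs) < tol:
            break
    return xs, iteration


def finite_time_lyapunov(x0, xs, inputs):
    """Scalar finite-time estimate: mean of log|df/dx| along the trajectory."""
    prev = [x0] + xs[:-1]
    logs = [math.log(abs(step_jac(p, u))) for p, u in zip(prev, inputs)]
    return sum(logs) / len(logs)


inputs = [math.sin(0.3 * t) for t in range(64)]  # arbitrary test inputs
x0 = 0.1
xs_seq = sequential_rollout(x0, inputs)
xs_par, iters = newton_parallel_solve(x0, inputs)
max_err = max(abs(p - s) for p, s in zip(xs_par, xs_seq))
lam = finite_time_lyapunov(x0, xs_seq, inputs)
print(f"iterations: {iters}, max error: {max_err:.2e}, lambda: {lam:.3f}")
```

In this contractive setting the Lyapunov estimate is negative and the Newton iteration should need far fewer than $T$ steps to match the sequential rollout. Choosing a map whose Jacobian magnitude exceeds 1 along the trajectory (not possible with this particular $\tanh$ map when $|W| < 1$) would be one way to probe the unpredictable regime that the abstract describes as poorly conditioned.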
