Gradient Descent Learns Linear Dynamical Systems
Moritz Hardt, Tengyu Ma, Benjamin Recht
TL;DR
The paper shows that stochastic gradient descent, combined with a projection onto a convex set of acquiescent systems, efficiently learns an unknown linear dynamical system from noisy sequence data even though the objective is non-convex. By recasting the problem as an idealized risk in the frequency domain that is weakly quasi-convex, the authors obtain polynomial running time and sample complexity bounds, and they extend the framework to improper learning and MIMO settings. A key idea is the acquiescence condition on the denominator of the transfer function, which controls the optimization landscape; over-parameterization extends the guarantees to a broader class of systems. The work connects system identification, passive systems, and modern optimization to give theoretically grounded guarantees, supported by simulations. The results offer a principled path toward tractable learning of single-input single-output and MIMO linear dynamical systems from noisy observations.
Abstract
We prove that stochastic gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.
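To make the setting concrete, the following minimal sketch simulates noisy observations from a single-input single-output linear time-invariant system and evaluates a mean squared prediction error, which coincides with the maximum likelihood objective up to constants when the observation noise is Gaussian. It assumes the standard state-space form h_{t+1} = A h_t + B x_t, y_t = C h_t + D x_t + noise with h_0 = 0. The function names, parameter values, and noise level are hypothetical and chosen only for illustration; the sketch does not implement the projection step or the authors' algorithm.

```python
import numpy as np

def simulate_lds(A, B, C, D, x, noise_std=0.0, rng=None):
    """Simulate a SISO system h_{t+1} = A h_t + B x_t, y_t = C h_t + D x_t + noise.

    A is (n, n); B and C are length-n vectors; D is a scalar.
    The hidden state starts at zero (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = np.zeros(A.shape[0])
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        # Emit the observation first, then advance the hidden state.
        y[t] = C @ h + D * x_t + noise_std * rng.standard_normal()
        h = A @ h + B * x_t
    return y

def squared_loss(params, x, y):
    """Mean squared error between a candidate model's noise-free outputs and y."""
    A, B, C, D = params
    y_hat = simulate_lds(A, B, C, D, x)
    return float(np.mean((y_hat - y) ** 2))

# Hypothetical stable system (eigenvalues of A are 0.5 and 0.3).
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
B_true = np.array([1.0, 0.5])
C_true = np.array([1.0, -1.0])
D_true = 0.2

rng = np.random.default_rng(0)
x = rng.standard_normal(200)  # random input sequence
y = simulate_lds(A_true, B_true, C_true, D_true, x, noise_std=0.1, rng=rng)

# At the true parameters the loss is roughly the noise variance;
# a perturbed model should typically do worse.
loss_true = squared_loss((A_true, B_true, C_true, D_true), x, y)
loss_wrong = squared_loss((0.5 * A_true, B_true, C_true, D_true), x, y)
```

The loss is a non-convex function of (A, B, C, D) because the output depends on powers of A; this is the difficulty the paper's analysis addresses.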
