Unlocked Backpropagation using Wave Scattering
Christian Pehle, Jean-Jacques Slotine
TL;DR
This work reframes backpropagation as a hyperbolic relaxation of the Pontryagin Maximum Principle by introducing an optimization-time dimension $\tau$, turning the training problem into a worldsheet with finite propagation speed $c$ and counter-propagating waves. The authors develop a discretized worldsheet algorithm with impedance-matched layer junctions and wave-residual sources, enabling fully unlocked, local updates that require only nearest-neighbor communication. A Minimal Reflection principle shows parameter updates arise from boundary-condition matching, identifying gradient descent, Newton, and momentum as special cases of impedance matching and energy dissipation, and offering a physical intuition for optimizer behavior. The framework connects to a broad spectrum of related ideas, from parallel-in-time methods and brain-inspired learning to passivity and analog optimization, while pointing to practical implications for parallel hardware and potential neuromorphic/substrate implementations. Overall, the approach provides a principled route to asynchronous, locally computable optimization grounded in wave dynamics, with clear theoretical interpretations and guidance for future numerical and hardware explorations.
Abstract
Both the backpropagation algorithm in machine learning and the maximum principle in optimal control theory are posed as a two-point boundary problem, resulting in a "forward-backward" lock. We derive a reformulation of the maximum principle in optimal control theory as a hyperbolic initial value problem by introducing an additional "optimization time" dimension. We introduce counter-propagating wave variables with finite propagation speed and recast the optimization problem in terms of scattering relationships between them. This relaxation of the original problem can be interpreted as a physical system that equilibrates and changes its physical properties in order to minimize reflections. We discretize this continuum theory to derive a family of fully unlocked algorithms suitable for training neural networks. Different parameter dynamics, including gradient descent, can be derived by demanding dissipation and minimization of reflections at parameter ports. These results also imply that any physical substrate that supports the scattering and dissipation of waves can be interpreted as solving an optimization problem.
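To make the "minimize reflections" picture concrete, the following is a minimal toy sketch and not the algorithm derived in the paper. It assumes the textbook amplitude reflection coefficient $r = (Z_\mathrm{load} - Z_\mathrm{line}) / (Z_\mathrm{load} + Z_\mathrm{line})$ for a wave meeting an impedance mismatch, and runs plain gradient descent on the reflected power $r^2$ with respect to the load impedance. The function names, learning rate, and step count are illustrative assumptions.

```python
def reflection_coefficient(z_line: float, z_load: float) -> float:
    """Amplitude reflection coefficient for a wave travelling on a line of
    impedance z_line and arriving at a termination of impedance z_load."""
    return (z_load - z_line) / (z_load + z_line)


def match_impedance(z_line: float, z_load: float,
                    lr: float = 0.5, steps: int = 200) -> float:
    """Toy illustration (an assumption, not the paper's update rule):
    adapt z_load by gradient descent on the reflected power r**2."""
    for _ in range(steps):
        r = reflection_coefficient(z_line, z_load)
        # dr/dz_load = 2 * z_line / (z_load + z_line)**2
        dr_dz = 2.0 * z_line / (z_load + z_line) ** 2
        # d(r**2)/dz_load = 2 * r * dr/dz_load
        grad = 2.0 * r * dr_dz
        z_load -= lr * grad
    return z_load


if __name__ == "__main__":
    z_final = match_impedance(z_line=1.0, z_load=3.0)
    print("adapted load impedance:", z_final)
    print("residual reflection:", reflection_coefficient(1.0, z_final))
```

In this toy setting the load drifts toward the line impedance, at which point the reflected wave vanishes. That is the sense in which a parameter "port" can be said to adapt so as to absorb incoming waves.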
