Stabilizing Backpropagation Through Time to Learn Complex Physics

Patrick Schnell, Nils Thuerey

TL;DR

The paper addresses the exploding/vanishing gradient problem in Backpropagation Through Time for differentiable physics simulators by introducing gradient stopping to create a balanced backward flow that propagates along the physics path while preserving minima. It further analyzes rotational vector-field effects arising from decoupled forward/backward passes and proposes a rotation-counteracting combined update that leverages both the stopped and original gradients. The authors provide a practical, scalable two-pass algorithm and validate it across guidance-by-repulsion, cart-pole, and quantum-control tasks, showing improved convergence especially on harder problems. This work advances stable learning in long-horizon physical simulations and offers a concrete direction for refining gradient flow in differentiable physics.

Abstract

Of all the vector fields surrounding the minima of recurrent learning setups, the gradient field with its exploding and vanishing updates appears a poor choice for optimization, offering little beyond efficient computability. We seek to improve this suboptimal practice in the context of physics simulations, where backpropagating feedback through many unrolled time steps is considered crucial to acquiring temporally coherent behavior. The alternative vector field we propose follows from two principles: physics simulators, unlike neural networks, have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged. As any modification of backpropagation decouples forward and backward pass, the rotation-free character of the gradient field is lost. Therefore, we discuss the negative implications of using such a rotational vector field for optimization and how to counteract them. Our final procedure is easily implementable via a sequence of gradient stopping and component-wise comparison operations, which do not negatively affect scalability. Our experiments on three control problems show that especially as we increase the complexity of each task, the unbalanced updates from the gradient can no longer provide the precise control signals necessary while our method still solves the tasks. Our code can be found at https://github.com/tum-pbs/StableBPTT.
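The two ingredients named in the abstract, gradient stopping and a component-wise comparison, can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes a scalar system x_{t+1} = a * x_t + c_t with a hypothetical linear controller c_t = theta[0] * x_t + theta[1] and loss L = 0.5 * x_T^2. It also assumes that "gradient stopping" means the backward pass flows through the physics term only, and that the comparison keeps a component of the modified vector only where its sign agrees with the regular gradient. All function names are illustrative.

```python
def rollout(theta, x0, a, steps):
    """Unroll x_{t+1} = a * x_t + c_t with c_t = theta[0] * x_t + theta[1]."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        control = theta[0] * x + theta[1]  # hypothetical linear controller
        xs.append(a * x + control)
    return xs


def backward(theta, xs, a, stop_controller_feedback):
    """Backpropagate L = 0.5 * x_T**2 through the stored rollout.

    With stop_controller_feedback=True the state adjoint is passed back
    through the physics term a * x_t only (assumed form of gradient
    stopping); with False this is ordinary BPTT.
    """
    adjoint = xs[-1]  # dL/dx_T
    grad = [0.0, 0.0]
    for t in reversed(range(len(xs) - 1)):
        # Direct dependence of x_{t+1} on the parameters: (x_t, 1).
        grad[0] += adjoint * xs[t]
        grad[1] += adjoint
        # Propagate the adjoint one step back in time.
        jacobian = a if stop_controller_feedback else a + theta[0]
        adjoint *= jacobian
    return grad


def combine(gradient, modified):
    """Assumed rule: keep a modified component only if its sign matches the gradient."""
    return [m if g * m > 0 else 0.0 for g, m in zip(gradient, modified)]


theta = [-0.5, 0.1]
states = rollout(theta, x0=1.0, a=1.0, steps=3)
regular = backward(theta, states, a=1.0, stop_controller_feedback=False)
stopped = backward(theta, states, a=1.0, stop_controller_feedback=True)
update = combine(regular, stopped)
```

In this toy setting the regular adjoint shrinks by the factor a + theta[0] at every step, while the stopped adjoint is scaled by a alone, which is the kind of imbalance the abstract attributes to the network path. The comparison step is cheap because it only needs the two vectors from the two backward passes.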

Paper Structure (33 sections, 16 equations, 17 figures, 4 tables)

Figures (17)

  • Figure 1: Toy example: a) loss landscape, b) regular gradient field (flow lines in red), c) modified vector field (yellow). For b) and c) the background color shows the L2 norm of the vectors. To improve optimization we trade an unbalanced gradient field for a balanced but rotating vector field.
  • Figure 2: Minimization of a loss function with Adam using a) the gradient field b) a rotating vector field c) our combined vector field constructed from the vector fields in a) and b). Background color indicates vector length. d) shows the loss curves. The rotational contribution in b) prevents Adam from converging while our combined vector field in c) allows Adam to approach the minimum.
  • Figure 3: Guidance-by-repulsion model: a) visualization, smaller circles indicate the configuration at earlier points in time, b) - e) learning curves for different regularization coefficients. (S) is not able to learn. With less regularization, (C) and (M) increasingly outperform (R). Same-color curves differ by clipping mode. Curves were smoothed over $20$ epochs for clarity.
  • Figure 4: Cart pole: a) visualization of the two-pole task; upper plots and a more transparent style indicate earlier points in time, and for easier visualization the two poles are plotted attached to two carts instead of one, b) - e) learning curves for $1$-$4$ poles. With more poles, (C) outperforms (R), (M) and (S). Same-color curves differ by clipping mode. Curves were smoothed over $20$ epochs for clarity.
  • Figure 5: Quantum control: a) visualization of a state transition over time $t$ and space $x$, background color indicates the probability density, b) - d) learning curves for different target states, e) update size. For higher target states, (C) outperforms (R). Same-color curves differ by clipping mode. Curves were smoothed over $20$ epochs for clarity.
  • ...and 12 more figures