Signatures Meet Dynamic Programming: Generalizing Bellman Equations for Trajectory Following

Motoya Ohnishi; Iretiayo Akinola; Jie Xu; Ajay Mandlekar; Fabio Ramos

TL;DR

This work introduces signature control, a framework that generalizes dynamic programming from states to entire trajectories by leveraging path signatures. By reformulating DP with the path-to-go $S$-function and using Chen’s identity, the authors show how a trajectory-centered backup subsumes and extends classical Bellman updates, enabling time-step adaptivity and robustness to model misspecification. The framework is instantiated as Signature MPC, which optimizes over the signature of the full path using a receding-horizon surrogate cost and a terminal $S$-function, with empirical validation on simple and robotic tasks demonstrating improved tracking accuracy and disturbance robustness. The approach offers a principled way to encode rich geometric information of trajectories, potentially improving data efficiency and resilience in control and RL settings, while highlighting avenues for theory and real-time deployment enhancements.
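
The role of Chen's identity in the backup can be illustrated with a small numerical check. The following is a minimal sketch, not the authors' implementation: it assumes piecewise-linear paths and truncates the signature at depth 2, and the helper names `signature_depth2` and `chen_product` are hypothetical. It shows that the signature of a whole path equals the tensor product of the signatures of its two pieces, which is the algebraic fact that lets a "path-to-go" be assembled from a first segment and the remainder.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    path: array of shape (n_points, d).
    Returns (level1, level2) with shapes (d,) and (d, d).
    The level-0 term is always 1 and is omitted.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # One linear segment has signature (delta, outer(delta, delta) / 2).
        # Appending it uses Chen's identity at depth 2; level2 must be
        # updated before level1 because it needs the old level1.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


def chen_product(sig_a, sig_b):
    """Depth-2 tensor product of two truncated signatures (Chen's identity)."""
    a1, a2 = sig_a
    b1, b2 = sig_b
    return a1 + b1, a2 + np.outer(a1, b1) + b2


# Illustrative path in the plane (values chosen arbitrarily).
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
whole = signature_depth2(path)
# Split at the third point; the two pieces share that point.
glued = chen_product(signature_depth2(path[:3]), signature_depth2(path[2:]))
print(np.allclose(whole[0], glued[0]) and np.allclose(whole[1], glued[1]))
```

The depth-2 truncation is a choice made here for brevity; higher levels follow the same pattern with more tensor factors.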

Abstract

Path signatures have been proposed as a powerful representation of paths that efficiently captures the path's analytic and geometric characteristics, having useful algebraic properties including fast concatenation of paths through tensor products. Signatures have recently been widely adopted in machine learning problems for time series analysis. In this work we establish connections between value functions typically used in optimal control and intriguing properties of path signatures. These connections motivate our novel control framework with signature transforms that efficiently generalizes the Bellman equation to the space of trajectories. We analyze the properties and advantages of the framework, termed signature control. In particular, we demonstrate that (i) it can naturally deal with varying/adaptive time steps; (ii) it propagates higher-level information more efficiently than value function updates; (iii) it is robust to dynamical system misspecification over long rollouts. As a specific case of our framework, we devise a model predictive control method for path tracking. This method generalizes integral control, being suitable for problems with unknown disturbances. The proposed algorithms are tested in simulation, with differentiable physics models including typical control and robotics tasks such as point-mass, curve following for an ant model, and a robotic manipulator.
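
Claim (i) rests on a standard property of signatures: they do not depend on how a path is parameterized in time. The sketch below is an illustration under stated assumptions rather than the paper's code: paths are piecewise linear, the signature is truncated at depth 2, and `signature_depth2` and `refine` are hypothetical helper names. It resamples the same geometric curve with extra, unevenly spaced points and checks that the truncated signature is unchanged.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature (level1, level2) of a piecewise-linear path."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Append one linear segment; level2 uses the old value of level1.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


def refine(path, fractions):
    """Insert extra samples on every segment at the given fractions in (0, 1).

    This mimics sampling the same curve with different (non-uniform)
    time steps without changing its geometry.
    """
    path = np.asarray(path, dtype=float)
    points = [path[0]]
    for start, end in zip(path[:-1], path[1:]):
        for f in sorted(fractions):
            points.append(start + f * (end - start))
        points.append(end)
    return np.array(points)


# Illustrative curve and resampling fractions (values chosen arbitrarily).
coarse = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
fine = refine(coarse, [0.1, 0.7])

sig_coarse = signature_depth2(coarse)
sig_fine = signature_depth2(fine)
print(np.allclose(sig_coarse[0], sig_fine[0]))
print(np.allclose(sig_coarse[1], sig_fine[1]))
```

Because the comparison is made on signatures rather than on time-indexed states, a controller that matches signatures does not need the reference and the rollout to share a time grid, which is one way to read the adaptive time-step claim.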

Paper Structure (68 sections, 1 theorem, 86 equations, 21 figures, 15 tables, 2 algorithms)

Introduction
Notation:
Related work
Path signature:
Value-based control and RL:
Path tracking:
Preliminaries
Path signature
Properties of the path signatures:
Dynamical systems and path tracking
Signature control
Problem formulation
Dynamic programming over signatures
Path-to-go:
Truncated signature formulation:
...and 53 more sections

Key Result

theorem 2

Let the function $\mathcal{S}$ be defined by (eq:sfunc). Under the Markov assumption, it follows that […], where the expected $S$-function $\mathcal{ES}^{\pi}$ is defined by taking the expectation over actions: […]

Figures (21)

Figure 1: Left: simple tracking example. The black and blue circles represent an obstacle and the goal. Given a path (black line), a point-mass (red) follows this reference via minimization of deviation of signatures in an online fashion with optimized action repetitions. Right: illustration of path-to-go formulation as an analogy to value-to-go in the classical settings.
Figure 2: Left: how a cumulative (discounted) reward is represented in our path formulation, using an interpolation to represent the value as a surface and a transportation to handle discounting and concatenation of paths. Right: an error in the approximated one-step dynamics propagates through time steps, whereas an error in the signature has less effect over the horizon.
Figure 3: Top (two-mass spring, damper system): the plots show the evolutions of positions of two masses for signature MPC with/without second depth signature terms, showing how signature MPC reduces to integral control. Bottom: (Left two; Ant) tracking behaviors of signature control (left) and baseline MPC (right) for the same reaching time, where green lines are the executed trajectories. (Right two; Robotic arm): tracking behaviors of signature control (left) and baseline MPC (right) under disturbance $-30$.
Figure 4: Illustrations of properties of path signatures. Paths in the space $\mathcal{X}\subset\mathbb{R}^d$ are uniquely transformed into signatures up to tree-like equivalence. One can construct an RKHS where the kernel represents the inner product between two signatures. This kernel is a universal kernel.
Figure 5: A random dynamical system consists of a model of the noise and the physical phase space. For each realization $\omega$ and initial state $x$, the RDS is the flow over sample space and phase space. The illustration is inspired by arnold1995random, ghil2008climate.
...and 16 more figures

Theorems & Definitions (13)

definition 1: Path signatures (lyons2007differential)
definition 2: Path tracking
theorem 2: Signature Dynamic Programming for Decision Making
definition 3: Tensor algebra
definition 4: Signature kernel (salvi2021signature)
definition 5: Random dynamical systems (arnold1995random)
remark 1
proof: Proof of Theorem \ref{thm:main}
Claim 3
proof
...and 3 more
