From Path Signatures to Sequential Modeling: Incremental Signature Contributions for Offline RL

Ziyi Zhao, Qingchuan Li, Yuxuan Xu

TL;DR

The paper addresses the challenge of incorporating temporal structure into path-signature representations for offline reinforcement learning. It introduces the Incremental Signature Contribution (ISC), which decomposes a truncated path signature into a time-indexed sequence of incremental components, and builds the ISC-Transformer (ISCT), which feeds ISC into a standard Transformer for offline decision-making. The approach is grounded in the universal nonlinearity and truncation theorems for signatures and is evaluated on locomotion and navigation benchmarks, where it shows competitive performance and robustness to delayed rewards and downgraded datasets. By exposing how signature information accumulates step by step, ISC enables temporally aware sequential modeling in offline RL, with practical benefits for stability and sensitivity to dynamics.

Abstract

Path signatures embed trajectories into the tensor algebra and constitute a universal, non-parametric representation of paths; however, in their standard form they collapse temporal structure into a single global object, which limits their suitability for decision-making problems that require step-wise reactivity. We propose the Incremental Signature Contribution (ISC) method, which decomposes truncated path signatures into a temporally ordered sequence of elements in the tensor-algebra space, corresponding to the incremental contributions induced by the latest path increments. This reconstruction preserves the algebraic structure and expressivity of signatures while making their internal temporal evolution explicit, so that signature-based representations can be processed by sequential modeling approaches. In contrast to full signatures, ISC is inherently sensitive to instantaneous trajectory updates, which is critical for control dynamics that are sensitive and require stability. Building on this representation, we introduce the ISC-Transformer (ISCT), an offline reinforcement learning model that integrates ISC into a standard Transformer architecture without further architectural modification. We evaluate ISCT on HalfCheetah, Walker2d, Hopper, and Maze2d, including settings with delayed rewards and downgraded datasets. The results demonstrate that the ISC method provides a theoretically grounded and practically effective alternative to path processing for temporally sensitive control tasks.
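
The idea of splitting a truncated signature into time-indexed pieces can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: the path is piecewise linear, the truncation depth is 2, and the contribution of step $k$ is taken to be the difference between the signatures of consecutive prefixes, computed with Chen's identity. The function names (`segment_signature`, `chen_product`, `incremental_contributions`) are hypothetical, and the paper's actual ISC definition may differ.

```python
import numpy as np

def segment_signature(delta):
    """Depth-2 signature of a straight segment with increment `delta`."""
    return delta.copy(), np.outer(delta, delta) / 2.0

def chen_product(a, b):
    """Depth-2 truncated tensor product of two signatures (level 0 is 1)."""
    a1, a2 = a
    b1, b2 = b
    return a1 + b1, a2 + np.outer(a1, b1) + b2

def incremental_contributions(path):
    """Per-step differences between prefix signatures (assumed reading of ISC).

    `path` has shape (num_points, dim). Returns the list of per-step
    contributions and the depth-2 signature of the whole path.
    """
    dim = path.shape[1]
    running = (np.zeros(dim), np.zeros((dim, dim)))  # constant path
    contributions = []
    for delta in np.diff(path, axis=0):
        # Chen's identity: extend the prefix signature by one linear segment.
        updated = chen_product(running, segment_signature(delta))
        contributions.append((updated[0] - running[0], updated[1] - running[1]))
        running = updated
    return contributions, running

if __name__ == "__main__":
    path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
    contribs, full = incremental_contributions(path)
    # The contributions telescope back to the signature of the full path.
    print(np.allclose(sum(c[1] for c in contribs), full[1]))
```

Under these assumptions the per-step terms sum to the truncated signature of the whole path, so no information is lost, while each term depends on the newest increment and can be fed to a sequence model as one token per step.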

Paper Structure

This paper contains 19 sections, 2 theorems, 15 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Let $V$ be a Banach space, $[a,b] \subset \mathbb{R}$ a compact interval, and $\gamma \in \mathcal{V}^1([a,b], V)$ a path of bounded variation. Let $S(\gamma) : \Delta_{[a,b]} \to T((V))$ denote the signature of $\gamma$. Then for every $n \in \mathbb{Z}_{\geq 1}$ and every $(s,t) \in \Delta_{[a,b]}$ ... where $\|\gamma\|_{1,[a,b]}$ denotes the total variation of the path $\gamma$ on the interval $[a,b]$.
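
For orientation, the factorial decay estimate is commonly stated in the signature literature in the following form. This is a sketch of the standard bound, not a quotation of the paper's theorem; the notation $S^n(\gamma)_{s,t}$ for the level-$n$ component of the signature and the norm on $V^{\otimes n}$ are assumptions, and the paper's exact wording may differ.

$$
\left\| S^n(\gamma)_{s,t} \right\|_{V^{\otimes n}} \;\leq\; \frac{\|\gamma\|_{1,[s,t]}^{\,n}}{n!} \;\leq\; \frac{\|\gamma\|_{1,[a,b]}^{\,n}}{n!}
$$

The second inequality holds because the total variation over a subinterval $[s,t]$ cannot exceed the total variation over $[a,b]$. The $1/n!$ factor is what justifies truncating the signature at a finite depth: the discarded higher-level terms shrink factorially in $n$.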

Figures (5)

  • Figure 1: Structure of ISCT. $G$ denotes the goal token; $A_n$, $S_n$, $I_n$, $C_n$, and $\hat{A}_n$ denote the action, state, INC, CROSS, and action-prediction tokens at time step $t_n$, respectively. We use the convention that $t_0$ denotes the last step before the observed window.
  • Figure 2: Path length comparison on U-Maze tasks.
  • Figure 3: Path length comparison on Medium-size Maze tasks.
  • Figure 4: Path length comparison on Large-size Maze tasks.
  • Figure 5: Performance on the downgraded training set.

Theorems & Definitions (2)

  • Theorem 1: Factorial Decay (Theorem 3.2 in lyons2025signaturemethodsmachinelearning)
  • Theorem 2: Universal Nonlinearity (Theorem 3.3 in lyons2025signaturemethodsmachinelearning)