Table of Contents
Fetching ...

Stratifying Reinforcement Learning with Signal Temporal Logic

Justin Curry, Alberto Speranzon

Abstract

In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be viewed as inducing a stratification of space-time. The significance of this interpretation is twofold. First, it offers a fresh theoretical framework for analyzing the structure of the embedding space generated by deep reinforcement learning (DRL) and relates it to the geometry of the ambient decision space. Second, it provides a principled framework that both enables the reuse of existing high-dimensional analysis tools and motivates the creation of novel computational techniques. To ground the theory, we (1) illustrate the role of stratification theory in Minigrid games and (2) apply numerical techniques to the latent embeddings of a DRL agent playing such a game where the robustness of STL formulas is used as the reward. In the process, we propose computationally efficient signatures that, based on preliminary evidence, appear promising for uncovering the stratification structure of such embedding spaces.

Stratifying Reinforcement Learning with Signal Temporal Logic

Abstract

In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be viewed as inducing a stratification of space-time. The significance of this interpretation is twofold. First, it offers a fresh theoretical framework for analyzing the structure of the embedding space generated by deep reinforcement learning (DRL) and relates it to the geometry of the ambient decision space. Second, it provides a principled framework that both enables the reuse of existing high-dimensional analysis tools and motivates the creation of novel computational techniques. To ground the theory, we (1) illustrate the role of stratification theory in Minigrid games and (2) apply numerical techniques to the latent embeddings of a DRL agent playing such a game where the robustness of STL formulas is used as the reward. In the process, we propose computationally efficient signatures that, based on preliminary evidence, appear promising for uncovering the stratification structure of such embedding spaces.

Paper Structure

This paper contains 17 sections, 4 theorems, 5 equations, 13 figures.

Key Result

Lemma B.5

For any minigrid game where rotations are ignored, as in Example ex:no-rotation-game, and with a single goal state $g$, any policy that generates successful trajectories passing through arbitrary start states $s$, can be viewed as defining a poset-stratified space with indexing poset a tree. $\black

Figures (13)

  • Figure B1: A disk with a line attached can be stratified via the face-relation poset, which refines the dimension poset $0 < 1 < 2$.
  • Figure B2: Empty (left) and Door-Key (right) Minigrid Environments minigrid2023
  • Figure B3: An empty $3\times 3$ Minigrid Environment is first translated into a cell complex with 49 cells (9 two-dimensional cells, labeled a through i, 24 edges, and 16 vertices), which can be viewed as a finite "digitized" topological space with 49 points. Such a topologized minigrid environment has a continuous map to the tree poset displayed right.
  • Figure B4: Stratifying a "Coin Collection" Game inspired by baryshnikov2023linear.
  • Figure B5: Example of an environment where an agent can move within a room or traverse a winding corridor. Dots are agent's position samples.
  • ...and 8 more figures

Theorems & Definitions (15)

  • Definition B.1
  • Definition B.2
  • Example B.3
  • Example B.4
  • Lemma B.5
  • proof
  • Remark B.6
  • Theorem B.7
  • proof
  • Definition B.8
  • ...and 5 more