Synthesis of Temporally-Robust Policies for Signal Temporal Logic Tasks using Reinforcement Learning

Siqi Wang; Shaoyuan Li; Li Yin; Xiang Yin

TL;DR

This work addresses policy synthesis for Signal Temporal Logic (STL) tasks in unknown stochastic environments, focusing on temporal robustness, i.e., tolerance to timing uncertainty. It introduces two RL objectives: (i) maximize the probability that the temporal robustness $\theta(\Phi,\mathbf{s}_{0:T})$ meets a threshold $\delta$, and (ii) maximize the expected worst-case spatial robustness $\rho_{\delta}(\Phi,\mathbf{s}_{0:T})$ under time shifts bounded by $\delta$. Both are solved via Q-learning on augmented $\tau$-MDPs. The $\tau$-MDP construction converts non-Markovian robustness signals into Markovian rewards, with $\tau$ chosen as $hrz(\phi) + \delta$ (or suitably for other cases), and an approximation based on delayed signals makes learning tractable with theoretical guarantees. Case studies demonstrate that the approach yields temporally robust STL satisfaction under unknown dynamics.
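To make the second objective concrete, the following is a minimal Python sketch of a worst-case spatial robustness under bounded time shifts. It is an illustration under stated assumptions, not the paper's definition of $\rho_{\delta}$: it assumes a single scalar signal, the formula "eventually within $[a,b]$, $s > c$", integer shifts of the whole signal, and boundary values held constant when the shift runs past the ends. The function names are hypothetical.

```python
def eventually_robustness(signal, a, b, c):
    """Spatial robustness of F_[a,b](s > c) at time 0: best margin s_t - c in the window."""
    return max(signal[t] - c for t in range(a, b + 1))


def shift(signal, k):
    """Shift a discrete signal by k steps; boundary values are held (an assumption)."""
    last = len(signal) - 1
    return [signal[min(max(t + k, 0), last)] for t in range(len(signal))]


def worst_case_robustness(signal, a, b, c, delta):
    """Minimum spatial robustness over all integer time shifts k with |k| <= delta."""
    return min(
        eventually_robustness(shift(signal, k), a, b, c)
        for k in range(-delta, delta + 1)
    )


# Hypothetical signal: a single spike at t = 3, threshold c = 1, window [2, 4].
sig = [0, 0, 0, 5, 0, 0, 0, 0]
nominal = worst_case_robustness(sig, 2, 4, 1, delta=0)   # no shift allowed
tolerant = worst_case_robustness(sig, 2, 4, 1, delta=1)  # spike stays in the window
fragile = worst_case_robustness(sig, 2, 4, 1, delta=2)   # spike can leave the window
```

In this toy case the value stays positive for shifts of at most one step and turns negative once shifts of two steps are allowed, which is the kind of sensitivity to timing that the $\delta$ bound is meant to capture.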

Abstract

This paper investigates the problem of designing control policies that satisfy high-level specifications described by signal temporal logic (STL) in unknown, stochastic environments. While many existing works concentrate on optimizing the spatial robustness of a system, our work takes a step further by also considering temporal robustness as a critical metric to quantify the tolerance of time uncertainty in STL. To this end, we formulate two relevant control objectives to enhance the temporal robustness of the synthesized policies. The first objective is to maximize the probability of being temporally robust for a given threshold. The second objective is to maximize the worst-case spatial robustness value within a bounded time shift. We use reinforcement learning to solve both control synthesis problems for unknown systems. Specifically, we approximate both control objectives in a way that enables us to apply the standard Q-learning algorithm. Theoretical bounds in terms of the approximations are also derived. We present case studies to demonstrate the feasibility of our approach.
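The idea of running standard Q-learning on a history-augmented state can be sketched as follows. This is a hedged toy example, not the authors' implementation: the chain environment, the slip probability, the indicator reward, and the discount factor are all assumptions introduced for illustration. The only point it shows is that once the learner's state is the window of the last $\tau$ system states, a reward that depends on that window is Markovian in the augmented state, so a tabular Q-update applies unchanged.

```python
import random
from collections import defaultdict, deque

N_STATES, GOAL, TAU = 6, 5, 3   # hypothetical toy chain; TAU plays the role of tau
ACTIONS = (-1, +1)


def step(state, action, rng, slip=0.1):
    """Dynamics unknown to the learner: the move is reversed with probability `slip`."""
    if rng.random() < slip:
        action = -action
    return min(max(state + action, 0), N_STATES - 1)


def window_reward(window):
    """Placeholder reward that depends only on the tau-window:
    1 if the goal state appears anywhere in the window, else 0."""
    return 1.0 if GOAL in window else 0.0


def q_learning(episodes=500, horizon=12, alpha=0.1, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-learning whose state is the tuple of the last TAU system states."""
    rng = random.Random(seed)
    q = defaultdict(float)                           # keys: (window, action)
    for _ in range(episodes):
        state = 0
        window = deque([state] * TAU, maxlen=TAU)    # pad the initial history
        for _ in range(horizon):
            w = tuple(window)
            if rng.random() < eps:                   # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(w, a)])
            state = step(state, action, rng)
            window.append(state)                     # oldest entry drops out
            w_next = tuple(window)
            target = window_reward(w_next) + gamma * max(q[(w_next, a)] for a in ACTIONS)
            q[(w, action)] += alpha * (target - q[(w, action)])
    return q


q_table = q_learning(episodes=50)
```

The window grows the state space exponentially in $\tau$, which is why the choice of $\tau$ matters in practice; the discounted update used here is a simplification chosen for brevity.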

Paper Structure (11 sections, 12 equations)

Introduction
Preliminaries
Signal Temporal Logic Basics
Temporal Robustness of STL
Reinforcement Learning for MDPs
Problem Formulation
Case of Guaranteed Temporal Robustness
Case of Spatial-Temporal Robustness
Reinforcement Learning for Temporal Robustness
Construction of $\tau$-MDPs
Approximation of Robust Probability

Theorems & Definitions (2)

Remark 1
Remark 2
