
Synthesis of Temporally-Robust Policies for Signal Temporal Logic Tasks using Reinforcement Learning

Siqi Wang, Shaoyuan Li, Li Yin, Xiang Yin

TL;DR

This work addresses policy synthesis for Signal Temporal Logic (STL) tasks in unknown stochastic environments, focusing on temporal robustness, that is, tolerance to timing uncertainty. It introduces two RL objectives: (i) maximize the probability that the temporal robustness $\theta(\Phi,\mathbf{s}_{0:T})$ meets a threshold $\delta$, and (ii) maximize the expected worst-case spatial robustness $\rho_{\delta}(\Phi,\mathbf{s}_{0:T})$ under time shifts bounded by $\delta$. Both are solved via Q-learning on augmented $\tau$-MDPs. The $\tau$-MDP construction converts non-Markovian robustness signals into Markovian rewards, with $\tau$ chosen as $\mathrm{hrz}(\phi) + \delta$ (or a suitable value in other cases), and an approximation based on delayed signals makes learning tractable, with theoretical bounds on the approximation error. Case studies demonstrate that the approach yields temporally robust STL satisfaction under unknown dynamics.
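To make the $\tau$-MDP idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the augmented state is simply the tuple of the last $\tau$ environment states, that the window is padded with the initial state at the start of an episode, and that a plain tabular Q-learning update is applied on top. The names `TauWindow` and `q_update` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Sequence, Tuple


class TauWindow:
    """Sliding window over the last `tau` states (assumed tau-MDP state)."""

    def __init__(self, tau: int, initial_state: Hashable) -> None:
        # Assumption: pad the window with the initial state at episode start.
        self._window = deque([initial_state] * tau, maxlen=tau)

    def key(self) -> Tuple[Hashable, ...]:
        """Return the augmented state as a hashable tuple."""
        return tuple(self._window)

    def push(self, state: Hashable) -> Tuple[Hashable, ...]:
        """Append the newest state, drop the oldest, return the new key."""
        self._window.append(state)
        return self.key()


def q_update(
    q: Dict[tuple, float],
    aug_state: tuple,
    action: Hashable,
    reward: float,
    next_aug_state: tuple,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> None:
    """One standard tabular Q-learning step on augmented states."""
    best_next = max(q.get((next_aug_state, a), 0.0) for a in actions)
    old = q.get((aug_state, action), 0.0)
    q[(aug_state, action)] = old + alpha * (reward + gamma * best_next - old)


# Usage: a window of length 3 over integer states.
window = TauWindow(tau=3, initial_state=0)
s0 = window.key()      # (0, 0, 0)
s1 = window.push(5)    # (0, 0, 5)
q_table: Dict[tuple, float] = {}
q_update(q_table, s0, "right", 1.0, s1, ["left", "right"], alpha=0.5, gamma=0.9)
```

Because the reward can be computed from the window alone, the learner never needs the full trajectory history, which is the point of the augmentation. The window length grows the state space exponentially, so this tabular form is only practical for small $\tau$.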

Abstract

This paper investigates the problem of designing control policies that satisfy high-level specifications described by signal temporal logic (STL) in unknown, stochastic environments. While many existing works concentrate on optimizing the spatial robustness of a system, our work takes a step further by also considering temporal robustness as a critical metric to quantify the tolerance of time uncertainty in STL. To this end, we formulate two relevant control objectives to enhance the temporal robustness of the synthesized policies. The first objective is to maximize the probability of being temporally robust for a given threshold. The second objective is to maximize the worst-case spatial robustness value within a bounded time shift. We use reinforcement learning to solve both control synthesis problems for unknown systems. Specifically, we approximate both control objectives in a way that enables us to apply the standard Q-learning algorithm. Theoretical bounds in terms of the approximations are also derived. We present case studies to demonstrate the feasibility of our approach.
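As an illustration of the second objective, the sketch below computes a worst-case spatial robustness over bounded time shifts for a single formula of the form "eventually within $[a,b]$, $x \ge c$" on a discrete-time scalar signal. It assumes that a time shift means evaluating the formula at a start time delayed by $0$ to $\delta$ steps; the paper's exact definition of the shift may differ. The function names are hypothetical.

```python
from typing import Sequence


def rho_eventually(signal: Sequence[float], start: int, a: int, b: int, c: float) -> float:
    """Spatial robustness of F_[a,b](x >= c) evaluated at time `start`."""
    window = signal[start + a : start + b + 1]
    if len(window) < b - a + 1:
        raise ValueError("signal too short for the formula horizon")
    # 'Eventually' takes the best margin over the time window.
    return max(x - c for x in window)


def worst_case_rho(signal: Sequence[float], a: int, b: int, c: float, delta: int) -> float:
    """Minimum robustness over start times delayed by 0..delta steps (assumed shift model)."""
    return min(rho_eventually(signal, shift, a, b, c) for shift in range(delta + 1))


# Usage: the margin shrinks as larger delays are tolerated.
trajectory = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0]
no_shift = worst_case_rho(trajectory, a=1, b=2, c=1.5, delta=0)
with_shift = worst_case_rho(trajectory, a=1, b=2, c=1.5, delta=2)
```

A positive value means the formula stays satisfied for every delay up to $\delta$, while a negative value means at least one admissible delay violates it.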
