Reinforcement Learning with LTL and $\omega$-Regular Objectives via Optimality-Preserving Translation to Average Rewards

Xuan-Bach Le, Dominik Wagner, Leon Witzman, Alexander Rabinovich, Luke Ong

TL;DR

The main result is that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines.

Abstract

Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discount-sum and average-reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by solving a sequence of discount-sum problems approximately. Consequently, we resolve an open problem: optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically.
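To make the notion of a finite-memory reward machine concrete, the following is a minimal Python sketch. It is not the paper's construction: the class `RewardMachine`, the toy machine `toy`, and the helpers `average_reward` and `normalised_discounted_sum` are hypothetical names introduced here for illustration, and the machine reads abstract labels rather than MDP transitions. It only shows how a reward can depend on memory of the past, and how a normalised discounted sum with discount factor close to 1 approximates the average reward on a long periodic run.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class RewardMachine:
    """Finite-state machine that emits one reward per input label (illustrative only)."""
    initial: str
    # (memory state, label) -> (next memory state, reward)
    delta: Dict[Tuple[str, str], Tuple[str, float]]

    def run(self, labels: Iterable[str]) -> List[float]:
        """Feed a finite sequence of labels and collect the emitted rewards."""
        u = self.initial
        rewards = []
        for lab in labels:
            u, r = self.delta[(u, lab)]
            rewards.append(r)
        return rewards


def average_reward(rewards: List[float]) -> float:
    """Average reward of a finite prefix (finite stand-in for the limit average)."""
    return sum(rewards) / len(rewards) if rewards else 0.0


def normalised_discounted_sum(rewards: List[float], gamma: float) -> float:
    """(1 - gamma) * sum_t gamma^t * r_t over a finite prefix."""
    return (1.0 - gamma) * sum(r * gamma ** t for t, r in enumerate(rewards))


# Toy machine (an assumption, not taken from the paper): reward 1 is paid
# for a 'b' only if an 'a' has been seen since the last rewarded 'b'.
toy = RewardMachine(
    initial="want_a",
    delta={
        ("want_a", "a"): ("want_b", 0.0),
        ("want_a", "b"): ("want_a", 0.0),
        ("want_b", "a"): ("want_b", 0.0),
        ("want_b", "b"): ("want_a", 1.0),
    },
)

alternating = toy.run("ab" * 10_000)  # keeps alternating a, b
only_b = toy.run("b" * 10_000)        # never sees an 'a'
```

On the alternating run every second step is rewarded, while the run consisting only of `b` earns nothing, although both see `b` infinitely often in the limit; a reward that depends only on the current label could not tell these apart. This is the kind of memory that a reward function of type $S \times A \times S \rightarrow \mathbb{R}$ lacks.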

Paper Structure

This paper contains 22 sections, 17 theorems, 19 equations, 4 figures, 1 algorithm.

Key Result

Proposition 4

There is an MDP $\mathcal{M}$ and an $\omega$-regular language $L$ for which it is impossible to find a reward function $\mathcal{R}:S \times A \times S \rightarrow\mathbb{R}$ such that every $\mathcal{J}^{\mathcal{M}}_{\mathcal{R}^{\textrm{avg}}}$-optimal policy of $\mathcal{M}$ also maximises the

Figures (4)

  • Figure 1: Examples of an MDP and DRA.
  • Figure 2: A reward machine and the product MDP for the running example.
  • Figure 3: Counter-example for prefix-independent objectives.
  • Figure 4: Reward machine yielded by our construction in \ref{sec:grey} for the running example.

Theorems & Definitions (35)

  • Example 1
  • Definition 2
  • Definition 3: Alur:2022
  • Proposition 4
  • proof
  • Definition 5
  • Definition 6
  • Definition 7
  • Lemma 7
  • Theorem 8
  • ...and 25 more