FORM: Learning Expressive and Transferable First-Order Logic Reward Machines

Leo Ardon; Daniel Furelos-Blanco; Roko Parac; Alessandra Russo

TL;DR

The paper tackles non-Markovian reward handling in reinforcement learning by introducing First-Order Reward Machines (FORMs), which label RM transitions with first-order logic to achieve more compact, transferable representations than traditional propositional RMs. It provides a learning approach for FORMs using inductive logic programming (ILASP) and formalizes a multi-agent RL framework where one agent manages each FORM state, enabling collaborative policy learning and better transfer across tasks. Empirical results show that FORMs improve scalability and learning speed compared to baselines, and their abstraction through first-order logic facilitates transfer to environments with more objects. The work advances both the expressivity of task specifications in RL and practical reuse of learned structures, with potential extensions to other temporally extended frameworks.

Abstract

Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning (RL) through finite-state machines. Traditional RMs, which label edges with propositional logic formulae, inherit the limited expressivity of propositional logic. This limitation hinders the learnability and transferability of RMs because complex tasks require numerous states and edges. To overcome these challenges, we propose First-Order Reward Machines ($\texttt{FORM}$s), which use first-order logic to label edges, resulting in more compact and transferable RMs. We introduce a novel method for $\textbf{learning}$ $\texttt{FORM}$s and a multi-agent formulation for $\textbf{exploiting}$ them that facilitates their transferability, in which multiple agents collaboratively learn policies for a shared $\texttt{FORM}$. Our experimental results demonstrate the scalability of $\texttt{FORM}$s relative to traditional RMs. Specifically, we show that $\texttt{FORM}$s can be effectively learnt for tasks where traditional RM learning approaches fail. We also show significant improvements in learning speed and task transferability thanks to the multi-agent learning framework and the abstraction provided by the first-order language.
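To make the idea of first-order edge labels concrete, here is a minimal sketch of a reward machine whose edges are guarded by quantified conditions. It is an illustration under stated assumptions, not the paper's implementation: the names `FirstOrderRM`, `exists`, `forall`, `at_a`, and `at_b` are hypothetical, and the choice to accumulate observed atoms while the machine stays in a state is an assumption of this sketch rather than the semantics defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# A ground atom such as ("at_a", "a1"); a label is the set of atoms seen at one step.
Atom = Tuple[str, str]
Domains = Dict[str, List[str]]  # predicate name -> objects it ranges over
Condition = Callable[[Set[Atom], Domains], bool]


def exists(pred: str) -> Condition:
    """Guard that holds when pred(X) was observed for at least one object X."""
    return lambda seen, domains: any((pred, o) in seen for o in domains[pred])


def forall(pred: str) -> Condition:
    """Guard that holds when pred(X) was observed for every object X."""
    return lambda seen, domains: all((pred, o) in seen for o in domains[pred])


@dataclass
class FirstOrderRM:
    # state -> list of (guard, next state, reward) edges
    edges: Dict[int, List[Tuple[Condition, int, float]]]
    domains: Domains
    state: int = 0
    # Assumption of this sketch: atoms accumulate until the state changes.
    seen: Set[Atom] = field(default_factory=set)

    def step(self, label: FrozenSet[Atom]) -> float:
        self.seen |= label
        for guard, nxt, reward in self.edges.get(self.state, []):
            if guard(self.seen, self.domains):
                self.state, self.seen = nxt, set()
                return reward
        return 0.0


def run(domains: Domains, trace: List[Set[Atom]]) -> Tuple[float, int]:
    """Run a hypothetical 'visit all A-objects, then any B-object' task on a trace."""
    rm = FirstOrderRM(
        edges={0: [(forall("at_a"), 1, 0.0)], 1: [(exists("at_b"), 2, 1.0)]},
        domains=domains,
    )
    total = sum(rm.step(frozenset(label)) for label in trace)
    return total, rm.state
```

The same two-edge machine can be reused unchanged when the domains grow, because only `domains` depends on the number of objects. A propositional RM for the same task would instead need states to track which subset of A-objects has been visited.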

Paper Structure (21 sections, 8 equations, 15 figures, 3 tables, 1 algorithm)

Introduction
Background
Reinforcement Learning
Learning from Answer Sets
Reward Machines
Methodology
Language
First-Order Reward Machines (FORMs)
FORM Learning
Policy Learning
Experiments
FORM Learning
FORM Transfer
Related Work
Conclusion
...and 6 more sections

Figures (15)

Figure 1: An instance of the environment (a), and RMs for the task "visit all [icon] followed by any [icon] before reaching [icon]" (b, c). See the running example for details.
Figure 2: Equivalence between Propositional RMs and FORMs.
Figure 3: ASP encoding for an existentially quantified sentence $\exists X.\, \texttt{p}(X)$.
Figure 4: ASP encoding for a universally quantified sentence $\forall X.\, \texttt{p}(X)$.
Figure 5: Average undiscounted return for the three tasks.
...and 10 more figures

Theorems & Definitions (11)

Definition 2.1: Reward Machine
Example 2.1
Example 2.2
Example 3.1
Definition 3.1
Example 3.2
Definition 3.2: First-Order RM
Definition 3.3
Example 3.3
Example 3.4
...and 1 more
