Neural Reward Machines

Elena Umili, Francesco Argenziano, Roberto Capobianco

TL;DR

This work introduces Neural Reward Machines (NRMs), a neurosymbolic framework that enables reasoning and learning in non-Markovian RL environments without requiring a known symbol grounding function. NRMs fuse probabilistic Moore Machines with neural perception, allowing groundings and automaton transitions to be learned from data while leveraging prior temporal knowledge. A key contribution is a groundability analysis and an efficient algorithm to identify Unremovable Reasoning Shortcuts (URS), which assesses how symbolic knowledge may influence grounding under complete observations. Experiments in Minecraft-like settings demonstrate that NRMs achieve performance between RNN-based and RM-based baselines, effectively exploiting partial symbolic knowledge and outperforming pure deep RL methods. The framework thus offers a practical path to integrate high-level temporal knowledge with end-to-end learning, enabling scalable non-Markovian RL in complex, non-symbolic environments.

Abstract

Non-Markovian Reinforcement Learning (RL) tasks are very hard to solve, because agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use symbolic formalisms (such as Linear Temporal Logic or automata) to specify the temporally-extended task. These approaches only work in finite and discrete state environments, or in continuous problems for which a mapping between the raw state and a symbolic interpretation, known as a symbol grounding (SG) function, is available. Here, we define Neural Reward Machines (NRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic non-Markovian RL domains, and which is based on the probabilistic relaxation of Moore Machines. We combine RL with semisupervised symbol grounding (SSSG) and we show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the SG function, outperforming Deep RL methods, which cannot incorporate prior knowledge. Moreover, we advance the research in SSSG by proposing an algorithm for analysing the groundability of temporal specifications, which is more efficient than baseline techniques by a factor of $10^3$.
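To make the idea of a probabilistically relaxed Moore Machine concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the task is the "visit pickaxe (P), lava (L) and door (D) in any order" example, the filler symbol `E` for an empty cell and all function names are hypothetical, and only the symbol input is relaxed (the transition and output functions are kept crisp, as if the temporal specification were fully known). Instead of a single symbol per step, the machine receives a probability vector over symbols, as a neural grounder might produce, and propagates a belief over machine states.

```python
from itertools import product

# Assumed symbol set: pickaxe, lava, door, plus a hypothetical "empty cell" symbol E.
SYMBOLS = ["P", "L", "D", "E"]
ITEMS = ("P", "L", "D")

# One machine state per subset of items visited so far (8 states).
STATES = [
    frozenset(item for item, keep in zip(ITEMS, mask) if keep)
    for mask in product([0, 1], repeat=len(ITEMS))
]
INDEX = {state: i for i, state in enumerate(STATES)}


def delta(state, symbol):
    """Crisp transition function: remember each item once it has been seen."""
    return state | {symbol} if symbol in ITEMS else state


def reward(state):
    """Moore output attached to a state: 1 once every item has been visited."""
    return 1.0 if len(state) == len(ITEMS) else 0.0


def soft_step(belief, symbol_probs):
    """Propagate a belief over states given a distribution over symbols."""
    new_belief = [0.0] * len(STATES)
    for state, mass in zip(STATES, belief):
        for symbol, prob in zip(SYMBOLS, symbol_probs):
            new_belief[INDEX[delta(state, symbol)]] += mass * prob
    return new_belief


def expected_reward(belief):
    """Expected Moore output under the current belief."""
    return sum(mass * reward(state) for state, mass in zip(STATES, belief))


def run(symbol_prob_sequence):
    """Start in the 'nothing visited' state and consume a sequence of symbol distributions."""
    belief = [0.0] * len(STATES)
    belief[INDEX[frozenset()]] = 1.0
    for symbol_probs in symbol_prob_sequence:
        belief = soft_step(belief, symbol_probs)
    return belief


# Confident grounding: one-hot vectors for P, then L, then D.
crisp = run([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
# Uncertain grounding: each item is recognised with probability 0.9.
soft = run([[0.9, 0, 0, 0.1], [0, 0.9, 0, 0.1], [0, 0, 0.9, 0.1]])
print(expected_reward(crisp), expected_reward(soft))
```

With one-hot inputs the update reduces to an ordinary Moore Machine run. With uncertain inputs the expected reward degrades smoothly, which is what allows a reward signal to carry information back to the perception component.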

Paper Structure (41 sections, 4 theorems, 27 equations, 4 figures, 2 tables, 1 algorithm)

This paper contains 41 sections, 4 theorems, 27 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $D^*_{sym}(L)$ denote the set of all strings over $P$ with maximum length equal to $L$. If $L_1 \leq L_2$, then $\#RS^*(\phi, D^*_{sym}(L_2)) \leq \#RS^*(\phi, D^*_{sym}(L_1))$.
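The monotonicity stated in Theorem 1 can be checked by brute force on a toy case. The paper's exact definition of $\#RS^*$ is not reproduced in this section, so the sketch below rests on a loudly stated assumption: a reasoning shortcut is taken to be any non-identity map $\alpha: P \to P$ such that, for every string in the dataset, the specification gives the same verdict on the string and on its image under $\alpha$. The alphabet, the specification `accepts`, and all function names are hypothetical.

```python
from itertools import product

ALPHABET = ("a", "b", "c")  # hypothetical symbol set P


def accepts(string):
    """Toy specification phi: the string contains both 'a' and 'b', in any order."""
    return "a" in string and "b" in string


def all_strings(max_len):
    """All non-empty strings over ALPHABET with length at most max_len."""
    for length in range(1, max_len + 1):
        yield from product(ALPHABET, repeat=length)


def count_shortcuts(max_len):
    """Count non-identity symbol maps that preserve phi on every string (assumed RS definition)."""
    dataset = list(all_strings(max_len))
    count = 0
    for images in product(ALPHABET, repeat=len(ALPHABET)):
        if images == ALPHABET:
            continue  # the identity map is the correct grounding, not a shortcut
        alpha = dict(zip(ALPHABET, images))
        if all(
            accepts(tuple(alpha[symbol] for symbol in string)) == accepts(string)
            for string in dataset
        ):
            count += 1
    return count


counts = [count_shortcuts(max_len) for max_len in (1, 2, 3)]
print(counts)  # [26, 1, 1]
```

Under this assumed definition the count can only shrink as $L$ grows, because the longer dataset contains the shorter one and every extra string is one more constraint on $\alpha$. In this toy case the map swapping `a` and `b` survives at every length, since the specification is symmetric in those two symbols.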

Figures (4)

  • Figure 1: a) An example of a non-Markovian navigation environment inspired by the Minecraft videogame. b) Moore Machine for the task: the agent has to visit the pickaxe (P), the lava (L) and the door (D) cells in any order. c) Implementation of NRM with neural networks.
  • Figure 2: Results in the map environment on tasks of a) the first class and b) the second class. Results in the image environment when training on tasks of c) the first class and d) the second class.
  • Figure 3: Training rewards for tasks of Class 1 and 2 on map environment.
  • Figure 4: Training rewards for tasks of Class 1 and 2 on image environment.

Theorems & Definitions (8)

  • Theorem 1
  • Corollary 2
  • Theorem 3
  • Theorem 4
  • proof
  • proof
  • proof
  • proof