Neuro-symbolic Action Masking for Deep Reinforcement Learning

Shuai Han; Mehdi Dastani; Shihan Wang

TL;DR

NSAM tackles DRL's tendency to explore infeasible actions by learning a symbolic grounding that respects domain constraints, represented with Probabilistic Sentential Decision Diagrams (PSDDs). A gating network maps high-dimensional states to PSDD parameters, which yields a state-conditioned distribution over symbolic models; MAP inference on that distribution produces action masks that guide a masked PPO policy. Across four constrained domains (including Visual Sudoku), NSAM improves sample efficiency and incurs substantially fewer constraint violations than strong baselines, demonstrating that symbolic structure can be leveraged to accelerate learning and improve safety. The work highlights the value of integrating logical knowledge with gradient-based RL and points to future directions: richer symbolic representations, and handling unknown or erroneous constraints to broaden applicability.

Abstract

Deep reinforcement learning (DRL) may explore infeasible actions during training and execution. Existing approaches assume a symbol grounding function that maps high-dimensional states to consistent symbolic representations, along with manually specified action masking techniques to constrain actions. In this paper, we propose Neuro-symbolic Action Masking (NSAM), a novel framework that automatically learns symbolic models of high-dimensional states that are consistent with the given domain constraints, in a minimally supervised manner during the DRL process. Based on the learned symbolic model of states, NSAM learns action masks that rule out infeasible actions. NSAM enables end-to-end integration of symbolic reasoning and deep policy optimization, where improvements in symbolic grounding and policy learning mutually reinforce each other. We evaluate NSAM on multiple domains with constraints, and experimental results demonstrate that NSAM significantly improves the sample efficiency of DRL agents while substantially reducing constraint violations.
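
To make the idea of an action mask concrete, the following is a minimal sketch of one common way a mask can be applied to a policy's action logits: infeasible actions receive zero probability and the remaining probabilities are renormalized. The function name `masked_softmax`, the example logits, and the example mask are all illustrative assumptions; the abstract does not specify how NSAM combines its learned masks with the policy.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over `logits`, restricted to actions whose mask entry is True.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not any(mask):
        raise ValueError("mask rules out every action")
    # Subtract the largest feasible logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example: four actions, the second one is infeasible in this state.
logits = [2.0, 0.5, 1.0, -1.0]
mask = [True, False, True, True]
probs = masked_softmax(logits, mask)
```

With this construction the masked action can never be sampled, and the relative ordering of the feasible actions is unchanged.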

Paper Structure (17 sections, 6 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Problem setting
Learning Symbolic Grounding
Compiling the Knowledge
Learning the parameters of PSDD in DRL
Combining symbolic reasoning with gradient-based DRL
End-to-end training framework
Related work
Experiment
Environments
Hyperparameters
Baselines
Learning efficiency and final performance
Less violation
Ablation study
...and 2 more sections

Figures (9)

Figure 1: Example states in the Visual Sudoku environment
Figure 2: (a) An example of a joint distribution over three propositions $p_1$, $p_2$ and $p_3$ with the constraint $(p_1 \leftrightarrow p_2) \lor p_3$. (b) An SDD circuit with 'OR' and 'AND' logic gates that represents the constraint $(p_1 \leftrightarrow p_2) \lor p_3$. (c) The PSDD circuit that represents the distribution in (a). (d) The vtree used to group variables. (e) A general fragment showing the structure of an SDD and a PSDD.
Figure 3: The architecture design to calculate the probability of symbolic model $\bm{m}$ given DRL state $s$.
Figure 4: An illustration of the decision process of our agent, where the symbolic grounding module is as in Figure \ref{fig:Framework} and $\hat{\bm{m}}$ is calculated via the PSDD by Equation (\ref{NSAM:equ_argmax}).
Figure 5: Four tasks with logical constraints
...and 4 more figures
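
The constraint in the Figure 2 caption, $(p_1 \leftrightarrow p_2) \lor p_3$, is small enough to check by hand. The sketch below enumerates all eight truth assignments, keeps the ones that satisfy the constraint, places a normalized distribution on them, and picks the most probable one (a MAP estimate). This is brute-force enumeration for illustration, not a PSDD: a PSDD represents the same kind of constrained distribution as a circuit. The per-proposition probabilities in `Q` are made-up numbers, and all names are hypothetical.

```python
from itertools import product


def satisfies(p1, p2, p3):
    """The constraint (p1 <-> p2) or p3 from the Figure 2 caption."""
    return (p1 == p2) or p3


# Hypothetical probabilities that each proposition is true (assumed values).
Q = (0.9, 0.2, 0.3)


def weight(model):
    """Unnormalized weight of an assignment under independent marginals Q."""
    w = 1.0
    for value, q in zip(model, Q):
        w *= q if value else 1.0 - q
    return w


assignments = list(product([False, True], repeat=3))
models = [m for m in assignments if satisfies(*m)]

# Normalize over satisfying assignments only; violating ones get probability 0.
z = sum(weight(m) for m in models)
dist = {m: (weight(m) / z if satisfies(*m) else 0.0) for m in assignments}

# MAP estimate: the most probable assignment that respects the constraint.
map_model = max(dist, key=dist.get)
```

Six of the eight assignments satisfy the constraint. With the assumed values in `Q`, the most likely unconstrained assignment (True, False, False) violates the constraint, so the MAP estimate shifts to (True, False, True).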
