A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making

Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares

TL;DR

Sequential decision making (SDM) spans automated planning (AP) and reinforcement learning (RL). AP is strong at planning and RL at learning, but both have limitations in data efficiency, generalization, and interpretability. The paper surveys symbolic, subsymbolic, and hybrid SDM methods, introduces learn-to-plan as a bridge between AP and RL, and details approaches for learning the structure of sequential decision processes (SDPs), such as action models and domain knowledge. It offers a two-dimensional taxonomy, by solution method (AP, RL, learn-to-plan) and by knowledge representation (symbolic, subsymbolic, hybrid), and argues that neurosymbolic AI, which combines planning with learning and symbolic with subsymbolic representations, is the most promising route toward an ideal SDM method. The analysis highlights five desirable properties for SDM methods (applicability, ease of use, efficiency, interpretability, generalizability) and advocates integrating planning and learning to achieve them, especially in discrete MDP/POMDP settings. Overall, the work provides a comprehensive framework and roadmap for developing unified, interpretable, and data-efficient SDM systems that combine the strengths of symbolic planning and deep learning.

Abstract

In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this article reviews AP, RL and hybrid methods (e.g., novel learn-to-plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic, or a combination. It also covers methods for learning the SDP structure. Finally, we compare the advantages and drawbacks of the existing methods and conclude that neurosymbolic AI is a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.

Paper Structure (26 sections, 16 figures, 2 tables)

Figures (16)

  • Figure 1: Number of publications that integrate AP and RL. This figure was obtained by running the following query in Scopus (search performed on February 5, 2024): TITLE-ABS-KEY ( ( "reinforcement learning" AND "automated planning" ) OR ( "model-based reinforcement learning" OR "model-based RL" ) OR ( "relational reinforcement learning" OR "relational RL" ) OR ( "automated planning" AND ( "machine learning" OR "deep learning" ) ) OR ( "learn to plan" OR "learning to plan" ) OR ( neurosymbolic OR neuro-symbolic OR neuralsymbolic OR neural-symbolic ) ) AND ( LIMIT-TO ( SUBJAREA , "COMP" ) OR LIMIT-TO ( SUBJAREA , "MATH" ) OR LIMIT-TO ( SUBJAREA , "ENGI" ) ) AND PUBYEAR > 1979 AND PUBYEAR < 2024 . © Elsevier B.V.
  • Figure 2: Comparison between complete policies, partial policies and plans. Nodes in the image represent MDP states, and arrows show the possible transitions between them. A complete policy is defined for the entire MDP state space $S$ (blue square in the picture). A partial policy is defined for a subset $S' \subset S$ of states (dashed green area in the picture). Finally, a plan only stores the action to execute for the states $S'' \subset S'$ of a single trajectory from $s_i$ to some $s_g$ (gold-colored nodes with bold emphasis in the picture).
  • Figure 3: Proposed taxonomy of methods to solve MDPs. Model-free and model-based RL methods with a symbolic or hybrid knowledge representation are placed in the Relational Reinforcement Learning category.
  • Figure 4: Classical planning (CP) task encoded in PDDL. The task belongs to blocksworld, a well-known PDDL domain in which a gripper arm stacks blocks on top of one another. The blue arrow represents a plan that achieves the goal state (right) starting from the initial state (left).
  • Figure 5: Planning with a heuristic. The figure illustrates how heuristics help reduce planning effort. For simplicity, we depict the case where the MDP is deterministic and search is carried out from the initial state $s_i$ to the goal $s_g$. When no heuristic is employed, the planning algorithm needs to explore the state space in all directions until $s_g$ is finally found (see blue circle in the image). A heuristic can prevent this by providing guidance and reducing the number of states that are explored (green ellipse in the image). Finally, if this heuristic is optimal/perfect (i.e., it predicts the optimal cost $V^*(s)$ for every state $s \in S$), only those states on the optimal plan(s) from $s_i$ to $s_g$ need to be explored. Analogously, for stochastic MDPs, the only states that need to be explored are those reachable from $s_i$ by following the optimal policy $\pi^*$.
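To make the distinction in Figure 2 concrete, here is a minimal Python sketch on a hypothetical six-state deterministic MDP; the state names, action names and transitions are made up for illustration and are not taken from the paper. Policies and the plan are plain dictionaries from states to actions, so the inclusion $S'' \subset S' \subset S$ becomes a subset check on their keys. The sketch assumes that a plan can be stored as a state-to-action map restricted to its single trajectory.

```python
from typing import Dict, List, Optional

# Hypothetical deterministic MDP: T[state][action] -> next state.
T: Dict[str, Dict[str, str]] = {
    "s_i": {"right": "a", "down": "b"},
    "a": {"right": "s_g", "down": "c"},
    "b": {"right": "c", "up": "s_i"},
    "c": {"up": "a", "right": "s_g"},
    "d": {"left": "b"},  # not reachable from s_i
    "s_g": {},           # terminal goal state, no actions
}
S = set(T)

# Complete policy: an action for every non-terminal state in S.
complete_policy = {"s_i": "right", "a": "right", "b": "right", "c": "right", "d": "left"}
# Partial policy: defined only on S' (here, the non-terminal states reachable from s_i).
partial_policy = {"s_i": "right", "a": "right", "b": "right", "c": "right"}
# Plan: actions only for the states S'' on one trajectory s_i -> a -> s_g.
plan = {"s_i": "right", "a": "right"}


def execute(policy: Dict[str, str], start: str, goal: str = "s_g",
            max_steps: int = 10) -> Optional[List[str]]:
    """Follow `policy` from `start` and return the visited states, or None if the
    agent reaches a state where the policy is undefined (or runs out of steps)."""
    s, visited = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return visited
        if s not in policy:
            return None
        s = T[s][policy[s]]
        visited.append(s)
    return visited if s == goal else None


print(execute(plan, "s_i"))           # ['s_i', 'a', 's_g']
print(execute(plan, "b"))             # None: "b" is off the planned trajectory
print(execute(partial_policy, "b"))   # ['b', 'c', 's_g']
print(execute(complete_policy, "d"))  # ['d', 'b', 'c', 's_g']
```

The plan is the cheapest representation but fails as soon as the agent starts outside its trajectory; the partial policy covers every state reachable from $s_i$, and only the complete policy also handles the unreachable state "d".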
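The effect described in Figures 4 and 5 can be reproduced with a minimal Python sketch of A* search on a three-block blocksworld task. It is a simplification under stated assumptions: instead of the four gripper-based PDDL actions (pick-up, put-down, stack, unstack) it uses a single hypothetical move(block, destination) action, states are encoded as tuples of supports, and the "misplaced blocks" heuristic is an illustrative choice, not a method from the paper. Running A* with $h = 0$ (uninformed) and with the heuristic shows that both find the same optimal plan while the heuristic expands fewer states.

```python
import heapq
from itertools import count
from typing import Callable, Iterator, List, Optional, Tuple

BLOCKS = ("A", "B", "C")
State = Tuple[str, ...]  # State[i] is what BLOCKS[i] rests on: "table" or another block


def successors(state: State) -> Iterator[Tuple[str, State]]:
    """Simplified blocksworld: move a clear block onto the table or onto another clear block."""
    support = dict(zip(BLOCKS, state))
    clear = [b for b in BLOCKS if b not in support.values()]
    for b in clear:
        for dest in ["table"] + [c for c in clear if c != b]:
            if dest != support[b]:
                nxt = dict(support)
                nxt[b] = dest
                yield f"move({b},{dest})", tuple(nxt[x] for x in BLOCKS)


def blind(state: State, goal: State) -> int:
    return 0  # no guidance: A* degenerates to uniform-cost search


def misplaced(state: State, goal: State) -> int:
    # Blocks resting on the wrong support; admissible since one move relocates one block.
    return sum(s != g for s, g in zip(state, goal))


def astar(start: State, goal: State,
          h: Callable[[State, State], int]) -> Tuple[Optional[List[str]], int]:
    """A* with unit action costs. Returns (plan, number of expanded states)."""
    tie = count()  # tie-breaker so heapq never has to compare states or plans
    frontier = [(h(start, goal), next(tie), 0, start, [])]
    best_g = {start: 0}
    expanded = 0
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan, expanded
        if g > best_g[state]:
            continue  # stale entry: a cheaper path to this state was found later
        expanded += 1
        for action, nxt in successors(state):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt, goal), next(tie), g + 1,
                                          nxt, plan + [action]))
    return None, expanded


# Sussman anomaly: C on A, A and B on the table; goal: A on B, B on C, C on the table.
start, goal = ("table", "table", "A"), ("B", "C", "table")
for h in (blind, misplaced):
    plan, expanded = astar(start, goal, h)
    print(h.__name__, plan, expanded)
```

Because the heuristic is admissible, both runs return the same three-move plan (move C to the table, B onto C, A onto B); the heuristic only changes how many states are expanded before that plan is found.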