Towards a Unified Framework for Sequential Decision Making

Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares

TL;DR

The paper addresses the lack of a unified theory for Sequential Decision Making (SDM) that covers Automated Planning (AP), Reinforcement Learning (RL), and hybrids. It introduces Constrained Stochastic Shortest Path MDPs (CSSP-MDPs) and a training/test MDP formulation to model generalization, along with Context-Aware policies that can access MDP context $\mu$. A general Bayesian-inspired algorithm is proposed to iteratively refine the solution distribution $P^*(\Pi)$ by sampling policies, scoring them, updating beliefs, and propagating information to similar policies, with formal measures of task difficulty, knowledge quantity, efficiency, and quality. The framework enables systematic evaluation and comparison across SDM methods and sets the stage for empirical cross-domain studies and hybrid AP/RL competitions. The work offers practical tools for quantifying SDM task properties and provides a path toward a unified theory of SDM that encompasses both planning and learning-based approaches.

Abstract

In recent years, the integration of Automated Planning (AP) and Reinforcement Learning (RL) has seen a surge of interest. To perform this integration, a general framework for Sequential Decision Making (SDM) would prove immensely useful, as it would help us understand how AP and RL fit together. In this preliminary work, we attempt to provide such a framework, suitable for any method ranging from Classical Planning to Deep RL, by drawing on concepts from Probability Theory and Bayesian inference. We formulate an SDM task as a set of training and test Markov Decision Processes (MDPs), to account for generalization. We provide a general algorithm for SDM which we hypothesize every SDM method is based on. According to it, every SDM algorithm can be seen as a procedure that iteratively improves its solution estimate by leveraging the task knowledge available. Finally, we derive a set of formulas and algorithms for calculating interesting properties of SDM tasks and methods, which make possible their empirical evaluation and comparison.
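
The iterative loop described in the TL;DR (sample a policy, score it, update beliefs, propagate to similar policies) can be illustrated with a toy sketch. This is not the paper's algorithm: the finite policy set, the `score` and `similarity` functions, the function name `refine_beliefs`, and the multiplicative reweighting used in place of a proper Bayesian update are all assumptions made only for illustration.

```python
import math
import random


def refine_beliefs(policies, score, similarity, n_iters=200, lr=0.5, seed=0):
    """Toy belief refinement over a finite policy set (hypothetical sketch).

    score(p) is assumed to lie in [0, 1]; similarity(p, q) is assumed to lie
    in [0, 1] and to equal 1 when p == q.
    """
    rng = random.Random(seed)
    # Start from a uniform belief: no policy is preferred a priori.
    belief = {p: 1.0 / len(policies) for p in policies}
    for _ in range(n_iters):
        # 1. Sample a policy according to the current belief.
        weights = [belief[p] for p in policies]
        sampled = rng.choices(policies, weights=weights)[0]
        # 2. Score the sampled policy (higher is better).
        s = score(sampled)
        # 3-4. Reweight the sampled policy and, more weakly, policies similar
        # to it. This heuristic stands in for a Bayesian update (assumption).
        for q in policies:
            belief[q] *= 1.0 + lr * similarity(sampled, q) * (s - 0.5)
        # Renormalise so the belief remains a probability distribution.
        total = sum(belief.values())
        belief = {p: b / total for p, b in belief.items()}
    return belief


# Hypothetical usage: integers stand in for policies, larger is better.
policies = list(range(10))
belief = refine_beliefs(
    policies,
    score=lambda p: p / 9,
    similarity=lambda p, q: math.exp(-abs(p - q)),
)
```

In this toy setting the belief mass shifts away from low-scoring policies and towards high-scoring ones and their neighbours; how strongly information spreads depends entirely on the assumed similarity function.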
