Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains

Till Hofmann, Hector Geffner

TL;DR

The paper addresses the problem of learning general policies for fully observable non-deterministic (FOND) planning domains by extending combinatorial, rule-based policy learning from classical planning to the FOND setting. It exploits a correspondence with generalized classical planning over a determinized class $\mathcal{Q}_D$ and adds safety constraints $\boldsymbol{B}$ that keep the policy away from dead-ends. The learning task is formulated as a min-cost SAT problem $T(\mathcal{S},\mathcal{F})$ that selects a compact set of features and rules $R$. A dead-end detection procedure identifies the dead-end states used to define $\boldsymbol{B}$, which yields a concrete policy $\pi_{R,B}$ that solves the sampled FOND problems and, by safety, generalizes to a broader class. The approach is evaluated on standard FOND benchmarks: it yields general FOND policies in several domains, with correctness argued through descending, complete policies, and a transition-constraint variant further improves performance on certain tireworld-like domains. Overall, the work combines generalized classical planning with learned features and dead-end constraints to obtain transparent, verifiable general policies for FOND planning.
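To make the role of dead-end detection concrete, the following is a minimal sketch of one standard way to find dead-end states in a small, explicitly enumerated FOND state space. It assumes that a dead-end is a state from which no strong-cyclic policy reaches the goal. The function name `dead_ends`, the dictionary encoding of transitions, and the toy states are illustrative assumptions, and the paper's own procedure may differ.

```python
from typing import Dict, Hashable, Set

State = Hashable
# Hypothetical encoding: state -> action name -> set of possible successor states.
Transitions = Dict[State, Dict[str, Set[State]]]


def dead_ends(transitions: Transitions, goals: Set[State]) -> Set[State]:
    """Return states with no strong-cyclic solution (assumed dead-end definition)."""
    # Collect every state mentioned as a source, a successor, or a goal.
    states: Set[State] = set(transitions) | set(goals)
    for actions in transitions.values():
        for outcomes in actions.values():
            states |= outcomes

    alive = set(states)  # optimistic start: every state is assumed solvable
    while True:
        # Backward reachability from the goals, using only "safe" actions,
        # i.e. actions whose outcomes all stay inside the current alive set.
        reach = set(goals)
        changed = True
        while changed:
            changed = False
            for s in alive - reach:
                for outcomes in transitions.get(s, {}).values():
                    if outcomes and outcomes <= alive and outcomes & reach:
                        reach.add(s)
                        changed = True
                        break
        if reach == alive:
            # Fixpoint: everything left can reach a goal without risking a dead-end.
            return states - alive
        alive = reach  # prune states that cannot safely reach a goal and repeat


# Toy example: "risky" and "gamble" may end in the trap state "d".
toy: Transitions = {
    "s0": {"move": {"s1", "s0"}, "risky": {"g", "d"}},
    "s1": {"go": {"g"}},
    "t": {"gamble": {"g", "d"}},
    "d": {},
    "g": {},
}
print(dead_ends(toy, {"g"}))
```

In this toy instance the state `t` is a dead-end even though it has an outcome that reaches the goal, because its only action may also lead to the trap `d`. The state `s0` is not a dead-end, since the action `move` never leaves the set of solvable states.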

Abstract

General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning general policies over fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can actually be seen as an alternative FOND planning method that searches for solutions, not in the given state space but in an abstract space defined by features that must be learned as well.
