Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains

Till Hofmann; Hector Geffner

TL;DR

The paper addresses learning general policies for fully observable non-deterministic (FOND) planning domains by extending combinatorial, rule-based policy learning from classical planning to the FOND setting. It leverages a correspondence to generalized classical planning on a determinized class $\mathcal{Q}_D$ and introduces safety constraints $\boldsymbol{B}$ to avoid dead-ends, formulating the learning task as a min-cost SAT problem $T(\mathcal{S},\mathcal{F})$ that selects a compact set of features and rules $R$. A dead-end detection procedure identifies the dead-end states used to define $\boldsymbol{B}$, enabling the construction of a concrete policy $\pi_{R,B}$ that solves the sampled FOND problems and, by safety, generalizes to a broader class. The approach is evaluated on standard FOND benchmarks, yielding general FOND policies in several domains and demonstrating correctness through descending, complete policies; a transition-constraint variant further improves performance on certain tireworld-like domains. Overall, the work combines generalized classical planning with learned features and dead-end constraints to obtain transparent, verifiable general policies for FOND planning.
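The dead-end detection step can be illustrated on an explicit state space. The following is a minimal sketch, assuming the standard fixpoint characterization of strong-cyclic solvability: repeatedly discard actions that have an outcome outside the surviving set, then keep only the states that can still reach a goal. It is not necessarily the procedure used in the paper; the function `find_dead_ends`, the dictionary encoding of transitions, and the toy states are hypothetical.

```python
from typing import Dict, Iterable, Set

# transitions[state][action] = set of possible successor states (assumed encoding)
Transitions = Dict[str, Dict[str, Set[str]]]


def find_dead_ends(states: Iterable[str], transitions: Transitions,
                   goals: Set[str]) -> Set[str]:
    """Return the states from which no strong-cyclic solution exists."""
    all_states = set(states)
    alive = set(all_states)
    while True:
        # An action is safe if it has outcomes and all of them are still alive.
        safe = {
            s: [succ for succ in transitions.get(s, {}).values()
                if succ and succ <= alive]
            for s in alive
        }
        # Backward reachability from the goals using safe actions only.
        reach = set(goals) & alive
        changed = True
        while changed:
            changed = False
            for s in alive - reach:
                if any(succ & reach for succ in safe[s]):
                    reach.add(s)
                    changed = True
        if reach == alive:
            return all_states - alive
        alive = reach


# Toy example (hypothetical): s2 has no actions, s4 can only gamble on
# reaching s2, and s5 can only move to s4.
toy_states = ["s0", "s1", "s2", "s3", "s4", "s5", "g"]
toy_transitions: Transitions = {
    "s0": {"a": {"s1", "s2"}, "b": {"s3"}},
    "s1": {"go": {"g"}},
    "s3": {"c": {"s3", "g"}},
    "s4": {"d": {"s2", "g"}},
    "s5": {"e": {"s4"}},
}
dead = find_dead_ends(toy_states, toy_transitions, {"g"})
```

In this toy instance, `s0` stays solvable because action `b` avoids the risky outcome of `a`, while `s4` and `s5` are dead-ends even though a goal is reachable from them in the determinized sense.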

Abstract

General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning general policies over fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can actually be seen as an alternative FOND planning method that searches for solutions, not in the given state space but in an abstract space defined by features that must be learned as well.
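To make the idea of a policy defined over learned features more concrete, here is a small sketch of how a rule-based policy with safety constraints might be evaluated in a single state. It rests on several assumptions that do not come from the text above: features are numeric functions of the state (booleans encoded as 0/1), a rule pairs feature conditions with qualitative effects, an action is discarded if any of its outcomes violates a safety constraint, and a remaining action is selected if at least one of its outcomes is compatible with some rule. All names (`matches`, `select_actions`, the toy domain) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Feature = Callable[[State], int]
# A rule is (conditions, effects):
#   conditions[f] = True means f > 0 must hold, False means f == 0
#   effects[f] is "inc", "dec", or "any"; unmentioned features must stay the same
Rule = Tuple[Dict[str, bool], Dict[str, str]]


def qual_change(before: int, after: int) -> str:
    """Qualitative change of a feature value across a transition."""
    if after > before:
        return "inc"
    if after < before:
        return "dec"
    return "same"


def matches(rule: Rule, feats: Dict[str, Feature], s: State, t: State) -> bool:
    """Check whether the transition (s, t) is compatible with the rule."""
    cond, eff = rule
    for name, positive in cond.items():
        if (feats[name](s) > 0) != positive:
            return False
    for name, f in feats.items():
        want = eff.get(name, "same")
        if want != "any" and qual_change(f(s), f(t)) != want:
            return False
    return True


def select_actions(s: State, outcomes: Dict[str, List[State]],
                   rules: List[Rule], feats: Dict[str, Feature],
                   unsafe: Callable[[State], bool]) -> List[str]:
    """Actions with no unsafe outcome and at least one rule-compatible outcome."""
    chosen = []
    for action, succs in outcomes.items():
        if any(unsafe(t) for t in succs):
            continue  # some outcome violates a safety constraint
        if any(matches(r, feats, s, t) for r in rules for t in succs):
            chosen.append(action)
    return chosen


# Toy domain (hypothetical): "step" may fail to move, "jump" may break the agent.
feats = {"d": lambda st: st["dist"]}
rules = [({"d": True}, {"d": "dec"})]  # if d > 0, decrease d
state = {"dist": 3, "broken": 0}
outcomes = {
    "step": [{"dist": 2, "broken": 0}, {"dist": 3, "broken": 0}],
    "jump": [{"dist": 1, "broken": 0}, {"dist": 3, "broken": 1}],
}
picked = select_actions(state, outcomes, rules, feats,
                        unsafe=lambda st: st["broken"] == 1)
```

Here `jump` is rejected because one of its outcomes is unsafe, while `step` is kept because one of its outcomes decreases the distance feature as the rule requires.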

Paper Structure (28 sections, 4 equations, 3 tables, 2 algorithms)

Introduction
Related Work
Background
Classical planning
Generalized classical planning
FOND Planning
Dead-ends and deterministic relaxations
General Policies for FOND Planning
Semantical considerations
Expressing general FOND policies
Learning General FOND Policies
Min-Cost SAT Formulation
Dead-End Detection
Evaluation
Experimental Results
...and 13 more sections

Theorems & Definitions (5)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
