Learning Interpretable Classifiers for PDDL Planning

Arnaud Lequen

TL;DR

This work considers the problem of synthesizing interpretable models that recognize the behaviour of an agent compared to other agents, on a whole set of similar planning tasks expressed in PDDL, and proposes to learn these behaviour classifiers through a topology-guided compilation to MaxSAT, which allows for a wide range of different formulas.

Abstract

We consider the problem of synthesizing interpretable models that recognize the behaviour of an agent compared to other agents, on a whole set of similar planning tasks expressed in PDDL. Our approach consists in learning logical formulas from a small set of examples that show how an agent solved small planning instances. These formulas are expressed in a version of First-Order Temporal Logic (FTL) tailored to our planning formalism. Such formulas are human-readable, serve as (partial) descriptions of an agent's policy, and generalize to unseen instances. We show that learning such formulas is computationally intractable: the associated decision problem is NP-hard. We therefore propose to learn these behaviour classifiers through a topology-guided compilation to MaxSAT, which allows us to generate a wide range of different formulas. Experiments show that interesting and accurate formulas can be learned in reasonable time.

Paper Structure

This paper contains 37 sections, 1 theorem, 20 equations, 1 figure, 3 tables, 1 algorithm.

Key Result

Proposition 1

The decision problem associated with the $\mathcal{L}_{\text{FTL}}$ learning problem is NP-hard.

Figures (1)

  • Figure 1: An example of a TL chain whose nodes have been assigned symbols. It represents the formula $(q(v,u) \wedge r(z, y)) \, \textsf{U} \, p(t, x)$
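
To make the shape of the formula in Figure 1 concrete, the following is a minimal sketch of how an until-formula of that form could be checked against a trace. It is not the paper's implementation. It assumes the usual strong-until semantics on a finite trace, represents each state as a set of ground atoms, and fixes the variables to objects through a hand-written binding. The object names `a` to `f`, the helper names `holds_until` and `figure1_formula`, and both example traces are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Atom = Tuple[str, ...]      # a ground atom, e.g. ("p", "e", "f")
State = FrozenSet[Atom]     # a state is the set of atoms true in it


def holds_until(trace: List[State],
                left: Callable[[State], bool],
                right: Callable[[State], bool]) -> bool:
    """Strong until on a finite trace (assumed semantics): `right` holds
    at some step, and `left` holds at every step before it."""
    for state in trace:
        if right(state):
            return True
        if not left(state):
            return False
    return False  # `right` never became true


def figure1_formula(trace: List[State], b: Dict[str, str]) -> bool:
    """(q(v,u) AND r(z,y)) U p(t,x) under a fixed variable binding `b`."""
    def left(s: State) -> bool:
        return ("q", b["v"], b["u"]) in s and ("r", b["z"], b["y"]) in s

    def right(s: State) -> bool:
        return ("p", b["t"], b["x"]) in s

    return holds_until(trace, left, right)


# Hypothetical binding of variables to objects.
binding = {"v": "a", "u": "b", "z": "c", "y": "d", "t": "e", "x": "f"}
q, r, p = ("q", "a", "b"), ("r", "c", "d"), ("p", "e", "f")

# q and r hold until p becomes true: the formula accepts this trace.
accepted = [frozenset({q, r}), frozenset({q, r}), frozenset({p})]
# r is dropped before p becomes true: the formula rejects this trace.
rejected = [frozenset({q, r}), frozenset({q}), frozenset({p})]

print(figure1_formula(accepted, binding))
print(figure1_formula(rejected, binding))
```

A formula used as a behaviour classifier would accept the traces of one agent and reject those of the others, in the way the two hand-built traces are separated here. How variables are quantified in the paper's FTL fragment is not shown on this page, so the fixed binding is only a stand-in.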

Theorems & Definitions (9)

  • Definition 1: Type tree
  • Definition 2: Object class
  • Definition 3: Type hierarchy
  • Definition 4: Predicate, atoms and fluents
  • Definition 5: Action schema and operators
  • Definition 6: PDDL planning problem
  • Definition 7: Instantiated trace
  • Proposition 1
  • Proof of Proposition 1 (Sketch)