Goal Recognition using Actor-Critic Optimization

Ben Nageris; Felipe Meneguzzi; Reuth Mirsky

TL;DR

DRACO is a deep RL-based framework for goal recognition that learns per-goal policies from unstructured data and infers goals by comparing observations to each policy. It proposes two observation-distance metrics, one Wasserstein-based and one Z-score-based, to compute $p(O|g)$ and derive $p(g|O)$ via softmin and Bayes' rule. The method achieves state-of-the-art performance in both discrete (MiniGrid) and continuous (Panda-Gym) domains while using substantially less memory and computation than symbolic or tabular baselines. Limitations include the need for environment simulations to train policies and the requirement to learn a policy per potential goal; future work includes imitation learning and transfer learning to cover a broader set of goals.
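The softmin-and-Bayes step summarized above can be illustrated with a minimal sketch. It assumes that a distance between the observations and each goal's policy has already been computed, for instance by one of the two metrics. The function name `goal_posterior`, the `temperature` parameter, and the uniform default prior are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from typing import Dict, Optional


def goal_posterior(
    distances: Dict[str, float],
    priors: Optional[Dict[str, float]] = None,
    temperature: float = 1.0,  # assumed knob, not from the paper
) -> Dict[str, float]:
    """Map per-goal observation distances to a posterior p(g|O)."""
    if priors is None:
        # Assumption: uniform prior over goals when none is given.
        priors = {g: 1.0 / len(distances) for g in distances}

    # Softmin: a smaller distance yields a larger likelihood p(O|g).
    # Subtracting the minimum keeps the exponentials numerically stable.
    d_min = min(distances.values())
    weights = {
        g: math.exp(-(d - d_min) / temperature) for g, d in distances.items()
    }
    weight_sum = sum(weights.values())
    likelihood = {g: w / weight_sum for g, w in weights.items()}

    # Bayes' rule: p(g|O) is proportional to p(O|g) * p(g).
    unnormalized = {g: likelihood[g] * priors[g] for g in distances}
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical distances for three candidate goals.
    posterior = goal_posterior({"g1": 0.4, "g2": 1.7, "g3": 2.5})
    print(max(posterior, key=posterior.get))
```

With a uniform prior, the goal with the smallest distance receives the highest posterior; a non-uniform prior can shift that ranking.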

Abstract

Goal Recognition aims to infer an agent's goal from a sequence of observations. Existing approaches often rely on manually engineered domains and discrete representations. Deep Recognition using Actor-Critic Optimization (DRACO) is a novel approach based on deep reinforcement learning that overcomes these limitations by providing two key contributions. First, it is the first goal recognition algorithm that learns a set of policy networks from unstructured data and uses them for inference. Second, DRACO introduces new metrics for assessing goal hypotheses through continuous policy representations. DRACO achieves state-of-the-art performance for goal recognition in discrete settings while not using the structured inputs used by existing approaches. Moreover, it outperforms these approaches in more challenging, continuous settings at substantially reduced costs in both computing and memory. Together, these results showcase the robustness of the new algorithm, bridging traditional goal recognition and deep reinforcement learning.

Paper Structure (13 sections, 9 equations, 4 figures, 2 tables)

Introduction
Background
Recognition Problems
Planning Domain Definition Language (PDDL)
Reinforcement Learning
DRACO
Policy Learning
Likelihood Estimation of Observations (inference)
Observation distance functions
Empirical Setup and Testbed
Results
Related Work
Conclusion

Figures (4)

Figure 1: Overview of DRACO.
Figure 2: Z-score-based computation process: (i) extract the inputs; (ii) separately calculate the Z-score of the agent's movement and of the observation; (iii) return the average of the calculated Z-scores.
Figure 3: Examples of the domain setups used in evaluation.
Figure 4: F-score results in the Panda-Gym domain for GRAQL, DRACO with Z-score, and DRACO with Wasserstein, under varying (a) observability and (b) noise. Error bars indicate standard deviation.
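
The Z-score-based process in the Figure 2 caption can be read in more than one way. The sketch below shows one possible reading, assuming each goal's policy outputs a Gaussian mean and standard deviation per action dimension, so that an observed action is scored by how many standard deviations it lies from the policy mean. The function names, the Gaussian assumption, and the `eps` stabilizer are illustrative and not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def z_score(value: float, mu: float, sigma: float, eps: float = 1e-8) -> float:
    """Absolute Z-score of `value` under a Gaussian with mean mu and std sigma."""
    return abs(value - mu) / (sigma + eps)


def z_score_distance(
    observed_actions: Sequence[Sequence[float]],
    policy_means: Sequence[Sequence[float]],
    policy_stds: Sequence[Sequence[float]],
) -> float:
    """Average Z-score of observed actions under a goal policy's predictions.

    Each argument holds one row per observed step and one entry per action
    dimension. A lower result means the observations fit the policy better.
    """
    per_step = []
    for action, mus, sigmas in zip(observed_actions, policy_means, policy_stds):
        scores = [z_score(a, m, s) for a, m, s in zip(action, mus, sigmas)]
        per_step.append(mean(scores))
    return mean(per_step)


if __name__ == "__main__":
    # Hypothetical two-step trajectory with 2-D actions.
    observed = [[0.1, 0.9], [0.0, 1.2]]
    means = [[0.0, 1.0], [0.0, 1.0]]
    stds = [[0.5, 0.5], [0.5, 0.5]]
    print(z_score_distance(observed, means, stds))
```

Under this reading, the returned value serves as the per-goal distance to which softmin is then applied.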

Theorems & Definitions (3)

Definition 1
Definition 2
Definition 3
