Goal Recognition using Actor-Critic Optimization
Ben Nageris, Felipe Meneguzzi, Reuth Mirsky
TL;DR
DRACO is a deep-RL-based framework for goal recognition that learns one policy per candidate goal from unstructured data and infers the goal by comparing the observations against each policy. It proposes two observation-distance metrics, one Wasserstein-based and one Z-score-based. The resulting distances are converted into likelihoods $p(O|g)$ with a softmin, and the posterior $p(g|O)$ then follows from Bayes' rule. The method achieves state-of-the-art performance in both discrete (MiniGrid) and continuous (Panda-Gym) domains while using substantially less memory and computation than symbolic or tabular baselines. Its limitations are that training the policies requires a simulator of the environment and that a separate policy must be learned for each potential goal; future work includes imitation learning and transfer learning to broaden the set of goals that can be recognized.
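The inference step summarized above can be illustrated with a minimal sketch. It assumes that one distance per candidate goal has already been computed (by either the Wasserstein-based or the Z-score-based metric, neither of which is reproduced here), applies a plain softmin without a temperature parameter, and uses a uniform prior over goals when none is supplied. The function names and the distance values are hypothetical and are not taken from the paper.

```python
import math


def softmin(distances):
    """Map distances to probabilities: a smaller distance gets a larger probability."""
    # Shift by the minimum distance so the exponentials stay numerically stable.
    smallest = min(distances)
    weights = [math.exp(-(d - smallest)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def goal_posterior(distances, priors=None):
    """Return p(g|O) for each goal, given one observation-to-policy distance per goal."""
    n = len(distances)
    if priors is None:
        priors = [1.0 / n] * n  # assumption: uniform prior over goals
    likelihoods = softmin(distances)  # plays the role of p(O|g)
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # normalizing constant p(O)
    return [j / evidence for j in joint]


# Hypothetical distances between the observations and three per-goal policies.
distances = [0.4, 2.1, 3.0]
posterior = goal_posterior(distances)
most_likely_goal = posterior.index(max(posterior))
```

With a uniform prior, the Bayes step leaves the softmin ranking unchanged, so the goal whose policy is closest to the observations receives the highest posterior. A non-uniform prior can be passed through the `priors` argument.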
Abstract
Goal Recognition aims to infer an agent's goal from a sequence of observations. Existing approaches often rely on manually engineered domains and discrete representations. Deep Recognition using Actor-Critic Optimization (DRACO) is a novel approach based on deep reinforcement learning that overcomes these limitations through two key contributions. First, it is the first goal recognition algorithm that learns a set of policy networks from unstructured data and uses them for inference. Second, DRACO introduces new metrics for assessing goal hypotheses through continuous policy representations. DRACO achieves state-of-the-art performance for goal recognition in discrete settings without the structured inputs that existing approaches require. Moreover, it outperforms these approaches in more challenging, continuous settings at substantially lower cost in both computation and memory. Together, these results showcase the robustness of the new algorithm, bridging traditional goal recognition and deep reinforcement learning.
