Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning
Vittorio Giammarino, Ruiqi Ni, Ahmed H. Qureshi
TL;DR
This work tackles offline goal-conditioned reinforcement learning by addressing a fundamental challenge: estimating accurate goal-conditioned value functions from limited data. It introduces a physics-informed Eikonal regularizer, derived from the Eikonal PDE and grounded in continuous-time optimal control, that enforces a distance-like cost-to-go structure on the value function. The regularizer is model-free and TD-compatible, and it integrates with Hierarchical Implicit Q-Learning to form Eik-HIQL, which achieves state-of-the-art results in large-scale navigation and trajectory stitching on OGBench while remaining computationally lightweight. The approach offers clear benefits for navigation tasks but shows limited gains in interactive, contact-rich domains, which suggests future work on task-adaptive speed profiles and on modeling contact dynamics to broaden applicability.
Abstract
Offline Goal-Conditioned Reinforcement Learning (GCRL) holds great promise for domains such as autonomous navigation and locomotion, where collecting interactive data is costly and unsafe. However, it remains challenging in practice due to the need to learn from datasets with limited coverage of the state-action space and to generalize across long-horizon tasks. To address these challenges, we propose a Physics-informed (Pi) regularized loss for value learning, derived from the Eikonal Partial Differential Equation (PDE), which induces a geometric inductive bias in the learned value function. Unlike generic gradient penalties that are primarily used to stabilize training, our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. The proposed regularizer is broadly compatible with temporal-difference-based value learning and can be integrated into existing Offline GCRL algorithms. When combined with Hierarchical Implicit Q-Learning (HIQL), the resulting method, Eikonal-regularized HIQL (Eik-HIQL), yields significant improvements in both performance and generalization, with pronounced gains in stitching regimes and large-scale navigation tasks.
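To make the idea of an Eikonal regularizer concrete, the following is a minimal illustrative sketch, not the loss used in the paper. It assumes the standard Eikonal equation with speed profile S, i.e. ||∇_s V(s, g)|| · S(s) = 1, a constant unit speed, a squared-residual penalty, and finite differences in place of automatic differentiation. All function names, the `weight` parameter, and the toy value function are hypothetical and chosen only for illustration.

```python
import math

def distance_value(state, goal):
    """Toy goal-conditioned value: negative Euclidean distance to the goal."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(state, goal)))

def grad_norm(value_fn, state, goal, eps=1e-5):
    """Central finite-difference estimate of the norm of dV/ds at (state, goal)."""
    squared = 0.0
    for i in range(len(state)):
        plus, minus = list(state), list(state)
        plus[i] += eps
        minus[i] -= eps
        partial = (value_fn(plus, goal) - value_fn(minus, goal)) / (2 * eps)
        squared += partial ** 2
    return math.sqrt(squared)

def eikonal_penalty(value_fn, states, goals, speed=1.0):
    """Mean squared Eikonal residual (||dV/ds|| * speed - 1)^2 over a batch.

    Assumption: constant speed; a state-dependent speed profile would
    replace the scalar `speed` with a function of the state.
    """
    residuals = [
        (grad_norm(value_fn, s, g) * speed - 1.0) ** 2
        for s, g in zip(states, goals)
    ]
    return sum(residuals) / len(residuals)

def regularized_loss(td_loss, value_fn, states, goals, weight=0.1):
    """Hypothetical combination: TD loss plus a weighted Eikonal penalty."""
    return td_loss + weight * eikonal_penalty(value_fn, states, goals)

# Example batch of 2D states and goals (states differ from goals,
# so the distance function is differentiable at every sample).
states = [[0.0, 0.0], [1.0, 2.0]]
goals = [[3.0, 4.0], [-1.0, 0.5]]

penalty_distance = eikonal_penalty(distance_value, states, goals)
penalty_scaled = eikonal_penalty(
    lambda s, g: 2.0 * distance_value(s, g), states, goals
)
```

In this toy setting, a value function that is exactly a (negative) Euclidean distance has unit gradient norm and therefore incurs essentially no penalty, whereas a rescaled version with gradient norm 2 is penalized. This is the sense in which such a term biases the learned value toward a distance-like cost-to-go structure while leaving the TD objective itself unchanged.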
