Active Information Gathering for Long-Horizon Navigation Under Uncertainty by Learning the Value of Information
Raihan Islam Arnob, Gregory J. Stein
TL;DR
This work tackles long-horizon point-goal navigation under uncertainty in partially mapped environments by inferring the long-horizon value of information $V_I$ of exploratory actions and integrating it into the Learning over Subgoals Planning (LSP) framework. A graph neural network estimates subgoal properties and the value of information, enabling the planner to actively seek information that improves planning performance while maintaining completeness and soundness. During offline training, the approach computes one-step information gains $v_I$ and accumulates them into $V_I$, which serves as the training target for the estimator; at deployment time, the planner uses the estimator's predictions. In two simulated office-like environments, the method significantly reduces average navigation cost compared with non-learned baselines and prior LSP variants, with improvements of up to 63.76% and full goal-reachability, demonstrating the practical benefit of principled information-seeking for long-horizon navigation under uncertainty.
Abstract
We address the task of long-horizon navigation in partially mapped environments for which active gathering of information about faraway unseen space is essential for good behavior. We present a novel planning strategy that, at training time, affords tractable computation of the value of information associated with revealing potentially informative regions of unseen space; these data are used to train a graph neural network to predict the goodness of temporally-extended exploratory actions. Our learning-augmented model-based planning approach predicts the expected value of information of revealing unseen space and uses these predictions to actively seek information and so improve long-horizon navigation. Across two simulated office-like environments, our planner outperforms competitive learned and non-learned baseline navigation strategies, achieving improvements of up to 63.76% and 36.68%, demonstrating its capacity to actively seek performance-critical information.
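To make the training-time bookkeeping concrete, the following is a minimal Python sketch of how one-step information gains $v_I$ might be accumulated into a long-horizon $V_I$ label. It is an illustration under stated assumptions, not the authors' implementation: the function names `one_step_gain` and `accumulate_gains` are hypothetical, the definition of $v_I$ as a clipped reduction in expected cost is assumed, the accumulation is assumed to be a plain undiscounted sum over the remaining steps, and the cost values are made up.

```python
from typing import List, Sequence


def one_step_gain(cost_without_info: float, cost_with_info: float) -> float:
    """Hypothetical one-step information gain v_I.

    Assumption: v_I is the reduction in expected navigation cost once a
    region has been revealed, clipped at zero so it is never negative.
    """
    return max(0.0, cost_without_info - cost_with_info)


def accumulate_gains(step_gains: Sequence[float]) -> List[float]:
    """Assumed accumulation: V_I at step t is the sum of v_I from t onward."""
    totals: List[float] = []
    running = 0.0
    # Walk backwards so each entry holds the sum of all later gains too.
    for gain in reversed(step_gains):
        running += gain
        totals.append(running)
    totals.reverse()
    return totals


# Illustrative (made-up) expected costs before and after each reveal.
costs = [(120.0, 100.0), (100.0, 100.0), (100.0, 70.0)]
v_I = [one_step_gain(before, after) for before, after in costs]
V_I = accumulate_gains(v_I)
print(v_I)  # [20.0, 0.0, 30.0]
print(V_I)  # [50.0, 30.0, 30.0]
```

Under these assumptions, each accumulated value would act as the regression target for the estimator at the corresponding step. A discounted or otherwise weighted sum would fit the same structure by changing the update inside the loop.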
