Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation

Badi Li; Ren-jie Lu; Yu Zhou; Jingke Meng; Wei-shi Zheng

TL;DR

GOAL tackles Object Goal Navigation (ObjectNav) by modeling the uncertainty of unseen indoor layouts with a generative flow model trained on LLM-derived spatial priors. It distills contextual knowledge into data-dependent couplings between partial and full semantic maps, so the agent can imagine plausible unobserved regions and select informative long-horizon waypoints. The approach combines 3D scene understanding, scene segmentation, and flow-based map completion trained via optimal-transport interpolation, achieving state-of-the-art results on Gibson and MP3D and strong cross-dataset transfer to HM3D. The results suggest that combining structured LLM guidance with flow-based sampling improves how well embodied agents generalize to unseen environments, with potential benefit for robust navigation in real-world robotics and AI assistants.
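As a rough illustration of the optimal-transport interpolation mentioned above, the sketch below builds the intermediate sample and regression target that a flow-matching objective would typically use when the source is a partial semantic map and the target is the paired full map. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `interpolate_maps`, the `(channels, height, width)` layout, and the straight-line path are all illustrative choices.

```python
import numpy as np


def interpolate_maps(partial_map, full_map, t):
    """Straight-line interpolation between a paired partial and full map.

    Hypothetical helper: returns the intermediate sample x_t and the
    constant velocity (x1 - x0) that a flow model could be regressed onto.
    """
    x0 = np.asarray(partial_map, dtype=np.float32)
    x1 = np.asarray(full_map, dtype=np.float32)
    if x0.shape != x1.shape:
        raise ValueError("partial and full maps must share a shape")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    x_t = (1.0 - t) * x0 + t * x1   # point on the path at time t
    velocity = x1 - x0              # time-independent target velocity
    return x_t, velocity


# Toy two-class semantic map on a 4x4 grid (assumed layout: C x H x W).
full_map = np.zeros((2, 4, 4), dtype=np.float32)
full_map[0, :, :2] = 1.0            # class 0 occupies the left half
full_map[1, :, 2:] = 1.0            # class 1 occupies the right half

# The partial map is the same scene with the right half still unobserved.
partial_map = full_map.copy()
partial_map[:, :, 2:] = 0.0

x_half, v = interpolate_maps(partial_map, full_map, 0.5)
```

Pairing each partial map with its own full map, rather than with random noise, is what makes the coupling data-dependent in this sketch; at `t = 0` the sample equals the observed map and at `t = 1` it equals the completed one.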

Abstract

The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic and discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and shows strong generalization in transfer settings to HM3D. Code and pretrained models are available at https://github.com/Badi-Li/GOAL.
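To make the idea of encoding a spatial prior as a two-dimensional Gaussian field more concrete, the sketch below renders an isotropic Gaussian on a map grid and merges it into one semantic channel of a target map. This is only a sketch under assumptions: the helper names `gaussian_field` and `inject_prior`, the isotropic form, the `sigma` value, the element-wise maximum used for merging, and the way a center location would be obtained from an LLM are not specified in the abstract and are chosen here purely for illustration.

```python
import numpy as np


def gaussian_field(height, width, center, sigma):
    """Isotropic 2D Gaussian over a grid, peaking at 1.0 at `center` (row, col)."""
    rows, cols = np.mgrid[0:height, 0:width]
    cy, cx = center
    sq_dist = (rows - cy) ** 2 + (cols - cx) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).astype(np.float32)


def inject_prior(target_map, channel, center, sigma, weight=1.0):
    """Return a copy of `target_map` with a Gaussian prior merged into one channel.

    Assumption: the prior is merged with an element-wise maximum so that
    existing ground-truth semantics are never weakened.
    """
    enriched = np.array(target_map, dtype=np.float32, copy=True)
    _, height, width = enriched.shape
    field = weight * gaussian_field(height, width, center, sigma)
    enriched[channel] = np.maximum(enriched[channel], field)
    return enriched


# Toy target map with 3 semantic channels on an 8x8 grid (assumed C x H x W).
target = np.zeros((3, 8, 8), dtype=np.float32)

# Hypothetical prior: an object of class 2 is likely near cell (3, 5).
enriched = inject_prior(target, channel=2, center=(3, 5), sigma=1.5)
```

A soft field of this kind expresses where an object is likely to be rather than asserting a single cell, which fits the abstract's framing of modeling a distribution over layouts instead of a deterministic completion.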
