LAMP: Implicit Language Map for Robot Navigation
Sibaek Lee, Hyeonwoo Yu, Giseop Kim, Sunwook Choi
TL;DR
LAMP addresses scalable zero-shot navigation by replacing explicit language maps with a neural implicit language field that maps poses to language embeddings from RGB inputs. The approach combines a sparse topological graph for coarse planning with gradient-based refinement in the learned field to achieve fine-grained goal localization, and models embedding uncertainty on the unit sphere with a Bayesian, von Mises–Fisher–based formulation. Node sampling further reduces computation by retaining only the most informative nodes, prioritized by spatial coverage and embedding confidence. Key contributions include (i) the first implicit language map for navigation, (ii) a Bayesian treatment of embedding uncertainty on the unit sphere, and (iii) a graph-sampling strategy guided by language features and uncertainty, enabling large-scale, memory-efficient planning. Experiments in NVIDIA Isaac Sim and in a real multi-floor building show that LAMP outperforms explicit grid- and node-based methods in memory efficiency and fine-grained goal-reaching accuracy, demonstrating robust zero-shot navigation with RGB input even for unobserved targets.
Abstract
Recent advances in vision-language models have made zero-shot navigation feasible, enabling robots to follow natural language instructions without requiring labeling. However, existing methods that explicitly store language vectors in grid or node-based maps struggle to scale to large environments due to excessive memory requirements and limited resolution for fine-grained planning. We introduce LAMP (Language Map), a novel neural language field-based navigation framework that learns a continuous, language-driven map and directly leverages it for fine-grained path generation. Unlike prior approaches, our method encodes language features as an implicit neural field rather than storing them explicitly at every location. By combining this implicit representation with a sparse graph, LAMP supports efficient coarse path planning and then performs gradient-based optimization in the learned field to refine poses near the goal. This coarse-to-fine pipeline, built on language-driven, gradient-guided optimization, is the first application of an implicit language map for precise path generation. By leveraging semantic similarities in the learned feature space, the refinement is particularly effective at selecting goal regions that were not directly observed. To further enhance robustness, we adopt a Bayesian framework that models embedding uncertainty via the von Mises-Fisher distribution, thereby improving generalization to unobserved regions. To scale to large environments, LAMP employs a graph sampling strategy that prioritizes spatial coverage and embedding confidence, retaining only the most informative nodes and substantially reducing computational overhead. Our experimental results, both in NVIDIA Isaac Sim and in a real multi-floor building, demonstrate that LAMP outperforms existing explicit methods in both memory efficiency and fine-grained goal-reaching accuracy.
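The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_field`, the weight matrix `W`, `cosine_score`, and `coarse_to_fine` are hypothetical names, the "field" is a fixed random map standing in for a trained network, poses are reduced to 2D positions, gradients are taken by finite differences rather than automatic differentiation, and the von Mises-Fisher uncertainty model is omitted entirely.

```python
import numpy as np


def toy_field(pose, W):
    """Hypothetical stand-in for a learned implicit field: (x, y) -> unit embedding."""
    feats = np.array([pose[0], pose[1], 1.0])
    z = np.tanh(W @ feats)
    return z / np.linalg.norm(z)


def cosine_score(pose, W, query):
    """Cosine similarity between the field output at `pose` and a unit query vector."""
    return float(toy_field(pose, W) @ query)


def numerical_grad(f, pose, eps=1e-4):
    """Central finite-difference gradient of a scalar function of the pose."""
    grad = np.zeros_like(pose)
    for i in range(pose.size):
        step = np.zeros_like(pose)
        step[i] = eps
        grad[i] = (f(pose + step) - f(pose - step)) / (2.0 * eps)
    return grad


def coarse_to_fine(nodes, W, query, lr=0.05, steps=100):
    """Pick the best sparse graph node, then refine the pose by gradient ascent."""
    # Coarse stage: score every graph node and keep the most similar one.
    scores = [cosine_score(n, W, query) for n in nodes]
    pose = nodes[int(np.argmax(scores))].astype(float).copy()
    start_score = max(scores)

    # Fine stage: ascend the similarity, accepting only non-worsening steps.
    def score(p):
        return cosine_score(p, W, query)

    for _ in range(steps):
        candidate = pose + lr * numerical_grad(score, pose)
        if score(candidate) >= score(pose):
            pose = candidate
        else:
            lr *= 0.5  # shrink the step when it overshoots
    return pose, start_score, score(pose)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 3))                      # toy 8-D embedding space
    query = toy_field(np.array([0.3, -0.2]), W)      # embedding at a hidden target
    nodes = np.array([[-1.0, -1.0], [1.0, 1.0], [0.0, 1.0]])
    pose, s0, s1 = coarse_to_fine(nodes, W, query)
    print("refined pose:", pose, "score before:", s0, "score after:", s1)
```

Because the fine stage only accepts steps that do not lower the similarity, the refined score can never be worse than that of the selected graph node. In this toy setup the refined pose is free to leave the sparse node set, which is the property the abstract relies on for reaching goal regions that were not directly observed.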
