LiLMaps: Learnable Implicit Language Maps
Evgenii Kruzhkov, Sven Behnke
TL;DR
LiLMaps addresses the challenge of coupling language understanding with 3D environment maps for autonomous robots by learning implicit language representations alongside geometry in an incremental setting. It introduces a sparse octree-based implicit map with a compact per-voxel feature and a 3-layer MLP decoder that reconstructs per-point language features, trained with a vision-language cosine loss $L_{vl}$ and with decoder weights kept separate from this loss. To handle previously unseen language features and inconsistencies across viewpoints, LiLMaps adds adaptive language decoder optimization and a measurement update strategy that uses a weighted target $\varphi_n^*$ and exponential smoothing with factor $\alpha$ to suppress measurement noise. Experiments in Habitat with Matterport3D scenes show LiLMaps outperforming baselines such as OpenScene and VLMaps in 3D language mapping quality while running in real time (~4 fps) and supporting 3D language-based object detection. The approach is designed to integrate with existing implicit SLAM systems at minimal overhead.
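To make the decoder and loss concrete, here is a minimal sketch in plain Python. It assumes $L_{vl}$ has the common form $1 - \cos(\hat{\varphi}, f)$ averaged over points, where $\hat{\varphi}$ is the decoded language feature and $f$ the vision-language target. The class `LanguageDecoder`, its layer sizes, and the ReLU activations are hypothetical illustration choices, not the authors' implementation; the octree lookup and the interpolation of voxel features to points are omitted.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def vl_loss(decoded, targets):
    """Assumed form of L_vl: mean over points of 1 - cos(decoded, target)."""
    losses = [1.0 - cosine_similarity(d, t) for d, t in zip(decoded, targets)]
    return sum(losses) / len(losses)


def _linear(x, W, b):
    # Dense layer: y_i = sum_j W_ij * x_j + b_i
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def _init_layer(n_in, n_out, rng):
    # Small random weights scaled by 1/sqrt(n_in), zero biases
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class LanguageDecoder:
    """Hypothetical 3-layer MLP: compact voxel feature -> language feature."""

    def __init__(self, feat_dim, hidden_dim, lang_dim, seed=0):
        rng = random.Random(seed)
        self.layers = [
            _init_layer(feat_dim, hidden_dim, rng),
            _init_layer(hidden_dim, hidden_dim, rng),
            _init_layer(hidden_dim, lang_dim, rng),
        ]

    def __call__(self, z):
        h = z
        for i, (W, b) in enumerate(self.layers):
            h = _linear(h, W, b)
            if i < len(self.layers) - 1:  # ReLU on hidden layers only
                h = [max(0.0, v) for v in h]
        return h


if __name__ == "__main__":
    decoder = LanguageDecoder(feat_dim=8, hidden_dim=16, lang_dim=4)
    phi = decoder([0.1] * 8)
    print(len(phi), vl_loss([phi], [phi]))
```

Because the loss depends only on direction, it matches how language features are typically queried (by cosine similarity to a text embedding) rather than forcing the decoder to reproduce feature magnitudes.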
Abstract
One of the current trends in robotics is to employ large language models (LLMs) to enable the execution of commands that are not predefined and natural human-robot interaction. For this, it is useful to have an environment map together with its language representation, which LLMs can then exploit. Such a comprehensive scene representation enables numerous ways for autonomously operating robots to interact with the map. In this work, we present an approach that enhances incremental implicit mapping through the integration of vision-language features. Specifically, we (i) propose a decoder optimization technique for implicit language maps that can be used when new objects appear in the scene, and (ii) address the problem of inconsistent vision-language predictions from different viewing positions. Our experiments demonstrate the effectiveness of LiLMaps and solid improvements in performance.
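Point (ii) is handled by the measurement update described in the TL;DR: a target $\varphi_n^*$ smoothed exponentially with factor $\alpha$. The exact weighting is not given here, so the sketch below assumes the simplest form, $\varphi_n^* = \alpha\,\varphi_{n-1}^* + (1-\alpha)\,f_n$, where $f_n$ is the $n$-th vision-language measurement of a point and $\alpha$ weights the history. The function names `smooth_target` and `fuse_measurements` are hypothetical.

```python
def smooth_target(prev_target, measurement, alpha):
    """Assumed update: phi*_n = alpha * phi*_{n-1} + (1 - alpha) * f_n.

    The first measurement initializes the target directly.
    """
    if prev_target is None:
        return list(measurement)
    return [alpha * p + (1.0 - alpha) * m for p, m in zip(prev_target, measurement)]


def fuse_measurements(measurements, alpha):
    """Fold a sequence of per-view measurements of one point into a single target."""
    target = None
    for f in measurements:
        target = smooth_target(target, f, alpha)
    return target


if __name__ == "__main__":
    # Two views disagree on the same point; smoothing blends them instead of
    # letting the latest view overwrite the earlier one.
    views = [[1.0, 0.0], [0.0, 1.0]]
    print(fuse_measurements(views, alpha=0.5))
```

A larger $\alpha$ makes the target more stable against a single noisy view, at the cost of adapting more slowly when the scene actually changes.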
