Signage-Aware Exploration in Open World using Venue Maps
Chang Chen, Liang Lu, Lei Yang, Yinqiang Zhang, Yizhou Chen, Ruixing Jia, Jia Pan
TL;DR
The paper tackles locating landmarks in unknown open-world environments by leveraging 2D venue maps and scene signage. It introduces a signage understanding pipeline that combines diffusion-based text instance retrieval with 2D-to-3D semantic fusion to recognize signage text of arbitrary shape, coupled with a venue-map-guided exploration-exploitation planner. The planner balances exploring unknown areas with exploitation, that is, approaching signs and orienting toward them for better recognition. The key contributions are the topological planning on venue maps, the diffusion-based signage retrieval, and the integrated planning framework. In large-scale malls, the system achieves higher signage coverage and faster search than state-of-the-art text spotting and traditional exploration baselines, showing the value of grounding text-level cues in non-metric venue maps for real-world exploration.
Abstract
Current exploration methods struggle to search for shops or restaurants in unknown open-world environments due to the lack of prior knowledge. Humans can leverage venue maps, which offer valuable scene priors, to aid exploration planning by correlating the signage in the scene with landmark names on the map. However, the arbitrary shapes and styles of text on signage, along with multi-view inconsistencies, make it difficult for robots to recognize signage accurately. Additionally, discrepancies between real-world environments and venue maps hinder the integration of text-level information into planners. This paper introduces a novel signage-aware exploration system that addresses these challenges, enabling robots to utilize venue maps effectively. We propose a signage understanding method that accurately detects and recognizes text on signage using a diffusion-based text instance retrieval method combined with a 2D-to-3D semantic fusion strategy. Furthermore, we design a venue map-guided exploration-exploitation planner that balances exploration of unknown regions, using directional heuristics derived from venue maps, with exploitation that moves the robot closer to signage and adjusts its orientation for better recognition. Experiments in large-scale shopping malls demonstrate our method's superior signage recognition performance and search efficiency, surpassing state-of-the-art text spotting methods and traditional exploration approaches. Project website: https://sites.google.com/view/signage-aware-exploration.
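
The abstract describes correlating signage in the scene with landmark names on the venue map. As a rough illustration of that matching step only, the sketch below fuzzy-matches a noisy recognized sign string against a list of map landmark names using Python's standard `difflib`. It is not the paper's diffusion-based text instance retrieval; the function names `normalize` and `match_landmark`, the example landmark names, and the 0.7 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_landmark(sign_text, landmark_names, threshold=0.7):
    """Return (best_name, score) for the closest landmark name.

    Hypothetical helper: the similarity measure (difflib ratio) and the
    threshold are illustrative choices, not taken from the paper.
    best_name is None when no landmark reaches the threshold.
    """
    query = normalize(sign_text)
    best_name, best_score = None, 0.0
    for name in landmark_names:
        score = SequenceMatcher(None, query, normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Illustrative landmark names, as they might appear on a venue map.
landmarks = ["Starbucks Coffee", "Uniqlo", "Apple Store"]

# A recognition result with one misread character still matches.
name, score = match_landmark("STARBUCK5 COFFEE", landmarks)
print(name, round(score, 2))
```

A planner could use such a match to associate an observed sign with a node on the venue map, while treating unmatched text as unknown. String similarity alone does not address the arbitrary shapes, styles, and multi-view inconsistencies that the paper's signage understanding method targets.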
