F$^3$Loc: Fusion and Filtering for Floorplan Localization
Changan Chen, Rui Wang, Christoph Vogel, Marc Pollefeys
TL;DR
This work tackles indoor camera localization with respect to floorplans without requiring per-map retraining or large image databases. It combines monocular and multi-view floorplan depth predictions through a learned complementary selector and integrates evidence over time with an efficient SE(2) histogram filter, enabling real-time sequential localization on consumer hardware. Key contributions include a novel 1D ray floorplan representation, depth extraction from single and multi-view inputs, a learned fusion mechanism, virtual roll-pitch augmentation, an SE(2) histogram filter for rapid sequential inference, and a large iGibson-based dataset plus a real-world LaMAR demonstration showing scalable, accurate localization. The results show significant improvements in recall and localization speed over state-of-the-art methods, with practical implications for indoor AR/VR and robot autonomy.
Abstract
In this paper, we propose an efficient data-driven solution to self-localization within a floorplan. Floorplan data is readily available, persistent over the long term, and inherently robust to changes in visual appearance. Our method requires neither retraining per map and location nor a large database of images of the area of interest. We propose a novel probabilistic model consisting of an observation module and a novel temporal filtering module. Operating internally with an efficient ray-based representation, the observation module comprises a single-view and a multi-view module that predict horizontal depth from images, and it fuses their results to benefit from the advantages of each. Our method operates on conventional consumer hardware and overcomes a common limitation of competing methods, which often demand upright images. Our full system meets real-time requirements while outperforming the state of the art by a significant margin.
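
To make the temporal filtering idea concrete, the following is a minimal sketch of a generic histogram (discrete Bayes) filter over a gridded SE(2) pose space. It is not the authors' implementation. The function names `predict` and `update`, the grid layout `belief[y, x, o]`, the noise-free motion model, and the wrap-around at the map border are all assumptions made for illustration. In a full system, the `likelihood` array would come from comparing predicted floorplan depth against rays cast in the map.

```python
import numpy as np

def predict(belief, motion, cell_size):
    """Motion step: move probability mass by an ego-motion (dx, dy, dtheta).

    belief has shape (H, W, O) and is indexed as [y, x, orientation bin].
    The motion is given in the camera (body) frame. Assumption: motion is
    noise-free and shifts wrap around the grid border (np.roll); a real
    filter would blur the belief and handle borders explicitly.
    """
    dx, dy, dtheta = motion
    n_orient = belief.shape[2]
    bin_width = 2 * np.pi / n_orient
    shift_o = int(round(dtheta / bin_width))
    out = np.zeros_like(belief)
    for o in range(n_orient):
        theta = o * bin_width
        # Rotate the body-frame translation into the world frame.
        wx = np.cos(theta) * dx - np.sin(theta) * dy
        wy = np.sin(theta) * dx + np.cos(theta) * dy
        shift_x = int(round(wx / cell_size))
        shift_y = int(round(wy / cell_size))
        moved = np.roll(belief[:, :, o], shift=(shift_y, shift_x), axis=(0, 1))
        out[:, :, (o + shift_o) % n_orient] += moved
    return out

def update(belief, likelihood):
    """Observation step: multiply by the per-pose likelihood and renormalize."""
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0:
        # No pose is consistent with the observation: fall back to uniform.
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total

# Usage: a 5x5 grid with 4 orientation bins and all mass on one pose.
belief = np.zeros((5, 5, 4))
belief[2, 2, 0] = 1.0
belief = predict(belief, motion=(1.0, 0.0, np.pi / 2), cell_size=1.0)
likelihood = np.ones_like(belief)  # placeholder for an observation model
belief = update(belief, likelihood)
```

Because each step is only an array shift and an element-wise product, the cost per frame grows linearly with the number of grid cells, which is what makes this family of filters attractive for sequential localization.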
