SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames

Yuxuan Zhou, Xingxing Li, Shengyu Li, Chunxi Xia, Xuanbin Wang, Shaoquan Feng

TL;DR

SF-Loc tackles reliable, large-scale geo-localization with lightweight maps by introducing sparse visual structure frames, each storing a compressed image, compact dense depth and a global descriptor. The method combines multi-sensor dense bundle adjustment (MS-DBA) for accurate mapping with a coarse-to-fine localization pipeline that uses spatially smoothed similarity (SSS) and spatiotemporally associated similarity (SAS) to fuse multi-frame cues. On cross-season urban data, the system achieves decimeter-level re-localization with a map size of about 3 MB/km and strong coarse-to-fine localization performance, validating the approach under GNSS outages and appearance changes. It runs at about 110 ms per frame, supporting real-time use, and the authors plan an open-source release, making the work relevant to robotics and autonomous systems that need robust, scalable map-aided localization.
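The storage and latency figures above translate into a simple budget. The sketch below assumes that map size grows linearly with route length and that frames are processed one at a time without pipelining; both are simplifying assumptions for illustration, not claims from the paper.

```python
# Back-of-envelope budget from the figures quoted above.
# Assumption: map size scales linearly with route length.
# Assumption: frames are processed sequentially (no pipelining).
MAP_MB_PER_KM = 3.0   # reported map density on urban roads
LATENCY_S = 0.110     # reported per-frame localization time


def map_size_mb(route_km: float) -> float:
    """Approximate map storage for a route of the given length."""
    return MAP_MB_PER_KM * route_km


def max_rate_hz(latency_s: float = LATENCY_S) -> float:
    """Highest frame rate a single sequential pipeline could sustain."""
    return 1.0 / latency_s


if __name__ == "__main__":
    for km in (1, 10, 100):
        print(f"{km:>4} km -> {map_size_mb(km):6.1f} MB")
    print(f"sequential throughput: {max_rate_hz():.1f} Hz")
```

Under these assumptions a 100 km network fits in roughly 300 MB, and 110 ms per frame corresponds to a little over 9 frames per second.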

Abstract

For high-level geo-spatial applications and intelligent robotics, accurate global pose information is of crucial importance. Map-aided localization is a universal approach to overcoming the limitations of the global navigation satellite system (GNSS) in challenging environments. However, current solutions face challenges in mapping flexibility, storage burden and re-localization performance. In this work, we present SF-Loc, a lightweight visual mapping and map-aided localization system whose core idea is a map representation based on sparse frames with dense but compact depth, termed visual structure frames. In the mapping phase, multi-sensor dense bundle adjustment (MS-DBA) is applied to construct geo-referenced visual structure frames. Local co-visibility is checked to maintain map sparsity and enable incremental mapping. In the localization phase, coarse-to-fine vision-based localization is performed, fully integrating multi-frame information and the map distribution. Specifically, the concept of spatially smoothed similarity (SSS) is proposed to overcome place ambiguity, and pairwise frame matching is applied for efficient and robust pose estimation. Experimental results on the cross-season dataset verify the effectiveness of the system. In complex urban road scenarios, the map size is as low as 3 MB per kilometer and stable decimeter-level re-localization is achieved. The code will be made open-source soon (https://github.com/GREAT-WHU/SF-Loc).
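The abstract does not spell out how SSS is computed. One plausible reading, sketched here purely as an assumption, is that a query's raw descriptor similarity to each map frame is averaged with the similarities of nearby map frames using a Gaussian kernel over their geo-referenced positions, so that a true match, whose neighbors also score well, beats an isolated look-alike. The Gaussian kernel, the smoothing radius `sigma` and all helper names are hypothetical, not the paper's formulation.

```python
import numpy as np


def cosine_similarity(query: np.ndarray, map_desc: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query descriptor (D,) and N map descriptors (N, D)."""
    q = query / np.linalg.norm(query)
    m = map_desc / np.linalg.norm(map_desc, axis=1, keepdims=True)
    return m @ q


def spatially_smoothed_similarity(sim: np.ndarray, positions: np.ndarray,
                                  sigma: float = 10.0) -> np.ndarray:
    """Gaussian-weighted average of raw similarities over nearby map frames.

    sim: (N,) raw similarity of the query to each map frame.
    positions: (N, 2) geo-referenced map-frame positions in metres.
    sigma: smoothing radius in metres (hypothetical default).
    """
    diff = positions[:, None, :] - positions[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)          # (N, N) squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian spatial kernel
    return (w @ sim) / w.sum(axis=1)          # normalised weighted mean per frame


def demo():
    """Toy road: map frames every 10 m and an isolated look-alike at 90 m."""
    positions = np.stack([np.arange(0.0, 101.0, 10.0), np.zeros(11)], axis=1)
    raw = np.full(11, 0.2)
    raw[[4, 5, 6]] = [0.7, 0.8, 0.7]   # consistent neighbourhood around 50 m
    raw[9] = 0.85                      # isolated spurious peak at 90 m
    smoothed = spatially_smoothed_similarity(raw, positions)
    return raw, smoothed


if __name__ == "__main__":
    raw, smoothed = demo()
    print("raw argmax:", int(np.argmax(raw)),
          "smoothed argmax:", int(np.argmax(smoothed)))
```

In the toy example the raw maximum is the isolated 90 m frame, while after smoothing the maximum moves to the 50 m frame whose neighbors are also similar. A practical implementation would likely restrict smoothing to frames within a search radius instead of forming the full N×N distance matrix.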

Paper Structure

This paper contains 18 sections, 18 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Illustration of the SF-Loc system. The system is built upon a map representation of visual structure frames, each containing a compressed image, compact depth information and a global descriptor. Map sparsity is intentionally maintained, which keeps storage lightweight (≈3 MB/km) while retaining high-recall, decimeter-level re-localization.
  • Figure 2: The overall pipeline of the system, which is divided into the mapping phase and the localization phase.
  • Figure 3: Illustration of the proposed multi-sensor DBA. 1) A sliding-window factor graph is used for real-time state and depth estimation and is tightly integrated with the recurrent optical flow module. 2) The global factor graph collects the marginalized factors and provides low-frequency, long-term smoothed optimization results. 3) Once mature, frames in the global factor graph are decoupled to serve as standalone visual structure frames and are inserted into the map.
  • Figure 4: Illustration of the co-visibility checking. (a) Co-visibility for one image pair; (b) local co-visibility checking, in which the dashed-line frame is added or discarded by comparing its score with that of the high-co-visibility map frame; (c) an example of the sparsified structure frame map.
  • Figure 5: Illustration of the multi-frame place recognition. Particles representing candidate query poses are placed around map frames. The user-side trajectory is used in a nearest-neighbor search to associate query frames with map frames, and the SAS distance, which combines the information of multiple image pairs, is then evaluated.
  • ...and 9 more figures
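The co-visibility check of Figure 4 can be illustrated with a reprojection test: back-project pixels of one frame using its depth, transform them into the other frame, and count how many land inside the image and in front of the camera. This sketch assumes a shared pinhole camera, a known relative pose `T_b_a` and a simple threshold rule (`should_insert`, hypothetical) for deciding whether a new frame adds coverage; the paper's actual score and decision rule may differ.

```python
import numpy as np


def covisibility(depth_a: np.ndarray, K: np.ndarray, T_b_a: np.ndarray,
                 stride: int = 8) -> float:
    """Fraction of subsampled pixels of frame A that land inside frame B.

    depth_a: (H, W) depth map of frame A in metres (0 = invalid).
    K: 3x3 pinhole intrinsics, assumed shared by both frames (same image size).
    T_b_a: 4x4 rigid transform from A's camera coordinates to B's.
    """
    h, w = depth_a.shape
    v, u = np.mgrid[0:h:stride, 0:w:stride]
    z = depth_a[v, u]
    valid = z > 0
    if not valid.any():
        return 0.0
    # Pixel centres of valid samples in homogeneous coordinates, shape (3, M).
    pix = np.stack([u[valid] + 0.5, v[valid] + 0.5, np.ones(valid.sum())])
    pts_a = (np.linalg.inv(K) @ pix) * z[valid]       # back-project into A
    pts_b = T_b_a[:3, :3] @ pts_a + T_b_a[:3, 3:4]     # express points in B
    zb = pts_b[2]
    in_front = zb > 1e-6
    proj = K @ pts_b
    safe_z = np.where(in_front, zb, 1.0)               # avoid dividing by <= 0
    ub, vb = proj[0] / safe_z, proj[1] / safe_z
    inside = in_front & (ub >= 0) & (ub < w) & (vb >= 0) & (vb < h)
    return float(inside.mean())


def should_insert(covis_with_nearby: list, threshold: float = 0.5) -> bool:
    """Hypothetical rule: keep the candidate unless a nearby map frame already covers most of it."""
    return len(covis_with_nearby) == 0 or max(covis_with_nearby) < threshold


def demo() -> dict:
    K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 48.0], [0.0, 0.0, 1.0]])
    depth = np.full((96, 128), 10.0)       # flat scene 10 m in front of A
    forward = np.eye(4)
    forward[2, 3] = -0.5                   # B is 0.5 m ahead of A
    sideways = np.eye(4)
    sideways[0, 3] = -100.0                # B is 100 m to the side of A
    cases = [("same", np.eye(4)), ("forward", forward), ("sideways", sideways)]
    return {name: covisibility(depth, K, T) for name, T in cases}


if __name__ == "__main__":
    print(demo())
```

With a flat scene 10 m away, moving 0.5 m forward leaves about 86% of the sampled pixels visible, so the candidate would be discarded as redundant, whereas a frame 100 m to the side shares nothing and would be inserted.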
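The multi-frame association of Figure 5 can be sketched as follows, under assumptions that are ours rather than the paper's: poses are planar (x, y, yaw), the user-side trajectory is given as recent query positions relative to the current frame, descriptors are unit-norm and compared by L2 distance, and a particle's SAS distance is the mean descriptor distance over query frames matched to their nearest map frames, with unmatched frames assigned the worst-case distance. All function names, `max_assoc` and `penalty` are hypothetical.

```python
import numpy as np


def se2_apply(pose, pts: np.ndarray) -> np.ndarray:
    """Map 2-D points (K, 2) from the current query frame into the map, pose = (x, y, yaw)."""
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    return pts @ rot.T + np.array([x, y])


def sas_distance(particle, rel_traj, query_desc, map_pos, map_desc,
                 max_assoc: float = 15.0, penalty: float = 2.0) -> float:
    """Mean descriptor distance over query frames paired with their nearest map frames.

    rel_traj: (K, 2) recent query positions in the current query frame.
    query_desc: (K, D) unit-norm descriptors; map_pos: (N, 2); map_desc: (N, D).
    Query frames with no map frame within max_assoc metres get `penalty`
    (2.0 is the largest L2 distance between unit vectors).
    """
    world = se2_apply(particle, rel_traj)
    d = np.linalg.norm(world[:, None, :] - map_pos[None, :, :], axis=-1)  # (K, N)
    nearest = d.argmin(axis=1)
    associated = d[np.arange(len(world)), nearest] < max_assoc
    desc_d = np.linalg.norm(query_desc - map_desc[nearest], axis=1)
    return float(np.where(associated, desc_d, penalty).mean())


def best_particle(particles, *args, **kwargs):
    """Return the particle with the smallest SAS distance and that distance."""
    scores = [sas_distance(p, *args, **kwargs) for p in particles]
    i = int(np.argmin(scores))
    return particles[i], scores[i]


def demo():
    """Straight road with map frames every 10 m; frames at 20 m and 70 m look identical."""
    rng = np.random.default_rng(0)
    map_pos = np.stack([np.arange(0.0, 101.0, 10.0), np.zeros(11)], axis=1)
    map_desc = rng.normal(size=(11, 8))
    map_desc /= np.linalg.norm(map_desc, axis=1, keepdims=True)
    map_desc[2] = map_desc[7]                    # perceptual aliasing
    # The vehicle is truly at 70 m; its last three frames were taken at 50, 60, 70 m.
    rel_traj = np.array([[-20.0, 0.0], [-10.0, 0.0], [0.0, 0.0]])
    query_desc = map_desc[[5, 6, 7]].copy()
    particles = [(float(x), 0.0, 0.0) for x in map_pos[:, 0]]  # one per map frame
    return best_particle(particles, rel_traj, query_desc, map_pos, map_desc)


if __name__ == "__main__":
    print(demo())
```

Because the map frames at 20 m and 70 m share a descriptor, the current frame alone cannot separate the two hypotheses; the two earlier frames along the trajectory break the tie in favor of the particle at 70 m.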