LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment

Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang

TL;DR

LoD-Loc localizes aerial images against a LoD 3D map, even surpassing current state-of-the-art methods that use textured 3D models for localization, and refines its pose estimate with a 6-DoF optimization based on a differentiable Gauss-Newton method.

Abstract

We propose LoD-Loc, a new method for aerial visual localization. Unlike existing localization algorithms, LoD-Loc does not rely on complex 3D representations and can estimate the pose of an Unmanned Aerial Vehicle (UAV) using a Level-of-Detail (LoD) 3D map. LoD-Loc achieves this mainly by aligning the wireframe derived from the projected LoD model with the wireframe predicted by a neural network. Specifically, given a coarse pose provided by the UAV sensor, LoD-Loc hierarchically builds a cost volume over uniformly sampled pose hypotheses to describe the pose probability distribution and selects the pose with maximum probability. Each cost within this volume measures the degree of line alignment between the projected and predicted wireframes. LoD-Loc also devises a 6-DoF pose optimization algorithm that refines this result with a differentiable Gauss-Newton method. As no public dataset exists for the studied problem, we collect two datasets with map levels of LoD3.0 and LoD2.0, along with real RGB queries and ground-truth pose annotations. We benchmark our method and demonstrate that LoD-Loc achieves excellent performance, even surpassing current state-of-the-art methods that use textured 3D models for localization. The code and dataset are available at https://victorzoo.github.io/LoD-Loc.github.io/.
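
The hypothesis-scoring idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it uses a single level instead of a hierarchy, samples only x/y translation instead of the full pose, and replaces the network-predicted wireframe with a synthetic binary map. All names (`project`, `alignment_score`, `select_pose`) and the softmax `temperature` are assumptions made for illustration.

```python
import numpy as np

def project(points_w, R, t, K):
    """Pinhole projection of world points (N, 3) to pixels; points behind the camera are dropped."""
    p_cam = points_w @ R.T + t
    p_cam = p_cam[p_cam[:, 2] > 1e-6]
    uvw = p_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def alignment_score(wire_map, uv, n_points):
    """Mean wireframe response at the projected pixels; out-of-view points contribute 0."""
    h, w = wire_map.shape
    u = np.rint(uv[:, 0]).astype(int)
    v = np.rint(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return wire_map[v[inside], u[inside]].sum() / n_points

def select_pose(wire_map, points_w, R, t_coarse, K, offsets, temperature=0.05):
    """Score uniformly sampled translation hypotheses around t_coarse; return the most probable."""
    hyps = np.array([t_coarse + np.array([dx, dy, 0.0])
                     for dx in offsets for dy in offsets])
    scores = np.array([alignment_score(wire_map, project(points_w, R, t, K), len(points_w))
                       for t in hyps])
    logits = scores / temperature          # softmax turns scores into a distribution
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return hyps[np.argmax(probs)], probs

# Toy "LoD wireframe": points sampled on the edges of a square, 10 m in front of the camera.
s = np.linspace(-1.0, 1.0, 21)
one = np.ones_like(s)
square = np.concatenate([
    np.stack([s, -one, 10 * one], axis=1),
    np.stack([s, one, 10 * one], axis=1),
    np.stack([-one, s, 10 * one], axis=1),
    np.stack([one, s, 10 * one], axis=1),
])

K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t_true = np.array([0.2, -0.1, 0.0])

# Synthetic stand-in for the predicted wireframe: 1 where the true wireframe projects.
wire_map = np.zeros((128, 128))
uv_true = np.rint(project(square, R, t_true, K)).astype(int)
wire_map[uv_true[:, 1], uv_true[:, 0]] = 1.0

t_coarse = np.zeros(3)                      # coarse sensor prior
offsets = np.linspace(-0.4, 0.4, 9)         # uniform sampling around the prior
t_best, probs = select_pose(wire_map, square, R, t_coarse, K, offsets)
```

In this toy setup only the hypothesis that matches `t_true` places every projected point on the synthetic wireframe, so it receives the highest probability. A hierarchical variant would repeat the search on finer grids centered on the previous winner.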

Paper Structure

This paper contains 34 sections, 16 equations, 19 figures, 14 tables.

Figures (19)

  • Figure 1: In this paper, we propose LoD-Loc to tackle visual localization w.r.t. a scene represented by a LoD 3D map, which is easy to acquire, lightweight, and inherently privacy-preserving. Given a query image and its coarse sensor pose, our method uses wireframe alignment against the LoD model to recover the camera pose.
  • Figure 2: Overview of datasets. The left side shows the LoD models of the released data. The LoD2.0 model from Swiss-EPFL includes building height and roof information, while the LoD3.0 model from UAVD4L-LoD contains more detailed structural information such as building height, roof, and side pillars. The right side illustrates samples of query images, which consist of images captured by drones in various scenes.
  • Figure 3: Overview of LoD-Loc. 1. LoD-Loc employs a CNN to extract multi-level features $\mathbf{F}_l$ for the query image $\mathbf{I}$ (Sec. \ref{subsec_featureExtract}). 2. A cost volume $\mathcal{C}_l$ is built for various pose hypotheses sampled around the coarse sensor pose $\boldsymbol{\mathcal{\xi}}_p$ to select the pose $\boldsymbol{\mathcal{\xi}}_l$ with the highest probability, based on the projected wireframe of the 3D LoD model (Sec. \ref{subsec_Cost_Volume}). 3. A differentiable Gauss-Newton method is used to refine the final selected pose $\boldsymbol{\mathcal{\xi}}_3$, to obtain a more accurate pose $\boldsymbol{\mathcal{\xi}}^{*}$ (Sec. \ref{subsec_Pose_Optimi}).
  • Figure 4: Toy examples illustrating the estimation of the uncertainty sampling range. We show the pose distribution (connected blue dots), the pose prediction (yellow dashed line), the ground-truth pose (red dashed line), and the uncertainty sampling range (gray) at the three levels.
  • Figure 5: Visualization of feature maps from different levels. The feature maps at different levels capture the wireframe at different degrees of fineness.
  • ...and 14 more figures
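
The Gauss-Newton refinement named in step 3 of the Figure 3 caption can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the paper's optimizer: it refines only a 3-DoF translation with the rotation fixed to identity, uses a point reprojection residual as a stand-in for the wireframe alignment cost, and approximates the Jacobian with finite differences rather than differentiating through a network. The names `gauss_newton` and `project` and the parameters `eps` and `damping` are hypothetical.

```python
import numpy as np

K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])

def project(points_w, t):
    """Pinhole projection with identity rotation and translation t (toy camera model)."""
    uvw = (points_w + t) @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def gauss_newton(residual_fn, x0, iters=20, eps=1e-6, damping=1e-9):
    """Minimize ||r(x)||^2 with Gauss-Newton steps: delta = -(J^T J)^-1 J^T r."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        r = residual_fn(x)
        # Forward-difference Jacobian, one column per parameter.
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            step = np.zeros_like(x)
            step[j] = eps
            J[:, j] = (residual_fn(x + step) - r) / eps
        # Tiny damping keeps the normal equations well conditioned.
        H = J.T @ J + damping * np.eye(x.size)
        delta = -np.linalg.solve(H, J.T @ r)
        x = x + delta
        if np.linalg.norm(delta) < 1e-10:
            break
    return x

# Toy data: random 3D points in front of the camera and their observed projections.
rng = np.random.default_rng(0)
points = np.column_stack([
    rng.uniform(-2, 2, 50),
    rng.uniform(-2, 2, 50),
    rng.uniform(8, 12, 50),
])
t_true = np.array([0.2, -0.1, 0.3])
uv_obs = project(points, t_true)

# Start from a perturbed pose, as if it came from the coarse selection stage.
t_init = t_true + np.array([0.3, -0.2, 0.5])
t_refined = gauss_newton(lambda t: (project(points, t) - uv_obs).ravel(), t_init)
```

Extending the sketch to a full 6-DoF pose would require a rotation parameterization (for example, an axis-angle update) in addition to the translation.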