PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers

Wooju Lee, Juhye Park, Dasol Hong, Changki Sung, Youngwoo Seo, Dongwan Kang, Hyun Myung

TL;DR

PIDLoc tackles GNSS-denied cross-view localization by introducing a PID-inspired neural architecture that combines local, global, and fine-grained cues for robust 3-DoF pose estimation. It comprises three dedicated branches (P, I, D) to process cross-view feature differences and a Spatially Aware Pose Estimator (SPE) that encodes spatial relationships among PID features to yield consistent pose updates. The method leverages LiDAR-guided cross-view sampling, a shared-weight feature extractor, grid-search pose candidates, and gradient-based features to improve precision under large initial pose errors. Experiments on KITTI and FMAVS demonstrate state-of-the-art performance and strong generalization, highlighting the practical impact for autonomous systems in GNSS-challenged environments.

Abstract

Accurate localization is essential for autonomous driving, but GNSS-based methods struggle in challenging environments such as urban canyons. Cross-view pose optimization offers an effective solution by directly estimating the vehicle pose using satellite-view images. However, existing methods primarily rely on cross-view features at a given pose, neglecting fine-grained contexts for precision and global contexts for robustness against large initial pose errors. To overcome these limitations, we propose PIDLoc, a novel cross-view pose optimization approach inspired by the proportional-integral-derivative (PID) controller. Using RGB images and LiDAR, PIDLoc comprises PID branches that model cross-view feature relationships and a spatially aware pose estimator (SPE) that estimates the pose from these relationships. The PID branches leverage feature differences for local context (P), aggregated feature differences for global context (I), and gradients of feature differences for precise pose adjustment (D) to enhance localization accuracy under large initial pose errors. Integrated with the PID branches, the SPE captures spatial relationships within the PID-branch features for consistent localization. Experimental results demonstrate that PIDLoc achieves state-of-the-art performance in cross-view pose estimation on the KITTI dataset, reducing position error by $37.8\%$ compared with the previous state of the art.
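
To make the roles of the three branches concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical function `error_fn` that stands in for the learned cross-view feature difference at a 3-DoF pose (x, y, yaw), and it uses a plain mean and a central finite difference where the paper uses learned branches. The names `pid_terms`, `error_fn`, `offsets`, and `eps` are illustrative assumptions.

```python
import itertools
import numpy as np


def pid_terms(error_fn, pose, offsets, eps=1e-3):
    """Return toy P, I, D quantities for a 3-DoF pose (x, y, yaw)."""
    pose = np.asarray(pose, dtype=float)
    # P: feature difference at the current pose (local context).
    p_term = error_fn(pose)
    # I: feature differences averaged over candidate poses (global context).
    i_term = np.mean([error_fn(pose + d) for d in offsets], axis=0)
    # D: central finite-difference gradient of the feature difference
    # with respect to each pose component (fine-grained context).
    d_term = np.stack([
        (error_fn(pose + eps * unit) - error_fn(pose - eps * unit)) / (2 * eps)
        for unit in np.eye(pose.size)
    ])
    return p_term, i_term, d_term


# Hypothetical stand-in for a learned feature difference e(P).
target = np.array([0.5, -0.2, 0.1])
error_fn = lambda pose: np.sin(pose - target)

# Grid-search candidates: every combination of {-1, 0, 1} per pose component.
offsets = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=3)))

pose = np.array([1.0, 1.0, 0.0])
p_term, i_term, d_term = pid_terms(error_fn, pose, offsets)
print(p_term.shape, i_term.shape, d_term.shape)  # (3,) (3,) (3, 3)
```

In this sketch the P term sees only the current pose, the I term pools evidence from the surrounding candidate poses, and the D term describes how the feature difference changes under small pose perturbations. In PIDLoc these quantities are fed to a learned pose estimator rather than combined with fixed gains as in a classical PID controller.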

Paper Structure

This paper contains 30 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: PIDLoc performs localization by incorporating local, global, and fine-grained contexts. In (a)-(c), the red, green, and blue circular sectors represent the current, ground-truth, and predicted pose, respectively. The blue and black arrows represent the position adjustment of the given pose during the single and final iteration, respectively. The yellow region represents the range of poses addressed by each branch. (a) Similar to existing methods, the P branch relies solely on the given pose, often converging to a local optimum. (b) The I branch incorporates global context from diverse poses, enabling robust pose estimation even under large initial pose errors. (c) The D branch leverages gradients of feature differences to perform fine-grained pose adjustments.
  • Figure 2: An overview of PIDLoc, which iteratively updates the pose based on cross-view features. $\oplus$ and $\ominus$ denote concatenation and subtraction, respectively. The proposed method generates the PID-branch features $w(\mathbf{P})$ from the cross-view feature differences $e(\mathbf{P})=\mathbf{F}_s[\mathcal{I}_{s}(\mathbf{P})]-\mathbf{F}_g[\mathcal{I}_{g}]$. These features guide the pose estimator (SPE) to converge accurately toward the ground-truth pose even under large initial pose errors.
  • Figure 3: Comparison of existing pose estimators and the proposed SPE. $\Delta \mathbf{p}^m$ is the pose difference of point $m$, $\text{PE}$ is a positional embedding, $w(\mathbf{P})$ are the PID-branch features, $\mathbf{P} \mathbf{x}$ are 3D satellite-view coordinates, and $\Phi$ denotes channel-shared MLPs.
  • Figure 4: Ablation analysis of the PID branches under varying initial pose errors in the cross-area setting of the KITTI dataset.
  • Figure 5: Visualization of localization results. The red, green, and blue circular sectors represent the current, ground-truth, and predicted pose, respectively. The blue line represents the iterative trajectory of predicted poses. Compared with SIBCL [wang2023satellite], PIDLoc converges to the global optimum in a challenging environment with repetitive patterns.
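
The cross-view feature difference in the Figure 2 caption, $e(\mathbf{P})=\mathbf{F}_s[\mathcal{I}_{s}(\mathbf{P})]-\mathbf{F}_g[\mathcal{I}_{g}]$, can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's pipeline: it treats LiDAR points as 2D vehicle-frame coordinates, places the satellite feature map's origin at its center pixel, uses nearest-neighbor sampling, and assumes each point already has a matching ground-view feature. The function names and the `meters_per_pixel` parameter are hypothetical.

```python
import numpy as np


def to_satellite_pixels(points_xy, pose, meters_per_pixel, center_px):
    """Map vehicle-frame points (N, 2) to integer satellite pixels (u, v)."""
    tx, ty, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, -s], [s, c]])
    # Rotate by yaw, then translate by the position part of the pose.
    world_xy = points_xy @ rotation.T + np.array([tx, ty])
    return np.rint(world_xy / meters_per_pixel + center_px).astype(int)


def feature_difference(sat_feat, ground_feat, points_xy, pose, meters_per_pixel=1.0):
    """Toy e(P): satellite features sampled at pose P minus ground features."""
    height, width, _ = sat_feat.shape
    center_px = np.array([(width - 1) / 2.0, (height - 1) / 2.0])
    pixels = to_satellite_pixels(points_xy, pose, meters_per_pixel, center_px)
    # Clamp to the map so out-of-range points reuse the border features.
    u = np.clip(pixels[:, 0], 0, width - 1)
    v = np.clip(pixels[:, 1], 0, height - 1)
    return sat_feat[v, u] - ground_feat


# Synthetic data: the ground features are defined as the satellite features
# observed at the true pose, so the difference vanishes exactly there.
rng = np.random.default_rng(0)
sat_feat = rng.normal(size=(21, 21, 4))
points_xy = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 1.0]])
true_pose = (2.0, -1.0, 0.0)
ground_feat = feature_difference(sat_feat, np.zeros((3, 4)), points_xy, true_pose)

e_true = feature_difference(sat_feat, ground_feat, points_xy, true_pose)
e_init = feature_difference(sat_feat, ground_feat, points_xy, (0.0, 0.0, 0.0))
print(np.abs(e_true).max(), np.abs(e_init).max())
```

The difference is zero at the true pose and nonzero at the erroneous initial pose, which is the signal the pose optimizer tries to drive down. With real features this landscape has many local minima, which is the difficulty the captions of Figures 1 and 5 describe.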