RADA: Robust and Accurate Feature Learning with Domain Adaptation

Jingtai He, Gehao Zhang, Tingting Liu, Songlin Du

TL;DR

RADA tackles the challenge of robust local feature learning under severe domain shifts by integrating domain adaptation supervision with a Transformer-based booster. The backbone learns multi-scale features while aligning high-level representations across domains, and the Wave Position Encoder paired with an Attention-Free Transformer integrates global context into descriptors. The framework is guided by targeted losses for detection, description, and their coupling, enabling end-to-end optimization. Empirical results on HPatches and Aachen Day-Night demonstrate improved matching accuracy and localization performance, highlighting the method's practical impact for tasks like visual localization and SfM under changing conditions.

Abstract

Recent advancements in keypoint detection and descriptor extraction have shown impressive performance in local feature learning tasks. However, existing methods generally exhibit suboptimal performance under extreme conditions such as significant appearance changes and domain shifts. In this study, we introduce a multi-level feature aggregation network that incorporates two pivotal components to facilitate the learning of robust and accurate features with domain adaptation. First, we employ domain adaptation supervision to align high-level feature distributions across different domains to achieve invariant domain representations. Second, we propose a Transformer-based booster that enhances descriptor robustness by integrating visual and geometric information through wave position encoding concepts, effectively handling complex conditions. To ensure the accuracy and robustness of features, we adopt a hierarchical architecture to capture comprehensive information and apply meticulous targeted supervision to keypoint detection, descriptor extraction, and their coupled processing. Extensive experiments demonstrate that our method, RADA, achieves excellent results in image matching, camera pose estimation, and visual localization tasks.

Paper Structure (28 sections, 16 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Detected keypoints and matches produced by RADA on the MegaDepth [li2018megadepth] validation set. (a) Keypoints detected in the source domain (daytime). (b) Keypoints detected in the target domain (nighttime). (c) Matches between image pairs drawn from the source and target domains. Green lines show correct correspondences.
  • Figure 2: Network architecture. RADA consists of three components: a keypoint detection and descriptor extraction backbone that learns accurate local features from hierarchical feature maps, domain adaptation supervision that yields domain-invariant representations, and a Transformer-based booster that improves descriptor robustness. The cross-domain training image pairs with ground truth are produced from MegaDepth [li2018megadepth].
  • Figure 3: Illustration of the two modules. (a) Domain adaptation is achieved by adding two task branches after the feature map. (b) Wave-PE combines an amplitude and a phase that are encoded from local feature information.
  • Figure 4: Visualization of matches on Aachen Day-Night [zhang2021reference]. Inliers are color-coded: green for correct matches (reprojection error of at most 5 pixels), red for incorrect matches (reprojection error above 5 pixels), and blue for matches with no ground-truth depth available.
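
The color rule in the Figure 4 caption can be written as a small function. This is a minimal sketch and not the authors' visualization code. The function name `match_color`, the use of `None` to mark missing ground-truth depth, and the decision to count an error of exactly 5 pixels as correct are assumptions made for illustration. Only the 5-pixel threshold and the three colors come from the caption.

```python
from typing import Optional


def match_color(reproj_error_px: Optional[float], threshold_px: float = 5.0) -> str:
    """Map one match to the display color described in the Figure 4 caption.

    reproj_error_px: reprojection error in pixels, or None when no
        ground-truth depth is available (an assumed convention).
    threshold_px: correctness threshold; 5 pixels per the caption.
    """
    if reproj_error_px is None:
        return "blue"   # ground-truth depth unavailable
    if reproj_error_px < 0:
        raise ValueError("reprojection error must be non-negative")
    if reproj_error_px <= threshold_px:
        return "green"  # correct match
    return "red"        # incorrect match


# Hypothetical errors for four matches, for illustration only.
errors = [0.8, 5.0, 12.3, None]
colors = [match_color(e) for e in errors]
print(colors)
```

Keeping the threshold as a parameter makes it straightforward to draw the same matches under a stricter or looser tolerance.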