Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features

Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, Noah Snavely

TL;DR

Doppelgangers++ addresses visual aliasing in 3D reconstruction by replacing CNN-based doppelganger classification with a Transformer classifier that leverages geometric features from the MASt3R model. It introduces the VisymScenes dataset to diversify training beyond landmark photos and employs a dual-head, test-time voting mechanism to robustly distinguish true matches from doppelgangers. A geo-tag-based evaluation framework using Mapillary imagery enables automatic assessment of SfM correctness and model completeness. Across pairwise disambiguation and SfM reconstruction tasks, Doppelgangers++ demonstrates improved precision, recall, and inlier-based geometry accuracy, especially in diverse, everyday scenes. The approach integrates seamlessly into existing SfM pipelines and reduces sensitivity to threshold tuning, offering practical gains for robust 3D modeling in real-world environments.
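To make the dual-head, test-time voting idea concrete, the following is a minimal sketch. It assumes each head emits a match probability for both orderings of the image pair, and that the four scores are combined by a simple average followed by a threshold. The summary does not state the actual combination rule, so the averaging, the function name `vote`, and the default threshold of 0.5 are all illustrative assumptions.

```python
def vote(scores, threshold=0.5):
    """Combine per-head, per-ordering match probabilities into one decision.

    scores: iterable of floats in [0, 1], for example
            [head1_pq, head1_qp, head2_pq, head2_qp].
    Returns (mean_score, is_true_match).

    Assumption: votes are averaged and thresholded. This is a hypothetical
    rule for illustration, not necessarily the rule used by the authors.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    mean_score = sum(scores) / len(scores)
    return mean_score, mean_score >= threshold
```

Averaging over both orderings makes the decision independent of which image is passed first, and pooling several predictions tends to make the result less sensitive to the exact threshold than a single score would be.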

Abstract

Accurate 3D reconstruction is frequently hindered by visual aliasing, where visually similar but distinct surfaces (a.k.a. doppelgangers) are incorrectly matched. These spurious matches distort the structure-from-motion (SfM) process, leading to misplaced model elements and reduced accuracy. Prior efforts addressed this with CNN classifiers trained on curated datasets, but these approaches struggle to generalize across diverse real-world scenes and can require extensive parameter tuning. In this work, we present Doppelgangers++, a method to enhance doppelganger detection and improve 3D reconstruction accuracy. Our contributions include a diversified training dataset that incorporates geo-tagged images from everyday scenes to expand robustness beyond landmark-based datasets. We further propose a Transformer-based classifier that leverages 3D-aware features from the MASt3R model, achieving superior precision and recall across both in-domain and out-of-domain tests. Doppelgangers++ integrates seamlessly into standard SfM and MASt3R-SfM pipelines, offering efficiency and adaptability across varied scenes. To evaluate SfM accuracy, we introduce an automated, geotag-based method for validating reconstructed models, eliminating the need for manual inspection. Through extensive experiments, we demonstrate that Doppelgangers++ significantly enhances pairwise visual disambiguation and improves 3D reconstruction quality in complex and diverse scenarios.

Paper Structure

This paper contains 14 sections, 4 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Visual aliasing, or doppelgangers, poses severe challenges to 3D reconstruction. We propose Doppelgangers++, an enhanced pairwise image classifier that excels in visual disambiguation across diverse and challenging scenes. (Top) We seamlessly integrate Doppelgangers++ into SfM, successfully disambiguating each scene. (Middle) Compared to prior work (which we refer to as DG-OG [Cai2023DoppelgangersLT]), Doppelgangers++ is more accurate and robust on everyday scenes. We show pairs that DG-OG classifies incorrectly and ours classifies correctly. (Bottom) Our new VisymScenes dataset, featuring complex daily scenes, is particularly challenging for COLMAP and DG-OG, but our method achieves correct and complete reconstructions.
  • Figure 2: VisymScenes examples. This new dataset includes residential areas, landmarks, historical sites, business districts, and more. Here, we present four example sites. The top row shows subsets of images captured within each site. The bottom row displays pairs of visually similar but geographically distinct images from each site along with their recorded geolocations on a map. These examples demonstrate that doppelganger issues are prevalent in everyday scenes, presenting significant challenges for reliable 3D reconstruction and image matching.
  • Figure 3: Model design. (Left) Given an image pair, we first create a symmetrized version of the pair and feed it into the frozen MASt3R model. Multi-layer features are extracted from each decoder branch, concatenated, and fed into two learnable doppelganger classification heads. Each head generates predictions $(\textrm{pred}_{pq}^v, \textrm{pred}_{qp}^v), v\in\{1,2\}$ (where $pq$ and $qp$ denote the symmetrized image pair), supervised by cross-entropy loss. (Right) We use multi-layer decoder features and a Transformer-based classifier head for doppelganger prediction.
  • Figure 4: Evaluation of doppelganger correction in SfM. (Top) We first collect sequences of geo-tagged Mapillary images around the target location and register them to the SfM model. Then, we use RANSAC to align the registered cameras with their geolocations. The inlier ratio is computed as an indicator of model accuracy. (Bottom) In the model corrupted by doppelganger pairs, the registered cameras all collapse to one side. The camera poses estimated with COLMAP (right, in red) do not align well with the geotags (green), leading to a low inlier ratio, whereas our method leads to a much closer alignment.
  • Figure 5: SfM disambiguation on MegaScenes [Tung2024MegaScenesSV]. (White background) SfM results from DG-OG [Cai2023DoppelgangersLT] and ours. (Black background) Verification using geo-tagged images: red points represent registered cameras, green points represent geolocations, and the inlier ratio (IR) is labeled at the bottom right. DG-OG fails to disambiguate this scene, predicting incorrect scores for image pairs. Our method correctly splits the model into two clean components.
  • ...and 3 more figures
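
The geotag-based check described in the Figure 4 caption (align registered cameras to geolocations with RANSAC, then report the inlier ratio) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes camera positions have been projected to a 2D ground plane, geotags have been converted to local metric coordinates, and the alignment is a 2D similarity transform without reflection. The function name `ransac_inlier_ratio`, the threshold, and the iteration count are hypothetical.

```python
import numpy as np

def ransac_inlier_ratio(cams_xy, geo_xy, thresh=5.0, iters=500, seed=0):
    """Estimate how well camera positions agree with geotags.

    cams_xy: (N, 2) camera positions in the SfM frame (assumed 2D projection).
    geo_xy:  (N, 2) geotag positions in local metric coordinates (assumed).
    A 2D similarity is written with complex numbers as w = a * z + b,
    where a encodes scale and rotation and b encodes translation.
    Returns the best inlier ratio found over `iters` random 2-point samples.
    """
    cams_xy = np.asarray(cams_xy, dtype=float)
    geo_xy = np.asarray(geo_xy, dtype=float)
    z = cams_xy[:, 0] + 1j * cams_xy[:, 1]
    w = geo_xy[:, 0] + 1j * geo_xy[:, 1]
    n = len(z)
    if n < 2:
        raise ValueError("need at least two correspondences")
    rng = np.random.default_rng(seed)
    best = 0
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        if abs(z[i] - z[j]) < 1e-12:
            continue  # degenerate sample: both cameras at the same position
        a = (w[i] - w[j]) / (z[i] - z[j])  # scale and rotation
        b = w[i] - a * z[i]                # translation
        residuals = np.abs(a * z + b - w)  # per-camera alignment error
        best = max(best, int(np.sum(residuals < thresh)))
    return best / n
```

Under these assumptions, a reconstruction whose cameras collapse onto one side of a symmetric structure cannot be mapped onto the true geolocations by any single similarity transform, so its inlier ratio stays low, while a correctly disambiguated model yields a ratio close to 1.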