DINO-RotateMatch: A Rotation-Aware Deep Framework for Robust Image Matching in Large-Scale 3D Reconstruction

Kaichen Zhang, Tianxiang Sheng, Xuanming Shi

TL;DR

The paper tackles robust image matching for large-scale 3D reconstruction from unstructured Internet images by introducing DINO-RotateMatch, a rotation-aware, self-supervised framework. It combines DINO-based image pairing with rotation-enhanced keypoint extraction (ALIKED) and rotation-aware matching (LightGlue), followed by COLMAP for 3D reconstruction. Key contributions include a dual-path image pairing strategy for small vs. large datasets and an explicit rotation augmentation pipeline that increases correspondences and robustness to viewpoint changes, demonstrated on the Kaggle Image Matching Challenge 2025 with a Silver Award. The findings show substantial improvements in mean Average Accuracy over strong baselines, highlighting the method’s scalability and robustness for real-world large-scale reconstructions from diverse image collections.

Abstract

This paper presents DINO-RotateMatch, a deep-learning framework designed to address the challenges of image matching in large-scale 3D reconstruction from unstructured Internet images. The method integrates a dataset-adaptive image pairing strategy with rotation-aware keypoint extraction and matching. DINO is employed to retrieve semantically relevant image pairs in large collections, while rotation-based augmentation captures orientation-dependent local features using ALIKED and LightGlue. Experiments on the Kaggle Image Matching Challenge 2025 demonstrate consistent improvements in mean Average Accuracy (mAA), achieving a Silver Award (47th of 943 teams). The results confirm that combining self-supervised global descriptors with rotation-enhanced local matching offers a robust and scalable solution for large-scale 3D reconstruction.
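The dataset-adaptive pairing step can be illustrated with a minimal sketch. It follows the stages named in the Figure 4 caption (global descriptors, Euclidean distance, distance thresholding with a guaranteed minimum number of pairs, deduplication, similarity sorting) and falls back to exhaustive pairing for small collections. The function name `candidate_pairs`, the parameters `dist_thresh`, `min_pairs` and `exhaustive_below`, and all default values are assumptions made for illustration; the source does not give them. The descriptors are assumed to be precomputed (for example by a DINO backbone) and passed in as a plain array.

```python
import numpy as np


def candidate_pairs(desc, dist_thresh=0.5, min_pairs=2, exhaustive_below=20):
    """Pick image pairs to match from global descriptors.

    desc: (N, D) array, one global descriptor per image (assumed precomputed).
    All thresholds are hypothetical values, not taken from the paper.
    Returns a list of (i, j) index pairs with i < j.
    """
    n = len(desc)
    if n < exhaustive_below:
        # Small collection: exhaustive search over every unordered pair.
        return [(i, j) for i in range(n) for j in range(i + 1, n)]

    # Pairwise Euclidean distances between descriptors.
    diff = desc[:, None, :] - desc[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # an image is never paired with itself

    best = {}
    for i in range(n):
        order = np.argsort(dist[i])  # nearest neighbours first
        close = [j for j in order if dist[i, j] < dist_thresh]
        if len(close) < min_pairs:
            # Too few neighbours under the threshold: keep the nearest ones.
            close = list(order[:min_pairs])
        for j in close:
            # Store each pair once, as (smaller index, larger index).
            key = (min(i, int(j)), max(i, int(j)))
            best[key] = float(dist[i, j])

    # Most similar pairs (smallest distance) first.
    return sorted(best, key=best.get)
```

The dense distance matrix keeps the sketch short; for very large collections a nearest-neighbour index would be the more practical choice.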

Paper Structure

This paper contains 7 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: Feature Visualization for Image Analysis. Plots and overlay showing feature visualization results for the entity "another_et_another_et001". The Patch Activation Map (left) and Attention Map (middle) depict spatial feature activation and attention distribution, respectively, with the overlay on the right showing feature weights mapped onto the original image.
  • Figure 2: Rotation Detection for Image Matching. Example of rotation detection in image matching. Images of a structure at $0^\circ, 90^\circ, 180^\circ, 270^\circ$ rotations are matched, with only the $90^\circ$ rotation successfully passing the matching criteria (marked with a check), while other rotations fail (marked with crosses).
  • Figure 3: Overview of the DINO-RotateMatch 3D reconstruction pipeline. Flowchart depicting the multi-stage pipeline for 3D reconstruction, encompassing image processing (image pairing, keypoint extraction, image matching) and 3D reconstruction stages. Image pairing adapts between exhaustive search and DINO-based methods based on dataset size; ALIKED extracts keypoints from four directions after rotation; LightGlue performs two-stage matching with a threshold of $\ge 25$ matched keypoints; and COLMAP generates a dense 3D reconstruction, with performance scored by mean Average Accuracy (mAA) on test datasets.
  • Figure 4: Pipeline of Image Feature Processing and Candidate Pair Generation. Diagram illustrating the four-stage process for image feature handling and candidate pair creation. It starts with global feature extraction via DINO, followed by Euclidean distance calculation, then candidate screening with distance thresholding to ensure minimum pairs, and finally deduplication and similarity sorting of candidate pairs.
  • Figure 5: Image Matching and 3D Reconstruction Decision Flow. Visualization of the image matching procedure and 3D reconstruction criteria. Image pairs (Image A and Image B, shown before and after rotation) undergo two-stage matching with LightGlue. Total matched keypoints are summed across orientations in the first match. The second match fixes Image B, and if the combined matched keypoints from both stages are $\ge 25$, 3D reconstruction proceeds; otherwise, the pair is dropped.
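The decision flow in the Figure 3 and Figure 5 captions can be sketched as follows. This is one possible reading of the captions, not the authors' implementation: stage one holds Image A fixed and rotates Image B through four orientations, stage two holds Image B fixed and rotates Image A, and the pair is kept when the combined count reaches the stated threshold of 25. The names `rotations` and `keep_pair` are hypothetical, and `match_fn` is a stand-in for the ALIKED keypoint extraction plus LightGlue matching step, assumed to return the number of matched keypoints for two images.

```python
import numpy as np

MIN_MATCHES = 25  # threshold stated in the Figure 3 and Figure 5 captions


def rotations(image):
    """Return the image at 0, 90, 180 and 270 degrees, keyed by angle."""
    return {90 * k: np.rot90(image, k) for k in range(4)}


def keep_pair(image_a, image_b, match_fn, min_matches=MIN_MATCHES):
    """Decide whether an image pair goes on to 3D reconstruction.

    match_fn(img_1, img_2) -> int is a hypothetical callable standing in
    for keypoint extraction and matching; it returns a match count.
    Returns (keep, total_matches, stage1_counts, stage2_counts).
    """
    # Stage 1 (assumed reading): A stays fixed, B is rotated.
    stage1 = {deg: match_fn(image_a, rot_b)
              for deg, rot_b in rotations(image_b).items()}

    # Stage 2 (assumed reading): B stays fixed, A is rotated.
    stage2 = {deg: match_fn(rot_a, image_b)
              for deg, rot_a in rotations(image_a).items()}

    # Matches are summed across orientations and across both stages.
    total = sum(stage1.values()) + sum(stage2.values())
    return total >= min_matches, total, stage1, stage2
```

Passing the matcher in as a callable keeps the sketch independent of any particular library API; pairs for which `keep_pair` returns `False` as its first element would be dropped before reconstruction.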