DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector

Johan Edstedt, Georg Bökman, Zhenjun Zhao

TL;DR

This work analyzes the DeDoDe keypoint detector, identifying clustering, rotation sensitivity, and misalignment with downstream pose evaluation as key issues. It introduces DeDoDe v2 with training-time non-max suppression, expanded data augmentation, and a drastically shortened training schedule, evaluated using RoMa to assess downstream usability. The approach achieves state-of-the-art two-view pose estimation on the MegaDepth-1500 and IMC2022 benchmarks, notably boosting IMC2022 mAA from 75.9 to 78.3, while reducing training to about 20 minutes on a single A100 GPU. Overall, the paper demonstrates that targeted training-time modifications can substantially improve a descriptor-agnostic keypoint detector within a detect-don’t-describe framework.

Abstract

In this paper, we analyze and improve the recently proposed DeDoDe keypoint detector. We focus our analysis on three key issues. First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training. Second, we address issues related to data augmentation. In particular, the DeDoDe detector is sensitive to large rotations. We fix this by including 90-degree rotations as well as horizontal flips. Finally, the decoupled nature of the DeDoDe detector makes evaluation of downstream usefulness problematic. We fix this by matching the keypoints with a pretrained dense matcher (RoMa) and evaluating two-view pose estimates. We find that the original long training is detrimental to performance, and therefore propose a much shorter training schedule. We integrate all these improvements into our proposed detector DeDoDe v2 and evaluate it with the original DeDoDe descriptor on the MegaDepth-1500 and IMC2022 benchmarks. Our proposed detector significantly improves pose estimation results, notably from 75.9 to 78.3 mAA on the IMC2022 challenge. Code and weights are available at https://github.com/Parskatt/DeDoDe
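
The first fix, non-max suppression on the detector's target distribution, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `nms_target`, the square window defined by `radius`, the tie handling, and the final renormalization to a distribution are all assumptions made for illustration.

```python
import numpy as np


def nms_target(scores, radius=1):
    """Suppress non-maximal entries of a 2D target score map.

    Hypothetical helper: an entry survives only if it equals the maximum of
    its (2 * radius + 1) x (2 * radius + 1) neighborhood. Survivors are
    renormalized to sum to 1 (an assumption, so the result is a distribution).
    """
    scores = np.asarray(scores, dtype=float)
    h, w = scores.shape
    # Pad with -inf so border pixels only compete with real neighbors.
    padded = np.pad(scores, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full((h, w), -np.inf)
    # Sliding-window maximum computed from shifted views of the padded map.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    # Ties (flat plateaus) are kept; a stricter rule is equally plausible.
    kept = np.where(scores >= local_max, scores, 0.0)
    total = kept.sum()
    return kept / total if total > 0 else kept


# Two adjacent strong responses form a cluster; only the stronger one survives.
target = np.zeros((4, 6))
target[1, 1] = 4.0
target[1, 2] = 3.0
target[2, 4] = 1.0
sparse_target = nms_target(target, radius=1)
```

Because the suppression is applied to the training target rather than to the network output, the detector is encouraged to spread its probability mass over isolated peaks instead of dense clusters.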

Paper Structure (12 sections, 1 equation, 6 figures, 3 tables)

Figures (6)

  • Figure 1: DeDoDe (left) vs DeDoDe v2 (right). We propose DeDoDe v2, an improved keypoint detector following the detect-don't-describe approach, whereby the detector is descriptor agnostic. We improve the DeDoDe detector, as demonstrated in the figure. DeDoDe struggles with clustering, whereby keypoints are overly detected in distinct regions. This, in turn, causes it to underdetect in other regions, causing performance to degrade. In contrast, our proposed detector produces diverse but repeatable keypoints for the entire scene.
  • Figure 2: Clusters in DeDoDe Detections. The DeDoDe detection objective does not explicitly enforce sparsity in the detections. This has the side-effect of the network producing so-called clusters of detections in particularly salient areas of the image. This is problematic in downstream tasks, as it means that many keypoints must be sampled to ensure repeatability.
  • Figure 3: Overfit to repeatability objective. We qualitatively illustrate the tension between repeatability and downstream relative pose objectives. We found that during the course of training, while the keypoints tended to become more distinct and repeatable, this resulted in less distinct regions getting almost no keypoints, in particular outside regions with COLMAP MVS, resulting in worse relative pose estimates.
  • Figure 4: Sensitivity to large rotations. The original DeDoDe detector (left) is sensitive to large in-plane rotations. This was first noted by Bökman et al. [bökman2024steerers]. We extend their ideas and additionally include joint horizontal flips. DeDoDe v2 produces more consistent keypoints under rotation of the input image (right). We plot the top 5000 keypoints in all images.
  • Figure 5: Qualitative comparison of DISK [tyszkiewicz2020disk] (left), DeDoDe [edstedt2024dedode] (middle), and DeDoDe v2 (right). Best viewed in high resolution. DISK (left) produces diverse but non-discriminative keypoints. DeDoDe, in contrast, produces discriminative keypoints, but tends to cluster. Our proposed DeDoDe v2 has the benefit of both approaches, yielding both diverse and discriminative keypoints.
  • ...and 1 more figures
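
The rotation and flip augmentation mentioned in the abstract and in Figure 4 can be sketched as follows. This is a minimal illustration, not the authors' training code: the helper name `rot90_flip`, the `(x, y)` pixel-coordinate convention for keypoints, and the order of operations (rotate first, then flip) are assumptions. How rotations and flips are sampled per image pair is not specified in this section, so sampling is left to the caller.

```python
import numpy as np


def rot90_flip(image, kpts_xy, k=0, flip=False):
    """Rotate an image by k * 90 degrees counterclockwise, then optionally
    flip it horizontally, moving (x, y) pixel coordinates along with it.

    Hypothetical helper; `image` is (H, W) or (H, W, C), `kpts_xy` is (N, 2).
    """
    image = np.asarray(image)
    kpts = np.asarray(kpts_xy, dtype=float).reshape(-1, 2).copy()
    for _ in range(k % 4):
        w = image.shape[1]
        x, y = kpts[:, 0].copy(), kpts[:, 1].copy()
        # np.rot90 sends source pixel (row=y, col=x) to (row=w-1-x, col=y).
        kpts[:, 0], kpts[:, 1] = y, w - 1 - x
        image = np.rot90(image)
    if flip:
        w = image.shape[1]
        kpts[:, 0] = w - 1 - kpts[:, 0]
        image = image[:, ::-1]
    return image, kpts


# Sanity check: the pixel under a keypoint is unchanged by the augmentation.
demo_image = np.arange(12).reshape(3, 4)
demo_kpt = [[3, 1]]  # x = 3, y = 1, i.e. demo_image[1, 3]
aug_image, aug_kpt = rot90_flip(demo_image, demo_kpt, k=1, flip=True)
ax, ay = aug_kpt[0].astype(int)
same_pixel = aug_image[ay, ax] == demo_image[1, 3]
```

Restricting rotations to multiples of 90 degrees keeps the transform exact on the pixel grid, so no interpolation is needed for either the image or the keypoint targets.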