Fractional Correspondence Framework in Detection Transformer
Masoumeh Zareapoor, Pourya Shamsolmoali, Huiyu Zhou, Yue Lu, Salvador García
TL;DR
This work addresses the rigid one-to-one matching in DETR by introducing the Regularized Transport Plan (RTP), which uses entropy-regularized optimal transport, solved with the Sinkhorn algorithm, to produce soft, fractional matches between predictions and ground truths. By relaxing the marginal constraints with KL penalty terms and adding entropic regularization, RTP captures object density and distribution more faithfully than the Hungarian approach, improving convergence and detection, especially for small or densely packed objects. Empirical results on COCO and VOC show RTP-DETR outperforming several DETR variants and combining well with IoU-aware losses, with notable gains across AP metrics and faster training. The method offers a practical, scalable alternative for end-to-end object detection and points to future extensions toward zero-shot detection using transferable transport plans.
Abstract
The Detection Transformer (DETR), by incorporating the Hungarian algorithm, has significantly simplified the matching process in object detection tasks. This algorithm computes an optimal one-to-one matching of predicted bounding boxes to ground-truth annotations during training. While effective, this strict matching does not account for the varying densities and distributions of objects, which leads to suboptimal correspondences, such as failing to handle multiple detections of the same object or missing small objects. To address this, we propose the Regularized Transport Plan (RTP). RTP introduces a flexible matching strategy that uses the cost of aligning predictions with ground truths to find the most accurate correspondences between the two sets. By utilizing the differentiable Sinkhorn algorithm, RTP allows for soft, fractional matching rather than strict one-to-one assignments. This enhances the model's capability to handle varying object densities and distributions. Extensive evaluations on the MS-COCO and VOC benchmarks demonstrate the effectiveness of our approach: RTP-DETR surpasses Deform-DETR and the recently introduced DINO-DETR, achieving absolute mAP gains of +3.8% and +1.7%, respectively.
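To make the idea of soft, fractional matching concrete, the following is a minimal sketch of an entropy-regularized transport plan computed with Sinkhorn scaling iterations. It is an illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_plan`, the toy cost matrix, the uniform marginals, and the values of `eps` and `rho` are all hypothetical. The optional `rho` argument applies the standard scaling update for KL-relaxed marginals (exponent `rho / (rho + eps)`); how RTP sets its costs and weights is not specified here.

```python
import math

def sinkhorn_plan(cost, a, b, eps=0.1, rho=None, n_iters=500):
    """Entropy-regularized transport plan (illustrative sketch).

    cost[i][j]: matching cost between prediction i and ground truth j.
    a, b:       mass assigned to each prediction / ground truth.
    eps:        entropy regularization strength (assumed value).
    rho:        weight of the KL marginal penalty; None keeps hard marginals.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: low cost -> large affinity.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    # Exponent 1 gives classic (balanced) Sinkhorn; < 1 relaxes the marginals.
    power = 1.0 if rho is None else rho / (rho + eps)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        for i in range(n):  # rescale rows toward the prediction marginal a
            row = sum(K[i][j] * v[j] for j in range(m))
            u[i] = (a[i] / row) ** power
        for j in range(m):  # rescale columns toward the ground-truth marginal b
            col = sum(K[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / col) ** power
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy case: 4 predictions, 2 ground-truth boxes.
cost = [
    [0.1, 0.9],  # prediction 0 fits ground truth 0
    [0.2, 0.8],  # prediction 1 also fits ground truth 0 (duplicate detection)
    [0.9, 0.1],  # prediction 2 fits ground truth 1
    [0.6, 0.5],  # prediction 3 is ambiguous
]
a = [0.25, 0.25, 0.25, 0.25]
b = [0.5, 0.5]

plan = sinkhorn_plan(cost, a, b)                    # hard marginals
relaxed_plan = sinkhorn_plan(cost, a, b, rho=1.0)   # KL-relaxed marginals

for row in plan:
    print([round(p, 4) for p in row])
```

In this toy case both predictions 0 and 1 send most of their mass to ground truth 0, a many-to-one correspondence that a strict one-to-one Hungarian assignment cannot express, while the ambiguous prediction 3 splits its mass across both ground truths. Every step is a differentiable rescaling, which is what allows such a plan to be used inside a training loss.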
