Source-Free Object Detection with Detection Transformer

Huizai Yao, Sicheng Zhao, Shuo Lu, Hui Chen, Yangyang Li, Guoping Liu, Tengfei Xing, Chenggang Yan, Jianhua Tao, Guiguang Ding

TL;DR

FRANCK is a source-free domain adaptation framework for object detection tailored to DETR. It integrates four components: Objectness Score-based Sample Reweighting (OSSR), Contrastive Learning with Matching-based Memory Bank (CMMB), Uncertainty-weighted Query-fused Feature Distillation (UQFD), and Dynamic Teacher Updating Interval (DTUI). Together they enable cross-domain transfer without access to source data. The approach targets category-, instance-, and feature-level alignment through a unified, query-centric design that uses pseudo bipartite matching and memory banks to support class-wise contrastive learning and reliable pseudo supervision. Experiments on Cityscapes/Foggy Cityscapes, Sim10k→Cityscapes, and cross-dataset, cross-scene, and cross-weather settings show state-of-the-art performance for DETR-based SFOD, with improved discriminability, stability, and generalization. The work advances practical SFOD for DETR by exploiting DETR-specific structures and dynamic self-training, with potential extensions to multi-source domains and integration with vision foundation models.

Abstract

Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs. FRANCK comprises four key components: (1) an Objectness Score-based Sample Reweighting (OSSR) module that computes attention-based objectness scores on multi-scale encoder feature maps, reweighting the detection loss to emphasize less-recognized regions; (2) a Contrastive Learning with Matching-based Memory Bank (CMMB) module that integrates multi-level features into memory banks, enhancing class-wise contrastive learning; (3) an Uncertainty-weighted Query-fused Feature Distillation (UQFD) module that improves feature distillation through prediction quality reweighting and query feature fusion; and (4) an improved self-training pipeline with a Dynamic Teacher Updating Interval (DTUI) that optimizes pseudo-label quality. By leveraging these components, FRANCK effectively adapts a source-pre-trained DETR model to a target domain with enhanced robustness and generalization. Extensive experiments on several widely used benchmarks demonstrate that our method achieves state-of-the-art performance, highlighting its effectiveness and compatibility with DETR-based SFOD models.
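
The self-training pipeline mentioned in the abstract pairs a teacher with a student, and the teacher is refreshed at some interval. The abstract does not say how DTUI chooses that interval, so the following is only a minimal sketch of the generic mean-teacher pattern it builds on: an exponential moving average (EMA) update applied every `interval` steps. The names `ema_update`, `maybe_update_teacher`, `momentum`, and `interval` are illustrative assumptions, not the authors' code, and the interval is simply passed in rather than computed dynamically.

```python
# Generic mean-teacher sketch (assumption: NOT the paper's DTUI rule).
# Weights are plain dicts of floats so the example runs without any framework.

def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights: momentum * teacher + (1 - momentum) * student."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


def maybe_update_teacher(step, interval, teacher, student, momentum=0.999):
    """Apply the EMA update only on steps that are multiples of `interval`.

    `interval` is supplied by the caller; a dynamic scheme would change it
    during training, but how it should change is not specified here.
    """
    if interval > 0 and step % interval == 0:
        return ema_update(teacher, student, momentum)
    return teacher  # unchanged between update steps


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    for step in range(1, 5):
        teacher = maybe_update_teacher(step, 2, teacher, student, momentum=0.9)
        print(step, teacher["w"])
```

A longer interval leaves the teacher, and therefore its pseudo-labels, unchanged for more student steps, while a shorter one tracks the student more closely. That trade-off is what an adaptive interval would need to balance.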

Paper Structure

This paper contains 22 sections, 12 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Illustration of the SFOD setting. Left: Conventional Domain Adaptive Object Detection (DAOD) approaches use both the labeled source domain ($D_S$) and the unlabeled target domain ($D_T$) to transfer the detector to the target domain. Right: Source-Free Object Detection (SFOD) adapts a source-pre-trained model to the target domain when source data is unavailable.
  • Figure 2: A conceptual framework illustrating how FRANCK addresses DETR’s source-free challenges through a unified query-centric design. The three challenges are organized as category-level alignment (inter-class confusion), instance-level alignment (class imbalance and inadequate supervision), and feature-level alignment (unstable feature alignment). Each module (CMMB, OSSR, UQFD) targets one of these challenges, while their shared reliance on query representations forms a synergistic loop where improved features, better weighting, and reliable distillation reinforce each other for robust and efficient adaptation.
  • Figure 3: The proposed Feature Reweighting ANd Contrastive Learning NetworK (FRANCK). Source data is only available at the source pretraining stage. Within FRANCK, the teacher and student models collaborate to optimize the student network through UQFD, OSSR, and CMMB. The teacher network is updated dynamically using DTUI, ensuring more stable and effective adaptation. "Reg" and "Cls" refer to the regression and classification heads of DETR, respectively. "BBox" and "C.Pred." denote the bounding box predictions and classification predictions, respectively, while "PL" denotes pseudo-labels.
  • Figure 4: Experimental results from the ablation studies: (a) Influence of different features on object estimation. (b) Influence of various encoder feature fusion layers. (c) Comparisons between thresholding and matching in CMMB. (d) Influence of controlled noise levels in CMMB. (e) Influence of memory bank (MB) size. (f) Influence of memory bank (MB) composition, including first-in-first-out (FIFO), random replacement (RR), and center-guided replacement (CGR).
  • Figure 5: Hyperparameter sensitivity analysis. We illustrate the sensitivity of six key hyperparameters: $\omega_1$, $\omega_2$, $\beta$, $\beta'$, $\epsilon$, and $c_{\mathrm{thresh}}$ (confidence threshold). Each plot shows the performance variation when a single hyperparameter is adjusted while the others are kept fixed. The definitions of these hyperparameters are provided in Table \ref{tab:hyperparam}.
  • ...and 2 more figures
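
The Figure 4(f) caption compares memory bank replacement policies. As a rough illustration of what two of them mean, the sketch below keeps a fixed-capacity list of features per class and evicts either the oldest entry (FIFO) or a uniformly random one (RR). The class name `ClassMemoryBank` and its parameters are hypothetical, and features are plain Python lists rather than DETR query features. Center-guided replacement (CGR) is left out because this excerpt does not define it.

```python
import random


class ClassMemoryBank:
    """Per-class feature store with a fixed capacity (illustrative only)."""

    def __init__(self, capacity, policy="fifo", seed=0):
        if policy not in ("fifo", "random"):
            raise ValueError("policy must be 'fifo' or 'random'")
        self.capacity = capacity
        self.policy = policy
        self.rng = random.Random(seed)  # seeded for reproducible eviction
        self.slots = {}                 # class label -> list of features

    def add(self, label, feature):
        bank = self.slots.setdefault(label, [])
        if len(bank) < self.capacity:
            bank.append(feature)        # still room: just store it
        elif self.policy == "fifo":
            bank.pop(0)                 # evict the oldest entry
            bank.append(feature)
        else:
            # Random replacement: overwrite a uniformly chosen slot.
            bank[self.rng.randrange(self.capacity)] = feature

    def get(self, label):
        """Return a copy of the stored features for one class."""
        return list(self.slots.get(label, []))


if __name__ == "__main__":
    bank = ClassMemoryBank(capacity=2, policy="fifo")
    for i in range(3):
        bank.add("car", [float(i)])
    print(bank.get("car"))
```

Under FIFO the bank always holds the most recent features for a class, whereas random replacement keeps a mix of older and newer ones.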