PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection

Zhengjian Kang, Jun Zhuang, Kangtong Mo, Qi Chen, Rui Liu, Ye Zhang

TL;DR

This paper proposes PaQ-DETR (Pattern and Quality-Aware DETR), a unified framework that enhances both query adaptivity and supervision balance, and that provides interpretable insights into how dynamic patterns cluster semantically across object categories.

Abstract

Detection Transformer (DETR) has redefined object detection by casting it as a set prediction task within an end-to-end framework. Despite its elegance, DETR and its variants still rely on fixed learnable queries and suffer from severe query utilization imbalance, which limits adaptability and leaves model capacity underused. We propose PaQ-DETR (Pattern and Quality-Aware DETR), a unified framework that enhances both query adaptivity and supervision balance. It learns a compact set of shared latent patterns capturing global semantics and dynamically generates image-specific queries through content-conditioned weighting. In parallel, a quality-aware one-to-many assignment strategy adaptively selects positive samples based on localization-classification consistency, enriching supervision and promoting balanced query optimization. Experiments on COCO, CityScapes, and other benchmarks show consistent gains of 1.5%-4.2% mAP across DETR backbones, including ResNet and Swin-Transformer. Beyond accuracy improvement, our method provides interpretable insights into how dynamic patterns cluster semantically across object categories.
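
To make the idea of content-conditioned weighting concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that image features are mean-pooled, that a single linear projection (the hypothetical `proj`) produces per-query logits over the shared patterns, and that a softmax turns those logits into mixing weights. The function name `generate_queries` and all shapes are illustrative assumptions; the abstract does not specify the actual weight generator.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def generate_queries(features, patterns, proj):
    """Build image-specific queries as weighted mixtures of shared patterns.

    features: (T, D) image tokens (assumed to come from an encoder)
    patterns: (K, D) shared latent patterns, learned in a real model
    proj:     (D, N*K) hypothetical linear weight generator
    Returns queries of shape (N, D) and mixing weights of shape (N, K).
    """
    num_patterns = patterns.shape[0]
    pooled = features.mean(axis=0)                       # (D,) global image summary
    logits = (pooled @ proj).reshape(-1, num_patterns)   # (N, K)
    weights = softmax(logits, axis=-1)                   # each row sums to 1
    queries = weights @ patterns                         # (N, D)
    return queries, weights

# Toy usage with random values standing in for learned parameters.
rng = np.random.default_rng(0)
D, K, N = 16, 4, 6
features = rng.normal(size=(50, D))
patterns = rng.normal(size=(K, D))
proj = rng.normal(size=(D, N * K))
queries, weights = generate_queries(features, patterns, proj)
```

Because the weights depend on `features`, two different images yield different queries from the same small set of patterns, which is the adaptivity the abstract describes.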

Paper Structure

This paper contains 15 sections, 9 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Query activation distributions of baseline DETRs vs. PaQ-DETR. Standard DETRs show highly skewed, long-tailed activations, while PaQ-DETR reduces imbalance and lowers Gini coefficients.
  • Figure 2: Overview of PaQ-DETR. Our framework integrates (1) a content-aware weight generator that adapts query composition to image features, (2) a pattern-based representation module that learns shared semantic bases, and (3) a quality-aware one-to-many assignment that provides balanced supervision.
  • Figure 3: Ablation study on key hyperparameters of PaQ-DETR: (a) number of patterns, (b) diversity loss weight $\beta$, (c) top-$k$ in quality-aware assignment, and (d) balancing weight $\gamma$.
  • Figure 4: Convergence curves comparing the baselines (dashed lines) with our methods (solid lines). The horizontal axis shows the number of epochs, and the vertical axis shows mAP.
  • Figure 5: Pattern activation visualization. Left: activation heat-map; Right: statistical distribution of weights over successful detections for (a) person and (b) cat categories.
  • ...and 1 more figure
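
The Gini coefficient mentioned in the Figure 1 caption can be computed from per-query activation counts. The sketch below uses the standard sorted-rank formula; how the paper counts an "activation" is not stated in this excerpt, so the input is assumed here to be the number of times each query was matched to an object, and the function name `gini` is illustrative.

```python
import numpy as np

def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means every query is used equally; values near 1 mean that
    a few queries account for almost all activations.
    """
    x = np.sort(np.asarray(counts, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard formula: G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    return float((2 * ranks - n - 1) @ x / (n * total))

balanced = gini([5, 5, 5, 5])   # all queries used equally
skewed = gini([0, 0, 0, 10])    # one query takes every match
```

With four queries, the balanced case gives 0 and the fully concentrated case gives 0.75, which is the maximum value (n - 1) / n for n = 4.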