Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining

Qiang Chen; Jian Wang; Chuchu Han; Shan Zhang; Zexian Li; Xiaokang Chen; Jiahui Chen; Xiaodi Wang; Shuming Han; Gang Zhang; Haocheng Feng; Kun Yao; Junyu Han; Errui Ding; Jingdong Wang

TL;DR

To push object detection performance, the paper applies encoder-decoder pretraining at scale within a Group DETR framework. A ViT-Huge encoder is self-supervised pretrained and finetuned on ImageNet-1K, the full detector is then pretrained on Object365, and it is finally finetuned on COCO. The decoder follows the DINO transformer decoder, and Group DETR training is used for faster convergence and better accuracy. Empirically, the detector reaches $64.5$ mAP on COCO test-dev, a new state of the art on the COCO leaderboard, and also reports strong results on Object365. The work highlights how effective large-scale encoder-decoder pretraining combined with DETR-style decoding is for high-performance object detection.
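Group DETR, the cited training method, feeds the decoder several groups of object queries during training and restricts decoder self-attention so that a query only interacts with queries from its own group. Only one group is kept at inference. The snippet below is a minimal NumPy sketch of that group-wise self-attention mask. It assumes the queries are stored group by group. The helper name `group_self_attention_mask` and the sizes are illustrative and are not taken from the authors' code.

```python
import numpy as np


def group_self_attention_mask(num_groups: int, queries_per_group: int) -> np.ndarray:
    """Hypothetical helper (not from the Group DETR v2 codebase).

    Returns a boolean mask of shape (G*N, G*N) where True marks a BLOCKED
    query-key pair, the same convention as PyTorch's boolean attn_mask in
    nn.MultiheadAttention. Queries are assumed to be laid out group by group,
    so query i belongs to group i // N.
    """
    total = num_groups * queries_per_group
    group_id = np.arange(total) // queries_per_group
    # Block attention wherever query and key come from different groups.
    return group_id[:, None] != group_id[None, :]


mask = group_self_attention_mask(num_groups=3, queries_per_group=2)
print(mask.astype(int))
# [[0 0 1 1 1 1]
#  [0 0 1 1 1 1]
#  [1 1 0 0 1 1]
#  [1 1 0 0 1 1]
#  [1 1 1 1 0 0]
#  [1 1 1 1 0 0]]
```

With `num_groups=1` the mask is all `False`. This matches the single-group decoder that remains at inference, so the extra groups only add training cost.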

Abstract

We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon the vision transformer encoder ViT-Huge~\cite{dosovitskiy2020image}, the DETR variant DINO~\cite{zhang2022dino}, and the efficient DETR training method Group DETR~\cite{chen2022group}. The training process consists of self-supervised pretraining and finetuning of a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO. Group DETR v2 achieves $\textbf{64.5}$ mAP on COCO test-dev and establishes a new SoTA on the COCO leaderboard (https://paperswithcode.com/sota/object-detection-on-coco).
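Group DETR speeds up training by using group-wise one-to-one assignment. Each group of queries is matched to the ground-truth objects independently, so every object receives one positive query per group while each group still follows DETR's one-to-one rule. The toy pure-Python sketch below assumes a precomputed cost matrix whose rows are queries (stored group by group) and whose columns are ground-truth objects. The exhaustive matcher stands in for the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) and only suits tiny inputs. The function names and cost values are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (query index, ground-truth index)


def match_one_to_one(cost: Sequence[Sequence[float]]) -> List[Pair]:
    """Minimum-cost matching of every ground truth (column) to a distinct
    query (row). Exhaustive search: a toy stand-in for Hungarian matching
    that assumes there are at least as many queries as ground truths."""
    num_queries = len(cost)
    num_targets = len(cost[0]) if num_queries else 0
    best_total, best_pairs = float("inf"), []
    # rows[t] is the query assigned to ground truth t.
    for rows in permutations(range(num_queries), num_targets):
        total = sum(cost[q][t] for t, q in enumerate(rows))
        if total < best_total:
            best_total = total
            best_pairs = [(q, t) for t, q in enumerate(rows)]
    return best_pairs


def group_wise_assignment(cost: Sequence[Sequence[float]], num_groups: int) -> List[Pair]:
    """Hypothetical helper: match each query group independently.

    Rows of `cost` are num_groups * N queries laid out group by group;
    returned query indices are global row indices into `cost`."""
    n = len(cost) // num_groups
    pairs: List[Pair] = []
    for g in range(num_groups):
        block = cost[g * n:(g + 1) * n]
        pairs += [(g * n + q, t) for q, t in match_one_to_one(block)]
    return pairs


# Toy costs: 2 groups x 3 queries, 2 ground-truth objects.
cost = [
    [0.9, 0.1], [0.2, 0.8], [0.5, 0.5],  # group 0
    [0.3, 0.7], [0.6, 0.4], [0.1, 0.9],  # group 1
]
print(group_wise_assignment(cost, num_groups=2))
# [(1, 0), (0, 1), (5, 0), (4, 1)]: each object gets one positive query per group
```

Across all groups the assignment is one-to-many, which gives the decoder more positive supervision per object. Within any single group it remains one-to-one, so a single group can be kept at inference.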

Paper Structure

This paper contains 10 sections and 3 tables.

Introduction
Method
Architecture
Encoder
Decoder
Implementation
Experiments
Results on Object365 5k val
Results on the COCO test-dev
Comparisons with state-of-the-art results on the COCO test-dev leaderboard
