V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Jiaqi Huang, Zunnan Xu, Xiu Li, Kehong Yuan, Yanyan Zu, Jiayao Ha, Qiong Gao, Licheng Jiao

TL;DR

The paper reports on the V3Det Challenge 2024, which builds on the 13,204-category V3Det dataset to study vast vocabulary and open vocabulary object detection under COCO-like evaluation. It covers the challenge's two tracks, vast vocabulary and open vocabulary, including metric definitions, baseline results, and the winning approaches, which combine semi-supervised learning, advanced detectors, and vision-language models. The top solutions report substantial gains from semi-supervised pseudo-labeling, Deformable DETR-based detectors, and CLIP/Long-CLIP-style open-vocabulary classifiers. The open vocabulary results show meaningful progress but leave room for improvement, particularly in recognizing novel objects. Overall, the work provides benchmarks, platform infrastructure, and practical insights to guide future research in scalable, open-world object detection with large vocabularies.

Abstract

Detecting objects in real-world scenes is difficult for several reasons, including the vast range of object categories and the possibility of encountering previously unknown or unseen objects. These difficulties call for public benchmarks and competitions to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: This track focuses on detecting objects from a large set of 13,204 categories, testing the detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: This track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we summarize and analyze the solutions submitted by participants. Through this analysis, we aim to inspire future research directions in vast vocabulary and open vocabulary object detection. Challenge homepage: https://v3det.openxlab.org.cn/challenge

Paper Structure (26 sections, 1 equation, 7 tables)