OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection

Zhongyu Xia, Jishuo Li, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Ming-Hsuan Yang

TL;DR

OpenAD tackles the challenge of open-world perception in autonomous driving by introducing the first real-world benchmark for 3D object detection that jointly evaluates domain generalization and open-ended understanding. It presents a corner-case discovery and annotation pipeline that leverages multimodal large language models to annotate corner-case objects across five datasets, creating 2,000 scenes and 19,761 total objects spanning 206 categories. The authors propose a vision-centric 3D open-ended detection baseline that converts 2D proposals into 3D boxes and a fusion approach that combines open-world and specialized models to balance precision and generalization. Experimental results show open-world models excel in generalization but lag in in-domain accuracy, while the proposed ensemble and vision-centric baselines achieve strong performance on OpenAD, highlighting practical benefits for robust, open-world autonomous driving perception.

Abstract

Open-world perception aims to develop a model that adapts to novel domains and various sensor configurations and that can understand uncommon objects and corner cases. However, current research lacks sufficiently comprehensive open-world 3D perception benchmarks and robust, generalizable methodologies. This paper introduces OpenAD, the first real open-world autonomous driving benchmark for 3D object detection. OpenAD is built upon a corner case discovery and annotation pipeline that integrates a multimodal large language model (MLLM). The proposed pipeline annotates corner case objects in a unified format for five autonomous driving perception datasets with 2000 scenarios. In addition, we devise evaluation methodologies and evaluate various open-world and specialized 2D and 3D models. Moreover, we propose a vision-centric 3D open-world object detection baseline and further introduce an ensemble method that fuses general and specialized models to address the lower precision of existing open-world methods on the OpenAD benchmark. We host an online challenge on EvalAI. Data, toolkit code, and evaluation code are available at https://github.com/VDIGPKU/OpenAD.
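
The abstract does not spell out how the ensemble fuses general and specialized models, so the following is only an illustrative sketch of one simple way such a fusion could work, not the authors' method. It assumes that specialized detections are trusted as-is and that an open-world detection is added only when no specialized box lies nearby in the bird's-eye view. The `Box3D` type, the `fuse_detections` function, and the `min_center_dist` threshold are all hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box3D:
    """Minimal 3D detection (hypothetical type): center in metres, label, score."""
    x: float
    y: float
    z: float
    label: str
    score: float


def fuse_detections(specialized, open_world, min_center_dist=2.0):
    """Keep every specialized box; add open-world boxes that do not duplicate one.

    Assumed rule (not from the paper): an open-world box is a duplicate when its
    bird's-eye-view center lies within `min_center_dist` metres of any
    specialized box.
    """
    fused = list(specialized)
    for candidate in open_world:
        is_duplicate = any(
            hypot(candidate.x - kept.x, candidate.y - kept.y) < min_center_dist
            for kept in specialized
        )
        if not is_duplicate:
            fused.append(candidate)
    return fused


# Toy inputs: the first open-world box overlaps the specialized "car",
# the second is a far-away object that only the open-world model reported.
specialized = [Box3D(10.0, 0.0, 0.5, "car", 0.9)]
open_world = [
    Box3D(10.4, 0.2, 0.5, "vehicle", 0.6),
    Box3D(25.0, -3.0, 0.4, "traffic cone", 0.5),
]
fused = fuse_detections(specialized, open_world)
print([box.label for box in fused])  # ['car', 'traffic cone']
```

Under this assumed rule, the specialized model supplies precise boxes for the categories it was trained on, while the open-world model contributes only objects the specialized model missed, which mirrors the precision versus generalization trade-off the abstract describes.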

Paper Structure

This paper contains 18 sections, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Examples of corner case objects in OpenAD. These object categories have not been encountered by models trained on common 3D perception datasets during their training phase.
  • Figure 2: Data composition of OpenAD. OpenAD covers multiple cities in various countries, including day and night scenes under different weather conditions and road scenarios. Additionally, we annotate each object with an indication of whether its category is observed in the training set of each dataset, allowing for separate evaluations of the model's specialized performance and open-ended performance.
  • Figure 3: Annotation pipeline. OpenAD is built upon a corner case discovery and annotation pipeline that integrates with a multimodal large language model (MLLM).
  • Figure 4: Our proposed 3D open-world object detection framework. After obtaining 2D proposals from any frozen open-world 2D object detection model, we train a 2D-to-3D BBox Converter to predict 3D bounding boxes. The converter has a dual-branch architecture that extracts pseudo-point features and convolutional features. It is lightweight and easy to train.
  • Figure 5: Example results of open-world models, specialized models, and our proposed ensemble method.
  • ...and 2 more figures