One Million Scenes for Autonomous Driving: ONCE Dataset

Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu

TL;DR

The ONCE dataset addresses data scarcity for 3D object detection in autonomous driving by offering 1 million LiDAR scenes, both labeled and unlabeled, and 7 million camera images across diverse conditions. The authors establish a unified benchmark to evaluate self-supervised and semi-supervised learning, as well as unsupervised domain adaptation, using orientation-aware metrics and multiple detectors. Key findings show that pretraining on ONCE yields stronger downstream performance, that semi-supervised and clustering-based self-supervised methods scale well with data, and that domain adaptation remains challenging yet promising. Overall, ONCE provides a scalable data resource and evaluation framework for future work on robust, large-scale 3D perception for autonomous vehicles.

Abstract

Current perception models in autonomous driving have become notorious for relying heavily on large amounts of annotated data to cover unseen cases and address the long-tail problem. On the other hand, learning from large-scale unlabeled data and incrementally self-training powerful recognition models have received increasing attention and may become the solution for next-generation, industry-level, robust perception models in autonomous driving. However, the research community has generally suffered from a shortage of such essential real-world scene data, which hampers the future exploration of fully/semi/self-supervised methods for 3D perception. In this paper, we introduce the ONCE (One millioN sCenEs) dataset for 3D object detection in the autonomous driving scenario. The ONCE dataset consists of 1 million LiDAR scenes and 7 million corresponding camera images. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available (e.g. nuScenes and Waymo), and it is collected across a range of different areas, periods and weather conditions. To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset. We conduct extensive analyses of those methods and provide valuable observations on how their performance relates to the scale of the data used. Data, code, and more information are available at https://once-for-auto-driving.github.io/index.html.
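As a rough sanity check on the scale figures quoted in the abstract, the short Python sketch below derives two ratios from them. The variable names are illustrative and not taken from the ONCE devkit. The per-second figure is only an average over the full 144 driving hours; the abstract says the scenes are "selected from" that footage, so it should not be read as the sensor's native frame rate.

```python
# Figures quoted in the abstract (rounded values as stated there).
lidar_scenes = 1_000_000
camera_images = 7_000_000
driving_hours = 144

# Images per LiDAR scene; consistent with a 7-camera rig.
images_per_scene = camera_images / lidar_scenes

# Average number of selected scenes per second of driving.
# Assumption: scenes are spread evenly over the 144 hours.
avg_scenes_per_second = lidar_scenes / (driving_hours * 3600)

print(f"images per LiDAR scene: {images_per_scene:.0f}")
print(f"average scenes per second of driving: {avg_scenes_per_second:.2f}")
```

Under these assumptions the ratios come out to 7 images per scene and a little under 2 selected scenes per second of driving.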

Paper Structure

This paper contains 24 sections, 1 equation, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Images and point clouds sampled from the ONCE (One millioN sCenEs) dataset. Our ONCE dataset covers a variety of geographical locations, time periods and weather conditions.
  • Figure 2: Sensor locations and coordinate systems. The data acquisition vehicle is equipped with $1$ LiDAR and $7$ cameras that can capture 3D point clouds and images from $360^{\circ}$ field of view.
  • Figure 3: An overview of our 3D object detection benchmark. We reproduce $6$ detection models, $4$ self-supervised learning, $5$ semi-supervised learning, and $2$ unsupervised domain adaptation methods for 3D object detection. We give comprehensive analyses on the results and offer valuable observations.
  • Figure 4: Proportions of different weather, time and areas in the ONCE dataset. Our dataset covers a wide range of domains with $6\%$ scenes captured on rainy days and $20\%$ scenes collected at night.
  • Figure 5: Distribution of annotation counts per scene. Our ONCE dataset is diverse in the number of objects in each scene. The vehicle count in each scene ranges from $0$ to $60$.
  • ...and 4 more figures