Class-Distribution Guided Active Learning for 3D Occupancy Prediction in Autonomous Driving

Wonjune Kim, In-Jae Lee, Sihwan Hwang, Sanmin Kim, Dongsuk Kum

Abstract

3D occupancy prediction provides dense spatial understanding critical for safe autonomous driving. However, this task suffers from a severe class imbalance due to its volumetric representation, where safety-critical objects (bicycles, traffic cones, pedestrians) occupy minimal voxels compared to dominant backgrounds. Additionally, voxel-level annotation is costly, yet dedicating effort to dominant classes is inefficient. To address these challenges, we propose a class-distribution guided active learning framework for selecting training samples to annotate in autonomous driving datasets. Our approach combines three complementary criteria to select the training samples. Inter-sample diversity prioritizes samples whose predicted class distributions differ from those of the labeled set, intra-set diversity prevents redundant sampling within each acquisition cycle, and frequency-weighted uncertainty emphasizes rare classes by reweighting voxel-level entropy with inverse per-sample class proportions. We ensure evaluation validity by using a geographically disjoint train/validation split of Occ3D-nuScenes, which reduces train-validation overlap and mitigates potential map memorization. With only 42.4% labeled data, our framework reaches 26.62 mIoU, comparable to full supervision and outperforming active learning baselines at the same budget. We further validate generality on SemanticKITTI using a different architecture, demonstrating consistent effectiveness across datasets.
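The frequency-weighted uncertainty criterion described above — voxel-level entropy reweighted by inverse per-sample class proportions — can be sketched as follows. This is a minimal illustration under assumed conventions (per-voxel softmax probabilities, class proportions taken from the argmax prediction); the paper's exact formulation may differ.

```python
import numpy as np

def frequency_weighted_uncertainty(probs, eps=1e-8):
    """Score one sample by voxel entropy reweighted with inverse
    per-sample class proportions (illustrative sketch).

    probs: (V, C) array of predicted class probabilities for V voxels.
    """
    # Per-voxel predictive entropy.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)  # (V,)

    # Per-sample class proportions from the argmax prediction.
    pred = probs.argmax(axis=1)                             # (V,)
    counts = np.bincount(pred, minlength=probs.shape[1])
    freq = counts / counts.sum()                            # (C,)

    # Inverse-frequency weight for each voxel's predicted class,
    # so voxels predicted as rare classes contribute more.
    weights = 1.0 / (freq[pred] + eps)
    return float(np.sum(weights * entropy) / np.sum(weights))
```

A sample whose voxels are uncertain scores higher than one with confident predictions, and the inverse-frequency weights further boost samples whose uncertainty sits on rare classes rather than dominant background.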


Paper Structure

This paper contains 17 sections, 8 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of the proposed class-distribution guided acquisition strategy based on predictive uncertainty and class-distribution diversity. Unlabeled samples receive an uncertainty score and a diversity score computed from similarity to the current selection and to the labeled set. The method selects samples that are uncertain and dissimilar to both sets and rejects redundant or certain samples. Sample 1 is selected because uncertainty is high and similarity to both sets is low. Sample 2 is rejected because it is similar to the selected Sample 1, which lowers intra-set diversity. Sample 3 is rejected because it is similar to the labeled reference Sample 4 and its uncertainty is low. Sample 4 is the labeled anchor used for similarity assessment. The bars show predicted class distributions.
  • Figure 2: Overview of our active learning method for 3D occupancy prediction. An initial labeled set trains the model, which then performs inference on the unlabeled pool. In each iteration, candidate samples are scored using three complementary metrics: frequency-weighted uncertainty, inter-sample diversity relative to the labeled set, and intra-set diversity within the current selected samples. Scores are normalized and combined into a unified acquisition score. Top-ranked samples are selected for annotation and added to the labeled subset. The process repeats for $T$ cycles, each time updating the labeled set and retraining the model.
  • Figure 3: Qualitative comparison of active learning strategies for 3D occupancy prediction. Each example shows six surround-view images and the ground truth occupancy grid, followed by model outputs when samples are chosen by random, entropy, coreset, and our active learning methods. Red dashed boxes mark regions where our approach more accurately predicts occupancy than the baselines. This improvement comes from our sample selection strategy, which emphasizes rare-class uncertainty and diversity in sample-level class distributions.
  • Figure 4: Frequency-weighted uncertainty visualization. Two representative training samples are shown. Top: A high-uncertainty sample with diverse classes, including rare objects. Bottom: A low-uncertainty sample dominated by common classes. Frequency weighting highlights rare-class regions and suppresses dominant backgrounds. Dashed boxes indicate zoomed-in regions.
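The acquisition loop of Figures 1 and 2 — score candidates by uncertainty plus inter-sample and intra-set diversity, normalize, combine, and greedily select — can be sketched as below. The equal weighting of the three terms, the use of cosine similarity between class distributions, and the helper names are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def greedy_select(class_dists, unc, labeled_idx, k):
    """One acquisition cycle: greedily pick k samples that are
    uncertain and dissimilar to both the labeled set and the
    samples already selected this cycle (illustrative sketch).

    class_dists: (N, C) per-sample predicted class distributions.
    unc: (N,) per-sample uncertainty scores.
    labeled_idx: indices of already-labeled samples.
    """
    def max_cos_sim(a, B):
        # Max cosine similarity between distribution a and rows of B.
        if len(B) == 0:
            return 0.0
        num = B @ a
        den = np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8
        return float(np.max(num / den))

    def norm(x):
        # Min-max normalize so the three criteria are comparable.
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    labeled = set(labeled_idx)
    pool = [i for i in range(len(class_dists)) if i not in labeled]
    selected = []
    for _ in range(k):
        cand = [i for i in pool if i not in selected]
        # Inter-sample diversity: distance to the labeled set.
        inter = [1 - max_cos_sim(class_dists[i], class_dists[list(labeled)])
                 for i in cand]
        # Intra-set diversity: distance to this cycle's picks.
        intra = [1 - max_cos_sim(class_dists[i], class_dists[selected])
                 for i in cand]
        score = norm([unc[i] for i in cand]) + norm(inter) + norm(intra)
        selected.append(cand[int(np.argmax(score))])
    return selected
```

After each cycle the newly selected samples would be annotated, added to the labeled set, and the model retrained, repeating for the budgeted number of cycles.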