MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection

Yuxue Yang, Lue Fan, Zhaoxiang Zhang

TL;DR

MixSup tackles label efficiency in LiDAR-based 3D detection by combining abundant coarse cluster-level labels, used to learn semantics, with a limited number of accurate box-level labels, used to learn geometry such as pose and shape. It redesigns label assignment so that mainstream detectors can be integrated with little modification, and it introduces PointSAM, which uses the Segment Anything Model (SAM) to automate coarse labeling and further reduce the annotation burden. Across nuScenes, Waymo, and KITTI, MixSup reaches up to 97.31% of fully supervised performance using cheap cluster labels plus only 10% of the box annotations. The approach is compatible with simple self-training and can be extended with auto-labelers, offering a scalable path toward cost-effective LiDAR perception with little loss in accuracy.

Abstract

Label-efficient LiDAR-based 3D object detection is currently dominated by weakly/semi-supervised methods. Instead of exclusively following one of them, we propose MixSup, a more practical paradigm that simultaneously utilizes massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision. We start from the observation that point clouds are usually textureless, which makes semantics hard to learn. However, point clouds are geometrically rich and scale-invariant to the distance from the sensor, which makes the geometry of objects, such as poses and shapes, relatively easy to learn. Thus, MixSup leverages massive coarse cluster-level labels to learn semantics and a few expensive box-level labels to learn accurate poses and shapes. We redesign the label assignment in mainstream detectors, allowing them to be seamlessly integrated into MixSup and making the paradigm practical and universal. We validate its effectiveness on nuScenes, the Waymo Open Dataset, and KITTI with various detectors. MixSup achieves up to 97.31% of fully supervised performance using cheap cluster annotations and only 10% of the box annotations. Furthermore, we propose PointSAM, based on the Segment Anything Model, for automated coarse labeling, further reducing the annotation burden. The code is available at https://github.com/BraveGroup/PointSAM-for-MixSup.
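
The abstract splits supervision by granularity: cheap cluster-level labels drive semantic (classification) learning for every object, while the few box-level labels drive geometric (regression) learning. The sketch below shows one way such a mixed-grained loss could be masked. It is a minimal illustration under stated assumptions, not the MixSup implementation: the Sample structure, the cross-entropy and smooth-L1 choices, the 7-value box layout, and the reg_weight parameter are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Sample:
    """Hypothetical training target (not taken from the MixSup code).

    class_id comes from a coarse cluster-level label and is always present.
    box (e.g. x, y, z, l, w, h, yaw) exists only for the small subset of
    objects that also carry an accurate box-level label.
    """
    class_id: int
    box: Optional[Sequence[float]] = None


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Numerically stable log-softmax cross-entropy for a single sample.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def smooth_l1(pred: Sequence[float], gt: Sequence[float], beta: float = 1.0) -> float:
    # Summed smooth-L1 (Huber-style) distance over all box parameters.
    total = 0.0
    for p, g in zip(pred, gt):
        d = abs(p - g)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def mixed_grained_loss(cls_logits, box_preds, samples: List[Sample], reg_weight: float = 1.0) -> float:
    # Semantic term: every sample contributes, since cluster labels are abundant.
    cls_loss = sum(cross_entropy(l, s.class_id) for l, s in zip(cls_logits, samples)) / len(samples)
    # Geometric term: only samples that also have a box label contribute;
    # box predictions for cluster-only samples are ignored.
    boxed = [(p, s.box) for p, s in zip(box_preds, samples) if s.box is not None]
    reg_loss = sum(smooth_l1(p, b) for p, b in boxed) / len(boxed) if boxed else 0.0
    return cls_loss + reg_weight * reg_loss
```

Masking the regression term, rather than inventing pseudo-boxes for cluster-only objects, keeps inaccurate geometry out of the gradient; whether MixSup weights or normalizes the two terms this way is not stated here.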

Paper Structure (49 sections, 3 equations, 9 figures, 15 tables)

Figures (9)

  • Figure 1: Illustration of distinct properties of point clouds compared to images. They make semantic learning from points difficult but ease the estimation of geometry, which is the initial motivation of MixSup.
  • Figure 2: Illustration of the pilot study. We develop a well-classified dataset to factor out the classification and only focus on the influence of varying data amounts on geometry estimation.
  • Figure 3: Overview of MixSup. The massive cluster-level labels serve for semantic learning and a few box labels are used to learn geometry attributes. We redesign the label assignment to integrate various detectors into MixSup.
  • Figure 4: Illustration of Box-cluster IoU.
  • Figure 5: Overview of PointSAM.
  • ...and 4 more figures
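
Figure 4 names a Box-cluster IoU, which the redesigned label assignment relies on to match boxes against cluster-level labels, but the caption does not define it. The sketch below assumes a point-level variant: the IoU between the set of points falling inside a rotated bird's-eye-view box and the set of points in a labeled cluster, used to give an anchor box the class of its best-matching cluster. The helper names, the (cx, cy, length, width, yaw) box layout, and the 0.5 threshold are hypothetical; the paper's actual definition and assignment rule may differ.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def in_bev_box(p: Point, box: Sequence[float]) -> bool:
    """True if point p lies inside a rotated BEV box (cx, cy, length, width, yaw)."""
    cx, cy, length, width, yaw = box
    dx, dy = p[0] - cx, p[1] - cy
    # Rotate the offset by -yaw to express it in the box's local frame.
    c, s = math.cos(yaw), math.sin(yaw)
    local_x = c * dx + s * dy
    local_y = -s * dx + c * dy
    return abs(local_x) <= length / 2 and abs(local_y) <= width / 2


def point_box_cluster_iou(points: Sequence[Point], cluster_idx: Set[int], box) -> float:
    """Assumed point-level IoU between points inside `box` and a labeled cluster."""
    box_idx = {i for i, p in enumerate(points) if in_bev_box(p, box)}
    union = box_idx | cluster_idx
    return len(box_idx & cluster_idx) / len(union) if union else 0.0


def assign_labels(points, clusters: List[Tuple[int, Set[int]]], anchors, pos_thresh: float = 0.5):
    """Hypothetical assignment: class of the best-matching cluster, or -1 for background."""
    labels = []
    for box in anchors:
        best_iou, best_cls = 0.0, -1
        for cls_id, idx in clusters:
            iou = point_box_cluster_iou(points, idx, box)
            if iou > best_iou:
                best_iou, best_cls = iou, cls_id
        labels.append(best_cls if best_iou >= pos_thresh else -1)
    return labels
```

Measuring overlap on points rather than on areas sidesteps the fact that a cluster has no box extent of its own; an area-based overlap against the cluster's BEV footprint would be an equally plausible reading of Figure 4.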