Text3DAug -- Prompted Instance Augmentation for LiDAR Perception

Laurenz Reichardt, Luca Uhr, Oliver Wasenmüller

TL;DR

Text3DAug tackles LiDAR data heterogeneity and class imbalance with a fully automated, label-free augmentation pipeline that generates 3D meshes from text prompts and places them into LiDAR scenes. It uses a fixed prompting recipe to create meshes, scores them with CLIP, and renders realistic placements using ray casting and remission-aware shading, independently of the original dataset labels. Across SemanticKITTI, KITTI, and nuScenes, Text3DAug improves segmentation and detection performance, matches or outperforms ground-truth (GT) based augmentation in many scenarios, and enables novel class discovery without labels. The approach is sensor-agnostic, scalable, and modular. The code is public, and the approach could be extended to other sensors and generative models.
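The placement step summarized above (ray casting with remission-aware shading) can be illustrated with a small pure-Python sketch. This is not the authors' implementation, and it rests on several assumptions. The sensor is taken to sit at the origin, and the beam directions of the existing returns are reused. Those beams are intersected with the placed mesh's triangles using the standard Möller–Trumbore test. In place of the paper's remission model, a simple Lambertian term (albedo times the cosine of the incidence angle) is used. The names `insert_instance` and `ray_triangle` and the `albedo` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]
Tri = Tuple[Vec, Vec, Vec]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin: Vec, direction: Vec, tri: Tri, eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: distance along a unit-length ray to the triangle, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:  # ray parallel to the triangle's plane (or degenerate triangle)
        return None
    inv = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv
    return t if t > eps else None


def insert_instance(points: Sequence[Vec], labels: Sequence[int], remissions: Sequence[float],
                    mesh: Sequence[Tri], label: int,
                    albedo: float = 0.5) -> Tuple[List[Vec], List[int], List[float]]:
    """Re-cast each existing beam (sensor assumed at the origin) against an already placed mesh.

    A mesh hit closer than the recorded return replaces that return, so the instance
    is added and whatever it hides is occluded in a single pass.
    """
    out_pts: List[Vec] = []
    out_lbl: List[int] = []
    out_rem: List[float] = []
    for p, lbl, rem in zip(points, labels, remissions):
        rng = math.sqrt(_dot(p, p))
        best_t, best_tri = rng, None
        if rng > 0.0:  # skip degenerate zero-range returns
            d = (p[0] / rng, p[1] / rng, p[2] / rng)
            for tri in mesh:
                t = ray_triangle((0.0, 0.0, 0.0), d, tri)
                if t is not None and t < best_t:
                    best_t, best_tri = t, tri
        if best_tri is None:  # beam misses the mesh, or the mesh lies behind the return
            out_pts.append(p)
            out_lbl.append(lbl)
            out_rem.append(rem)
        else:
            n = _cross(_sub(best_tri[1], best_tri[0]), _sub(best_tri[2], best_tri[0]))
            cos_inc = abs(_dot(n, d)) / math.sqrt(_dot(n, n))
            out_pts.append((d[0] * best_t, d[1] * best_t, d[2] * best_t))
            out_lbl.append(label)
            out_rem.append(albedo * cos_inc)  # Lambertian stand-in for remission
    return out_pts, out_lbl, out_rem
```

Because only beams that produced a return are re-cast, an instance placed where the original scan had no returns (for example, toward open sky) would receive no points. A fuller version would cast the sensor's complete beam pattern. Testing every beam against every triangle is also quadratic, which is acceptable for a sketch but not for full scans.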

Abstract

LiDAR data of urban scenarios poses unique challenges, such as heterogeneous characteristics and inherent class imbalance. Therefore, large-scale datasets are necessary to apply deep learning methods. Instance augmentation has emerged as an efficient method to increase dataset diversity. However, current methods require the time-consuming curation of 3D models or costly manual data annotation. To overcome these limitations, we propose Text3DAug, a novel approach that leverages generative models for instance augmentation. Text3DAug does not depend on labeled data and is the first method of its kind to generate instances and annotations from text. This allows for a fully automated pipeline, eliminating the need for manual effort in practical applications. Additionally, Text3DAug is sensor-agnostic and can be applied regardless of the LiDAR sensor used. Comprehensive experiments on LiDAR segmentation, detection, and novel class discovery demonstrate that Text3DAug is effective both as a supplement to existing methods and as a standalone method, performing on par with or better than established methods while overcoming their specific drawbacks. The code is publicly available.
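Because each mesh is generated from a prompt that already names its class, the annotation can be read off the mesh itself rather than labeled by hand. The following is a minimal sketch under stated assumptions: the annotation is an axis-aligned 3D box computed from the mesh vertices, and the prompt template is invented for illustration. The paper's actual prompting recipe is not reproduced here. `InstanceRecord`, `make_prompt`, and `annotate` are hypothetical names, not part of the released code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


@dataclass
class InstanceRecord:
    prompt: str   # text used to generate the mesh
    label: str    # class name, known from the prompt
    center: Vec   # center of the axis-aligned box
    size: Vec     # extent along x, y, z


def make_prompt(label: str) -> str:
    # Hypothetical template; the paper's fixed prompting recipe is not reproduced here.
    return f"a 3D model of a {label}"


def annotate(label: str, vertices: Sequence[Vec]) -> InstanceRecord:
    """The class comes from the prompt and the box from the mesh geometry, so no manual labeling is needed."""
    if not vertices:
        raise ValueError("mesh has no vertices")
    lo = tuple(min(v[i] for v in vertices) for i in range(3))
    hi = tuple(max(v[i] for v in vertices) for i in range(3))
    center = tuple((lo[i] + hi[i]) / 2 for i in range(3))
    size = tuple(hi[i] - lo[i] for i in range(3))
    return InstanceRecord(make_prompt(label), label, center, size)
```

A detection label would use the box. For segmentation, every point later rendered from the mesh can simply inherit `label`. Once the mesh is rotated in the scene, an oriented box with a yaw angle would be needed instead.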

Paper Structure

This paper contains 16 sections, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The Text3DAug augmentation pipeline. We prompt our instance generation engine in order to create and annotate meshes for desired classes. These are then realistically placed and rendered in LiDAR point clouds as instances.
  • Figure 2: Our instance generation engine prompts text-to-3D models to generate mesh models. Annotations are derived from the mesh, and CLIP scoring is used as a measure of quality. These are added to a database that is used to augment LiDAR scans. The mesh shown in this figure was generated by Shap-E.
  • Figure 3: Impact of removing points from, and adding noise to, Text3DAug instances on semantic segmentation performance on SemanticKITTI.
  • Figure 4: We assess the trade-off between quality and quantity in our pipeline by filtering the meshes by CLIP score and comparing the resulting segmentation mIoU on SemanticKITTI.
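Figures 3 and 4 concern two knobs of the instance database: how strongly the rendered instance points are degraded, and how strictly meshes are filtered by CLIP score. The following is a minimal sketch of both under stated assumptions. Each database entry is assumed to store a precomputed `clip_score`. Point removal is modeled as uniform random dropout, and noise as isotropic Gaussian jitter. The function names, the dictionary layout, and any threshold values are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def filter_by_clip(db: Sequence[Dict], threshold: float) -> List[Dict]:
    """Keep only meshes whose stored CLIP score reaches the threshold (quality over quantity)."""
    return [m for m in db if m["clip_score"] >= threshold]


def perturb_instance(points: Sequence[Vec], drop_ratio: float, noise_std: float,
                     rng: random.Random) -> List[Vec]:
    """Drop each instance point with probability drop_ratio, then jitter survivors with Gaussian noise."""
    kept = [p for p in points if rng.random() >= drop_ratio]
    return [tuple(c + rng.gauss(0.0, noise_std) for c in p) for p in kept]
```

Raising the threshold trades quantity for quality: fewer meshes survive, but each one matches its prompt more closely. Figure 4 examines this trade-off. Passing a seeded `random.Random` keeps the perturbation reproducible across runs.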