Text3DAug -- Prompted Instance Augmentation for LiDAR Perception
Laurenz Reichardt, Luca Uhr, Oliver Wasenmüller
TL;DR
Text3DAug tackles LiDAR data heterogeneity and class imbalance by introducing a fully automated, label-free augmentation pipeline that generates text-informed 3D meshes and places them into LiDAR scenes. It uses a fixed prompting recipe to create meshes, evaluates them with CLIP, and renders realistic placements with remission-aware shading and ray casting, independent of the original dataset labels. Across SemanticKITTI, KITTI, and nuScenes, Text3DAug improves segmentation and detection performance, matches or outperforms ground-truth (GT) based augmentation in many scenarios, and enables novel class discovery without labels. The approach is sensor-agnostic, scalable, and modular, with public code and potential extensions to other sensors and generative models.
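To make the ray-casting step concrete, the following is a minimal sketch of how a mesh could be sampled by a LiDAR-like scan pattern to yield labeled points. It is not the authors' implementation: the function names (ray_triangle, cast_instance), the beam angles, and the single-triangle "mesh" are assumptions chosen for illustration, the intersection test is the standard Möller-Trumbore algorithm, and remission-aware shading is not modeled.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance along the ray to the triangle, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the sensor

def cast_instance(triangles, azimuths_deg, elevations_deg, label,
                  origin=(0.0, 0.0, 0.0)):
    """Cast one ray per (azimuth, elevation) pair and keep the nearest hit.

    Returns a list of (x, y, z, label) tuples, i.e. points that carry their
    annotation without any manual labeling. `label` is a hypothetical class id.
    """
    points = []
    for el in elevations_deg:
        for az in azimuths_deg:
            a, e = math.radians(az), math.radians(el)
            d = (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))
            hits = [t for tri in triangles
                    if (t := ray_triangle(origin, d, *tri)) is not None]
            if hits:
                t = min(hits)
                points.append((origin[0] + t * d[0],
                               origin[1] + t * d[1],
                               origin[2] + t * d[2],
                               label))
    return points

# Toy "mesh": one triangle facing the sensor, 10 m ahead along the x axis.
mesh = [((10.0, -1.0, -1.0), (10.0, 1.0, -1.0), (10.0, 0.0, 1.0))]
instance_points = cast_instance(mesh, azimuths_deg=[0.0, 45.0],
                                elevations_deg=[0.0], label=1)
```

Because the beam angles are inputs rather than constants, the same routine can in principle be driven by the scan pattern of any sensor, which is one way to read the sensor-agnostic claim above.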
Abstract
LiDAR data of urban scenarios poses unique challenges, such as heterogeneous characteristics and inherent class imbalance. Therefore, large-scale datasets are necessary to apply deep learning methods. Instance augmentation has emerged as an efficient method to increase dataset diversity. However, current methods require the time-consuming curation of 3D models or costly manual data annotation. To overcome these limitations, we propose Text3DAug, a novel approach leveraging generative models for instance augmentation. Text3DAug does not depend on labeled data and is the first of its kind to generate instances and annotations from text. This allows for a fully automated pipeline, eliminating the need for manual effort in practical applications. Additionally, Text3DAug is sensor-agnostic and can be applied regardless of the LiDAR sensor used. Comprehensive experimental analysis on LiDAR segmentation, detection, and novel class discovery demonstrates that Text3DAug is effective both as a supplement to existing methods and as a standalone method, performing on par with or better than established methods while overcoming their specific drawbacks. The code is publicly available.
