Interactive4D: Interactive 4D LiDAR Segmentation

Ilya Fradlin, Idil Esen Zulfikar, Kadir Yilmaz, Theodora Kontogianni, Bastian Leibe

TL;DR

Interactive4D addresses the high cost of annotating outdoor LiDAR data by enabling simultaneous, multi-object segmentation over a short window of superimposed scans, producing consistent instance IDs suitable for tracking. The method constructs a 4D spatio-temporal volume, using a per-voxel feature extractor, an identity-aware click encoder, iterative refinement, and a click fusion mechanism, trained with a novel scale-invariant and region-centered click strategy. Experiments on SemanticKITTI and zero-shot nuScenes show state-of-the-art performance across single-object, multi-object, and 4D interactive setups, with a user study validating real-world applicability. The work significantly reduces annotation effort for LiDAR datasets and provides a foundation for robust 4D labeling and tracking in autonomous driving pipelines.

Abstract

Interactive segmentation has an important role in facilitating the annotation process of future LiDAR datasets. Existing approaches sequentially segment individual objects at each LiDAR scan, repeating the process throughout the entire sequence, which is redundant and ineffective. In this work, we propose interactive 4D segmentation, a new paradigm that allows segmenting multiple objects on multiple LiDAR scans simultaneously, and Interactive4D, the first interactive 4D segmentation model that segments multiple objects on superimposed consecutive LiDAR scans in a single iteration by utilizing the sequential nature of LiDAR data. While performing interactive segmentation, our model leverages the entire space-time volume, leading to more efficient segmentation. Operating on the 4D volume, it directly provides consistent instance IDs over time and also simplifies tracking annotations. Moreover, we show that click simulations are crucial for successful model training on LiDAR point clouds. To this end, we design a click simulation strategy that is better suited for the characteristics of LiDAR data. To demonstrate its accuracy and effectiveness, we evaluate Interactive4D on multiple LiDAR datasets, where Interactive4D achieves a new state-of-the-art by a large margin. We publicly release the code and models at https://vision.rwth-aachen.de/Interactive4D.
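The idea of superimposing consecutive scans into one space-time volume can be illustrated with a minimal sketch. It is not the released implementation: the function name `superimpose_scans`, the use of 4x4 sensor-to-world pose matrices, the choice of the last scan as the reference frame, and the use of the scan index as the time coordinate are all assumptions made for illustration.

```python
import numpy as np

def superimpose_scans(scans, poses):
    """Merge consecutive LiDAR scans into one 4D point cloud (hypothetical helper).

    scans: list of (N_i, 3) arrays, each in its own sensor frame.
    poses: list of (4, 4) sensor-to-world matrices, one per scan (assumed available).
    Returns an (sum N_i, 4) array: x, y, z in the frame of the last scan,
    plus the scan index as a time coordinate.
    """
    ref_inv = np.linalg.inv(poses[-1])  # world -> reference (last scan) frame
    merged = []
    for t, (pts, pose) in enumerate(zip(scans, poses)):
        # Homogeneous coordinates so the rigid transform is a single matmul.
        homo = np.hstack([pts, np.ones((len(pts), 1))])
        in_ref = (ref_inv @ pose @ homo.T).T[:, :3]
        time = np.full((len(pts), 1), float(t))
        merged.append(np.hstack([in_ref, time]))
    return np.vstack(merged)

# Two single-point scans: the sensor moves 1 m along x between them,
# so the same static world point appears at different sensor coordinates.
pose0 = np.eye(4)
pose1 = np.eye(4)
pose1[0, 3] = 1.0
scan0 = np.array([[1.0, 0.0, 0.0]])
scan1 = np.array([[0.0, 0.0, 0.0]])
cloud = superimpose_scans([scan0, scan1], [pose0, pose1])
```

In this toy case both points land on the same spatial location in the reference frame and differ only in the time coordinate, which is what lets a single mask cover one object across several scans.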

Paper Structure

This paper contains 28 sections, 8 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Left: Current interactive LiDAR segmentation methods (Sun2023ACI; Han2024ClickFormer) segment each object and each LiDAR scan individually, which is sub-optimal. Right: In contrast, Interactive4D segments multiple objects on superimposed consecutive LiDAR scans at once, significantly improving efficiency, while providing consistent instance IDs over time.
  • Figure 2: Overview. We superimpose consecutive scans into a single point cloud and extract per-voxel features (executed once). The clicks are encoded as initial queries, then refined through multiple attention layers. The dot product between refined queries and voxel features results in click responses, which are fused in the click fusion module to form predictions.
  • Figure 3: Examples for centroid clicking and scale-invariant clicking.
  • Figure 4: Number of Superimposed Scans Ablation.
  • Figure 5: Example results of Interactive4D on SemanticKITTI.
  • ...and 4 more figures
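
The last step in the Figure 2 caption (a dot product between refined queries and voxel features, followed by click fusion) can be sketched as follows. The fusion rule here is an assumption: each object takes the maximum response over its own clicks, and each voxel is assigned to the object with the highest fused response. The function name `fuse_click_responses` and all array shapes are hypothetical, and the paper's click fusion module may differ.

```python
import numpy as np

def fuse_click_responses(voxel_feats, queries, click_obj_ids):
    """Assign each voxel to an object from per-click responses (illustrative only).

    voxel_feats:   (V, D) per-voxel features.
    queries:       (K, D) refined click queries, one per click.
    click_obj_ids: (K,) object ID that each click belongs to.
    Returns a (V,) array of predicted object IDs.
    """
    click_obj_ids = np.asarray(click_obj_ids)
    # Click responses: one score per (voxel, click) pair.
    responses = voxel_feats @ queries.T  # (V, K)
    obj_ids = np.unique(click_obj_ids)
    # Assumed fusion rule: max over all clicks of the same object.
    per_obj = np.stack(
        [responses[:, click_obj_ids == o].max(axis=1) for o in obj_ids],
        axis=1,
    )  # (V, num_objects)
    # Each voxel goes to the object with the strongest fused response.
    return obj_ids[per_obj.argmax(axis=1)]

voxel_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
queries = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]])
labels = fuse_click_responses(voxel_feats, queries, [1, 2, 1])
```

Because every click carries an object ID, several clicks on the same object reinforce one mask rather than creating competing ones, and all objects are resolved in a single pass over the voxels.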