SEGT: A General Spatial Expansion Group Transformer for nuScenes Lidar-based Object Detection Task

Cheng Mei, Hao He, Yahui Liu, Zhenhua Guo

TL;DR

The paper tackles lidar-based 3D object detection on the nuScenes dataset, where point clouds are sparse and irregular. It introduces SEGT, a general spatial expansion group transformer that migrates voxel features into specialized ordered fields using conjugate Hilbert expansion strategies, applies group attention to extract features exclusive to each field, and then fuses features across fields. On the nuScenes test set, SEGT achieves a nuScenes Detection Score (NDS) of 73.9 without test-time augmentation (TTA) and 74.5 with TTA, ranking 1st in the lidar-based task. The approach aims to improve robustness to varying point densities and to reduce computation by restricting attention to groups within structured field expansions, offering a scalable transformer-based solution for lidar perception.

Abstract

In this technical report, we present a novel transformer-based framework for the nuScenes lidar-based object detection task, termed Spatial Expansion Group Transformer (SEGT). To efficiently handle the irregular and sparse nature of point clouds, we propose migrating the voxels into distinct specialized ordered fields with general spatial expansion strategies, and employ group attention mechanisms to extract the exclusive feature maps within each field. Subsequently, we integrate the feature representations across different ordered fields by alternately applying diverse expansion strategies, thereby enhancing the model's ability to capture comprehensive spatial information. The method was evaluated on the nuScenes lidar-based object detection test set, achieving an NDS of 73.9 without Test-Time Augmentation (TTA) and 74.5 with TTA, demonstrating the effectiveness of the proposed method. Notably, our method ranks 1st in the nuScenes lidar-based object detection task.
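The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of the general idea: order sparse voxels along a Hilbert curve, then cut the ordered sequence into fixed-size groups within which attention could be computed. Everything here is an assumption made for illustration: the 2D grid (the actual model presumably works on 3D voxels), the names `hilbert_index` and `group_voxels`, the `group_size` parameter, and the reading of "conjugate" as swapping the two axes before ordering.

```python
from typing import List, Sequence, Tuple

Voxel = Tuple[int, int]  # hypothetical (x, y) grid coordinate of a non-empty voxel


def hilbert_index(x: int, y: int, order: int) -> int:
    """Position of cell (x, y) along a 2D Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def group_voxels(
    voxels: Sequence[Voxel], order: int, group_size: int, conjugate: bool = False
) -> List[List[int]]:
    """Sort voxels along the curve and split them into groups of voxel positions.

    With conjugate=True the axes are swapped first (an assumed interpretation),
    which yields a second ordering with different neighbours per group.
    """
    def key(i: int) -> int:
        x, y = voxels[i]
        if conjugate:
            x, y = y, x
        return hilbert_index(x, y, order)

    ranked = sorted(range(len(voxels)), key=key)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


# Illustrative sparse voxels on a 4 x 4 grid (order=2).
voxels = [(0, 0), (3, 0), (1, 1), (0, 1), (2, 3)]
groups_a = group_voxels(voxels, order=2, group_size=2)
groups_b = group_voxels(voxels, order=2, group_size=2, conjugate=True)
```

Alternating between the two groupings from layer to layer is one way information could flow between voxels that never share a group under a single ordering, which matches the abstract's description of alternately applying diverse expansion strategies. The attention computation itself is omitted here.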

Paper Structure

This paper contains 11 sections, 1 equation, 2 tables.