RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

Zhaoxuan Wang, Xu Han, Hongxin Liu, Xianzhi Li

TL;DR

RIDE addresses the challenge of rotation robustness in LiDAR-based 3D object detection by introducing rotation-invariant features (RIFs) learned through a rotation-invariance block (RIB) and integrated into a bi-feature extractor (Bi-SA) that preserves object-aware information. The method yields a plug-and-play module compatible with both one-stage and two-stage detectors, producing rotation-robust proposals by fusing rotation-invariant and object-aware representations. Empirical results on KITTI and nuScenes show consistent improvements in mAP and rotation robustness under arbitrary rotations, with notable gains in AR scenarios and competitive inference speed. This approach offers a practical path to stable, rotation-agnostic 3D perception in autonomous driving without restricting rotation angles or sacrificing existing detector performance.

Abstract

Rotation robustness has drawn much attention in point cloud analysis, yet it still poses a critical challenge in 3D object detection. When subjected to arbitrary rotation, most existing detectors fail to produce the expected outputs due to poor rotation robustness. In this paper, we present RIDE, a pioneering exploration of Rotation-Invariance for the 3D LiDAR-point-based object DEtector, with the key idea of designing rotation-invariant features from LiDAR scenes and then effectively incorporating them into existing 3D detectors. Specifically, we design a bi-feature extractor that extracts (i) object-aware features, which are sensitive to rotation but preserve geometry well, and (ii) rotation-invariant features, which lose geometric information to a certain extent but are robust to rotation. These two kinds of features complement each other to decode 3D proposals that are robust to arbitrary rotations. In particular, RIDE is compatible with and easy to plug into existing one-stage and two-stage 3D detectors, and it boosts both detection performance and rotation robustness. Extensive experiments on standard benchmarks show that the mean average precision (mAP) and rotation robustness can be significantly boosted by integrating RIDE, with +5.6% mAP and a 53% rotation robustness improvement on KITTI, and corresponding improvements of +5.1% and 28% on nuScenes. The code will be available soon.
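
For reference, the standard notion of rotation invariance behind such features can be written as below. This is a textbook formulation with illustrative symbols ($f$ for a feature function, $R$ for a rotation, $P$ for a point set, $a$ and $b$ for two points), not an equation taken from the paper:

$$
f(RP) = f(P) \quad \text{for every rotation } R \in SO(3), \qquad \text{e.g.} \quad \|Ra - Rb\| = \|R(a - b)\| = \|a - b\| .
$$

Pairwise distances are therefore unchanged by rotation, whereas raw coordinates are not, since $Ra \neq a$ in general.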

Paper Structure (28 sections, 6 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Comparison of detection performance, rotation robustness, and inference speed between the original 3D detectors (i.e., 3DSSD [yang20203dssd] and IA-SSD [zhang2022not]) and their counterparts equipped with our RIDE. RIDE boosts both perception precision and rotation robustness with an acceptable decrease in speed; see the red and blue regions.
  • Figure 2: Existing point-based detectors consist of an object-aware feature extractor, a spatial aggregation layer, and a detection head. We design a bi-feature extractor by adding a novel rotation-invariant feature extractor built on the rotation-invariance block (RIB), thus making existing detectors rotation-robust.
  • Figure 3: Building on the rotation-invariant feature design (a) proposed in [zhang2019rotation], we (b) first consider the local structure of a reference point $p_i$ by including its neighbor $p_{ij}$, and then (c) introduce additional elements to eliminate ambiguities. Here, $p_m$ and $p_q$ denote the ball center and the geometric barycenter of $p_i$'s query ball, respectively.
  • Figure 4: Details of extracting the RIF representation $F_r^l$ at the $l$-th bi-set abstraction (Bi-SA) layer. The OAF representation $F_o^l$ is obtained by replacing the RIB with an MM (MLPs + max pooling) module, so RIBs and MMs can run in parallel to produce both $F_r$ and $F_o$ within one layer.
  • Figure 5: Qualitative comparison of our RIDE-IA-SSD and IA-SSD [zhang2022not] against the ground truth (GT). Yellow arrows mark the notable differences.
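
The bi-feature idea in the captions for Figures 3 and 4 can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`invariant_features`, `mlp_max`, `bi_features`), the choice of distances and one angle as invariant inputs, the single random-weight layer standing in for the MLPs, and fusion by concatenation are all assumptions made for illustration. The sketch only shows that a branch fed with distances and angles among a neighbor, the ball center, and the barycenter is unchanged by rotation, while a branch fed with relative coordinates is not.

```python
import numpy as np

def random_rotation(rng):
    """Random 3D rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # flip column signs; q stays orthogonal
    if np.linalg.det(q) < 0:         # force det = +1 (rotation, not reflection)
        q[:, 0] = -q[:, 0]
    return q

def invariant_features(neighbors, center):
    """Hypothetical per-neighbor features built only from distances and one angle.

    neighbors: (K, 3) points inside one query ball; center: (3,) ball center.
    """
    bary = neighbors.mean(axis=0)                        # geometric barycenter
    d_center = np.linalg.norm(neighbors - center, axis=1)
    d_bary = np.linalg.norm(neighbors - bary, axis=1)
    d_cb = np.full(len(neighbors), np.linalg.norm(center - bary))
    # Cosine of the angle at each neighbor between the directions
    # to the ball center and to the barycenter.
    u = center - neighbors
    v = bary - neighbors
    denom = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1) + 1e-9
    cos = np.sum(u * v, axis=1) / denom
    return np.stack([d_center, d_bary, d_cb, cos], axis=1)   # (K, 4)

def mlp_max(features, weights):
    """Shared one-layer MLP (ReLU) followed by max pooling over the K neighbors."""
    return np.maximum(features @ weights, 0.0).max(axis=0)

rng = np.random.default_rng(0)
center = rng.normal(size=3)
neighbors = center + 0.5 * rng.normal(size=(16, 3))
w_o = rng.normal(size=(3, 8))   # stand-in weights, object-aware branch
w_r = rng.normal(size=(4, 8))   # stand-in weights, rotation-invariant branch

def bi_features(neighbors, center):
    """Run both branches side by side on the same query ball."""
    f_o = mlp_max(neighbors - center, w_o)                     # relative coordinates
    f_r = mlp_max(invariant_features(neighbors, center), w_r)  # distances and angle
    return f_o, f_r

R = random_rotation(rng)
f_o, f_r = bi_features(neighbors, center)
f_o_rot, f_r_rot = bi_features(neighbors @ R.T, R @ center)    # rotate the whole scene

fused = np.concatenate([f_o, f_r])   # assumed fusion: plain concatenation
print("max change, invariant branch:   ", np.abs(f_r - f_r_rot).max())
print("max change, object-aware branch:", np.abs(f_o - f_o_rot).max())
```

The first printed value should be at floating-point noise level and the second clearly larger. This mirrors the trade-off in the abstract: the invariant branch discards absolute orientation, and the coordinate branch keeps it.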