DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation

Huixin Zhang; Guangming Wang; Xinrui Wu; Chenfeng Xu; Mingyu Ding; Masayoshi Tomizuka; Wei Zhan; Hesheng Wang

TL;DR

DSLO is a deep sequence model for LiDAR odometry on unstructured 3D point clouds, built on inconsistent spatio-temporal propagation. It combines a multi-scale point feature pyramid with spatial information reuse, sequential pose initialization, gated hierarchical pose refinement, and temporal feature propagation that fuses motion history across frames. On KITTI and Argoverse it outperforms state-of-the-art methods, with at least a 15.67% improvement on RTE and a 12.64% improvement on RRE, and a 34.69% reduction in runtime compared to baseline methods, enabling real-time operation on consumer GPUs. Ablation studies confirm the contribution of each component, demonstrating robustness to sensor noise and occlusions while maintaining efficiency.

Abstract

This paper introduces DSLO, a 3D point cloud sequence learning model for LiDAR odometry based on inconsistent spatio-temporal propagation. It consists of a pyramid structure with a spatial information reuse strategy, a sequential pose initialization module, a gated hierarchical pose refinement module, and a temporal feature propagation module. First, spatial features are encoded using a point feature pyramid, and the features are reused in successive pose estimations to reduce computational overhead. Second, a sequential pose initialization method is introduced, which leverages the high-frequency sampling characteristic of LiDAR to initialize the LiDAR pose. Then, a gated hierarchical pose refinement mechanism refines poses from coarse to fine by selectively retaining or discarding motion information from different layers based on gate estimations. Finally, temporal feature propagation is proposed to incorporate the historical motion information from point cloud sequences and to address the spatial inconsistency issue that arises when motion information embedded in point clouds is transmitted between frames. Experimental results on the KITTI odometry and Argoverse datasets demonstrate that DSLO outperforms state-of-the-art methods, achieving at least a 15.67% improvement on RTE and a 12.64% improvement on RRE, while also achieving a 34.69% reduction in runtime compared to baseline methods. Our implementation will be available at https://github.com/IRMVLab/DSLO.
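
The interplay between sequential pose initialization and coarse-to-fine refinement can be illustrated with a small sketch. It is not the authors' implementation: the pose representation (unit quaternion plus translation), the composition rule in `refine`, and all function names are assumptions made for illustration, and the per-layer residuals that a network would predict are passed in as plain numbers.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    p = (0.0, v[0], v[1], v[2])
    _, x, y, z = quat_mul(quat_mul(q, p), quat_conj(q))
    return (x, y, z)

def refine(pose, residual):
    """Apply a residual (dq, dt) on top of a coarser pose (q, t).

    Assumed composition: x -> dq * (q * x + t) + dt.
    """
    q, t = pose
    dq, dt = residual
    q_new = quat_normalize(quat_mul(dq, q))
    t_new = tuple(a + b for a, b in zip(rotate(dq, t), dt))
    return (q_new, t_new)

IDENTITY = ((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))

def estimate_sequence(residuals_per_frame):
    """Estimate one frame-to-frame pose per frame pair.

    Each pair starts from the previous pair's refined pose (sequential
    initialization) and then applies its residuals from coarse to fine.
    """
    poses = []
    prev = IDENTITY
    for residuals in residuals_per_frame:
        pose = prev                    # initial guess: last refined pose
        for r in residuals:            # coarse-to-fine residual updates
            pose = refine(pose, r)
        poses.append(pose)
        prev = pose
    return poses

# Hypothetical residuals: the first pair needs two corrections, while the
# second pair needs none because the motion has not changed.
no_rot = (1.0, 0.0, 0.0, 0.0)
frames = [
    [(no_rot, (0.9, 0.0, 0.0)), (no_rot, (0.1, 0.0, 0.0))],
    [(no_rot, (0.0, 0.0, 0.0))],
]
poses = estimate_sequence(frames)
```

When consecutive scans are captured at a high rate, the motion between them changes little, so the second frame pair in this toy input already starts near its answer and its residual is zero. That is the intuition the abstract attributes to the initialization step; how the residuals are actually predicted and gated is not shown here.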

Paper Structure (21 sections, 15 equations, 5 figures, 4 tables)

Introduction
Related Work
Deep LiDAR Odometry
Spatio-temporal Fusion on 3D Point Cloud Learning
Methodology
Spatial Information Reuse
Sequential Pose Initialization
Gated Hierarchical Pose Refinement
Temporal Feature Propagation with Inconsistent Spatial Context
Training Loss
Experiments
Implementation Details
Accuracy Evaluation
Experiments on KITTI odometry dataset
Experiments on Argoverse dataset
...and 6 more sections

Figures (5)

Figure 1: Inspiration for our work. Boxes of the same color indicate the same rigid object. These objects appear highly similar in adjacent frames, so the robot's motion can be inferred from them.
Figure 2: Overview of our DSLO. For pose estimation between two adjacent frames, we encode the point feature pyramid and use a gated hierarchical pose refinement module to achieve a coarse-to-fine update. For multiple frames, we reuse the feature pyramid and use the last refined pose as the current initial guess. Temporal feature propagation fuses motion features along the time series and addresses the spatial inconsistency of point-wise features between frames.
Figure 3: Gated hierarchical pose refinement module. The residual embedding feature $RE_t^l$, the point pyramid feature $F_t^l$, and the upsampled embedding feature $CE^l$ are encoded and fed into a GRU. The output embedding feature $E_t^l$ is then used to refine the embedding mask $M_t^l$ and the pose $q_t^l$, $t_t^l$.
Figure 4: Temporal feature propagation with inconsistent spatial context. The historical motion information $E_{t-1}$ and the LSTM cell state $c_{t-1}$ embedded in $PC_{t-1}^L$ are passed to $PC_t^L$ with a learning-based motion information relay method. A peephole LSTM is then employed to fuse the temporal motion information.
Figure 5: 3D trajectory results on KITTI Seq. 07-10.
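
The two steps named in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's module: `relay` is a non-learned inverse-distance interpolation standing in for the learning-based relay, `peephole_lstm_step` is a textbook peephole LSTM with per-channel scalar weights chosen for brevity, and every parameter name is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relay(prev_points, prev_feats, cur_points, k=2, eps=1e-8):
    """Carry per-point features from frame t-1 onto the points of frame t.

    The two point sets do not correspond one to one, so each current point
    takes an inverse-distance-weighted mean of its k nearest previous points.
    (Stand-in only: the paper describes a learning-based relay.)
    """
    dim = len(prev_feats[0])
    relayed = []
    for p in cur_points:
        nearest = sorted(
            (math.dist(p, q), i) for i, q in enumerate(prev_points)
        )[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        relayed.append([
            sum(w * prev_feats[i][c] for w, (_, i) in zip(weights, nearest)) / total
            for c in range(dim)
        ])
    return relayed

def peephole_lstm_step(x, h_prev, c_prev, p):
    """One peephole LSTM step on a single point's feature vector.

    The peephole weights p_i, p_f and p_o let the gates read the cell state
    directly. Weights are per-channel scalars here, which is an assumption.
    """
    h, c = [], []
    for xc, hc, cc in zip(x, h_prev, c_prev):
        i = sigmoid(p["w_i"] * xc + p["u_i"] * hc + p["p_i"] * cc + p["b_i"])
        f = sigmoid(p["w_f"] * xc + p["u_f"] * hc + p["p_f"] * cc + p["b_f"])
        g = math.tanh(p["w_g"] * xc + p["u_g"] * hc + p["b_g"])
        c_new = f * cc + i * g                      # updated cell state
        o = sigmoid(p["w_o"] * xc + p["u_o"] * hc + p["p_o"] * c_new + p["b_o"])
        h.append(o * math.tanh(c_new))              # updated hidden state
        c.append(c_new)
    return h, c

PARAM_NAMES = (
    "w_i", "u_i", "p_i", "b_i",
    "w_f", "u_f", "p_f", "b_f",
    "w_g", "u_g", "b_g",
    "w_o", "u_o", "p_o", "b_o",
)

# Toy data: two previous points with 1-D features and one current point.
prev_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
prev_hidden = [[1.0], [3.0]]
prev_cell = [[2.0], [2.0]]
cur_points = [(5.0, 0.0, 0.0)]
cur_feats = [[0.5]]

params = {name: 0.1 for name in PARAM_NAMES}        # arbitrary demo weights
h_relayed = relay(prev_points, prev_hidden, cur_points)
c_relayed = relay(prev_points, prev_cell, cur_points)
h_t, c_t = peephole_lstm_step(cur_feats[0], h_relayed[0], c_relayed[0], params)
```

The relay runs first because the hidden and cell states live on the points of the previous frame. Only after they are re-anchored to the current points can a per-point recurrent update combine them with the current features.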
