ATPPNet: Attention based Temporal Point cloud Prediction Network
Kaustab Pal, Aditya Sharma, Avinash Sharma, K. Madhava Krishna
TL;DR
ATPPNet predicts future LiDAR point clouds from past sequences by combining Conv-LSTM-based spatio-temporal modeling with spatial and channel-wise attention, plus a 3D-CNN branch that captures global context. The network outputs future range images and reprojection masks, from which the future point clouds are reconstructed. It is trained in a self-supervised manner and evaluated on KITTI and nuScenes, where it achieves state-of-the-art range loss and Chamfer distance while running in real time. Ablation studies confirm the value of attention, temporal modeling depth, and the 3D-CNN component, and downstream odometry experiments demonstrate practical gains in ego-motion estimation. The approach promises improved perception and localization for autonomous navigation, with potential for active localization strategies that exploit regions of low drift.
Abstract
Point cloud prediction is an important yet challenging task in autonomous driving. The goal is to predict future point cloud sequences that preserve object structures while accurately representing their temporal motion. Such predictions support downstream tasks such as object trajectory estimation for collision avoidance and identifying locations with the least odometry drift. In this work, we present ATPPNet, a novel architecture that predicts future point cloud sequences from a sequence of point clouds captured at previous time steps by a LiDAR sensor. ATPPNet combines Conv-LSTM with channel-wise and spatial attention, complemented by a 3D-CNN branch, to extract an enhanced spatio-temporal context and recover high-fidelity predictions of future point clouds. We conduct extensive experiments on publicly available datasets and report performance that surpasses existing methods. We also conduct a thorough ablation study of the proposed architecture and provide an application study that highlights the potential of our model for tasks such as odometry estimation.
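As a rough illustration of the Chamfer distance used as an evaluation metric, the following sketch computes a symmetric Chamfer distance between a predicted and a ground-truth point cloud with NumPy. The function name `chamfer_distance`, the use of squared distances, and the averaging over points are assumptions made for illustration; the exact convention used in the paper's evaluation may differ.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean squared nearest-neighbour distance,
    summed over both directions (a -> b and b -> a).
    """
    # Pairwise squared Euclidean distances via broadcasting, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    # For each point, take the squared distance to its nearest neighbour
    # in the other set, then average within each direction.
    a_to_b = d2.min(axis=1).mean()
    b_to_a = d2.min(axis=0).mean()
    return float(a_to_b + b_to_a)


if __name__ == "__main__":
    # Hypothetical toy clouds, not data from the paper.
    predicted = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    ground_truth = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    print(chamfer_distance(predicted, ground_truth))
```

This brute-force version builds an N x M distance matrix, so it is only practical for small clouds; full LiDAR scans would normally use a spatial index or a GPU implementation.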
