PillarTrack: Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
Weisheng Xu, Sifan Zhou, Jiaqi Xiong, Ziyu Zhao, Zhihang Yuan
TL;DR
PillarTrack addresses the information loss in point-based LiDAR 3D SOT by introducing a pillar-based representation that preserves geometry while enabling real-time processing. It pairs a Pyramid-Encoded Pillar Feature Encoder (PE-PFE) with a modality-aware Transformer backbone to enhance pillar features and efficiently capture geometric cues, reorienting computation toward early stages to leverage intrinsic point-cloud structure. The approach yields strong gains on KITTI and competitive performance on nuScenes, outperforming the baseline SMAT and many motion- or similarity-based trackers while delivering higher FPS. The work provides an open-source implementation and emphasizes practical deployment on resource-constrained platforms through reduced GFLOPs and potential quantization. Overall, PillarTrack offers a robust, efficient path for 3D SOT on point clouds with actionable design principles for backbone and feature encoding in pillar-based pipelines.
Abstract
LiDAR-based 3D single object tracking (3D SOT) is a critical task in robotics and autonomous driving. Existing 3D SOT methods typically follow a point-based processing pipeline, in which the re-sampling operation inevitably produces either redundant or missing information and thereby degrades performance. To address these issues, we propose PillarTrack, a novel pillar-based 3D SOT framework. First, we transform sparse point clouds into dense pillars to preserve local and global geometry. Second, we propose a Pyramid-Encoded Pillar Feature Encoder (PE-PFE) design to make pillar features more robust to translation, rotation, and scale changes. Third, we present an efficient Transformer-based backbone designed from the perspective of modality differences. Finally, we build PillarTrack on these designs. Extensive experiments show that our method achieves comparable performance on the KITTI and nuScenes datasets and significantly improves on the baseline.
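The first step in the abstract, grouping a sparse point cloud into pillars, can be illustrated with a minimal NumPy sketch. This is a generic pillarization example under stated assumptions, not the authors' implementation: the function names `pillarize` and `pillar_mean_features`, the crop ranges, the pillar size, and the use of a per-pillar centroid as the feature are all hypothetical choices made for illustration. It does not model PE-PFE or the Transformer backbone, and it stores only non-empty pillars rather than a dense grid.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Group (N, 3) points into vertical pillars on a regular x-y grid.

    Hypothetical helper: returns a dict that maps each non-empty grid
    cell (ix, iy) to an (M, 3) array of the points falling inside it.
    """
    points = np.asarray(points, dtype=np.float64)
    # Discard points that fall outside the crop region.
    inside = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    points = points[inside]
    # Integer cell index per point. The z coordinate is ignored here,
    # so every cell spans the full height of the scene (a "pillar").
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(np.int64)
    grouped = {}
    for key, point in zip(zip(ix.tolist(), iy.tolist()), points):
        grouped.setdefault(key, []).append(point)
    return {key: np.stack(pts) for key, pts in grouped.items()}

def pillar_mean_features(pillars):
    """Assumed placeholder feature: the centroid of each pillar's points."""
    return {key: pts.mean(axis=0) for key, pts in pillars.items()}

demo_points = np.array([
    [0.2, 0.3, 0.1],
    [0.8, 0.6, 1.5],
    [2.5, 3.1, 0.4],
    [5.0, 1.0, 0.0],  # outside x_range, so it is dropped
])
demo_pillars = pillarize(demo_points)
demo_features = pillar_mean_features(demo_pillars)
```

In this toy input, the first two points share one pillar and the third occupies another, while the fourth is discarded by the crop. A learned encoder would replace the centroid with a richer per-pillar feature before the pillars are scattered into a 2D feature map for the backbone.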
