CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking

Sifan Zhou, Yichao Cao, Jiahao Nie, Yuqian Fu, Ziyu Zhao, Xiaobo Lu, Shuo Wang

TL;DR

CompTrack tackles the dual redundancy that the sparsity of LiDAR point clouds introduces in 3D single object tracking (SOT): spatial redundancy from background noise and informational redundancy in repetitive foreground geometry. It introduces a Spatial Foreground Predictor (SFP) to suppress the background and an Information Bottleneck-guided Dynamic Token Compression (IB-DTC) module that uses online SVD and learnable queries to distill the foreground into a compact, high-information set of proxy tokens, enabling accurate tracking at 90 FPS. The method achieves state-of-the-art results on nuScenes and Waymo and competitive performance on KITTI, supported by extensive ablations. The result is a practical end-to-end framework for real-time autonomous driving that balances precision and latency through rank-aware token compression.
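The TL;DR describes IB-DTC as using online SVD and learnable queries to distill the foreground into proxy tokens. The NumPy sketch below shows one way those two pieces could fit together; it is not the authors' implementation. The helper names `estimate_rank` and `compress_tokens`, the cumulative-energy threshold `energy`, and the rule of using the first r queries are hypothetical choices made for illustration, and the queries are random here rather than trained. "Online" is read as recomputing the SVD on each frame's foreground tokens.

```python
import numpy as np

def estimate_rank(tokens, energy=0.9):
    """Smallest r whose top-r singular values retain `energy` of the total
    spectral energy sum(s_i^2). Recomputed for every frame's tokens."""
    s = np.linalg.svd(tokens, compute_uv=False)   # singular values, descending
    total = np.sum(s ** 2)
    if total == 0.0:
        return 1                                  # degenerate all-zero input
    cum = np.cumsum(s ** 2) / total               # cumulative energy ratio
    return min(int(np.searchsorted(cum, energy)) + 1, len(s))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)       # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_tokens(fg_tokens, queries, energy=0.9):
    """Distill N foreground tokens (N, C) into r proxy tokens (r, C).

    queries: (Q, C) query bank; learnable in a trained model, fixed here.
    ASSUMPTION (hypothetical): the first r queries are used, with r
    taken from the singular-value spectrum of the foreground tokens.
    """
    r = min(estimate_rank(fg_tokens, energy), len(queries))
    q = queries[:r]                                              # (r, C)
    scores = q @ fg_tokens.T / np.sqrt(fg_tokens.shape[1])       # (r, N)
    attn = softmax(scores, axis=-1)   # each query attends over all tokens
    return attn @ fg_tokens                                      # (r, C)

# Toy foreground: two repeated directions plus one weak one -> low rank.
X = np.array([[3, 0, 0, 0], [3, 0, 0, 0], [0, 2, 0, 0],
              [0, 2, 0, 0], [0, 0, 0.1, 0], [0, 0, 0, 0]], dtype=float)
queries = np.random.default_rng(0).normal(size=(8, 4))
print(estimate_rank(X, 0.9), compress_tokens(X, queries).shape)  # prints: 2 (2, 4)
```

Choosing r from the spectrum lets the number of proxy tokens shrink for repetitive targets and grow for geometrically richer ones, which matches the rank-aware idea described above; the actual rank rule and query mechanism in CompTrack may differ.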

Abstract

3D single object tracking (SOT) in LiDAR point clouds is a critical task in computer vision and autonomous driving. Despite considerable progress, the inherent sparsity of point clouds introduces a dual-redundancy challenge that limits existing trackers: (1) vast spatial redundancy from background noise impairs accuracy, and (2) informational redundancy within the foreground hinders efficiency. To tackle these issues, we propose CompTrack, a novel end-to-end framework that systematically eliminates both forms of redundancy in point clouds. First, CompTrack incorporates a Spatial Foreground Predictor (SFP) module that filters out irrelevant background noise based on information entropy, addressing spatial redundancy. At its core, an Information Bottleneck-guided Dynamic Token Compression (IB-DTC) module then eliminates the informational redundancy within the foreground. Theoretically grounded in low-rank approximation, this module uses an online SVD analysis to adaptively compress the redundant foreground into a compact and highly informative set of proxy tokens. Extensive experiments on the KITTI, nuScenes, and Waymo datasets demonstrate that CompTrack achieves top tracking performance with superior efficiency, running in real time at 90 FPS on a single RTX 3090 GPU.
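The abstract grounds IB-DTC in low-rank approximation and the information bottleneck. For reference, the standard results below show why a truncated SVD is the natural tool for compressing redundant tokens and how a rank could be chosen adaptively. The energy threshold τ and the trade-off weight β are notation introduced here, not taken from the paper, whose own equations are not reproduced in this summary; the energy rule is one common heuristic, assumed for illustration.

```latex
% Foreground tokens X \in \mathbb{R}^{N \times C} with SVD X = \sum_i \sigma_i u_i v_i^\top,
% where \sigma_1 \ge \sigma_2 \ge \dots \ge 0.
% Eckart--Young--Mirsky: the best rank-r approximation in Frobenius norm is the truncated SVD
X_r = \sum_{i=1}^{r} \sigma_i u_i v_i^\top,
\qquad
\min_{\operatorname{rank}(Y) \le r} \lVert X - Y \rVert_F^2
  = \lVert X - X_r \rVert_F^2
  = \sum_{i > r} \sigma_i^2 .

% Energy-based adaptive rank (assumed heuristic, threshold \tau \in (0, 1]):
r^\star = \min\Big\{ r : \frac{\sum_{i=1}^{r} \sigma_i^2}{\sum_{i} \sigma_i^2} \ge \tau \Big\}.

% Standard Information Bottleneck objective: compress X into Z while keeping what predicts Y
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y).
```

For example, with squared singular values 18, 8, 0.01 and 0, the cumulative energy ratios are about 0.692, 0.9996 and 1, so τ = 0.9 gives r* = 2 with a residual error of 0.01.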

Paper Structure

This paper contains 28 sections, 7 equations, 5 figures, 7 tables, and 1 algorithm.

Figures (5)

  • Figure 1: (a) The inherent sparsity of point clouds introduces dual challenges: spatial redundancy from irrelevant background and informational redundancy from repetitive geometries in the foreground. (b) Existing SOT methods overlook informational redundancy, which limits their efficiency. (c) Our proposed CompTrack framework tackles both spatial and informational redundancy.
  • Figure 2: Overall architecture of our proposed CompTrack. It consists of two main designs: (1) a Spatial Foreground Predictor (SFP) that filters irrelevant background to decrease spatial redundancy, and (2) an Information Bottleneck-guided Low-rank Dynamic Token Compression (IB-DTC) module that compresses the foreground into a more compact, low-rank representation.
  • Figure 3: Illustration of the spatial foreground predictor (SFP). SFP removes spatial redundancy by filtering out irrelevant background information.
  • Figure 4: Illustration of the proposed information bottleneck-guided dynamic token compression.
  • Figure 5: Visualization of tracking results and feature maps.
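Figure 3 shows the SFP filtering out background, and the abstract states that this filtering is based on information entropy. The exact criterion is not given in this summary, so the sketch below assumes one plausible rule. A hypothetical per-token foreground head produces logits. Tokens that are likely foreground are kept, and tokens whose binary entropy is high (an uncertain prediction) are also kept, so only confidently-background tokens are discarded. The function names and the thresholds `fg_thresh` and `h_thresh` are illustrative assumptions.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Binary entropy H(p) in bits: near 0 when confident, 1 at p = 0.5."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def filter_background(tokens, fg_logits, fg_thresh=0.5, h_thresh=0.9):
    """Drop tokens that are confidently background (hypothetical rule).

    tokens:    (N, C) token features
    fg_logits: (N,) foreground logits from an assumed predictor head
    A token is kept if it is likely foreground (p >= fg_thresh) or if its
    prediction is uncertain (entropy >= h_thresh).
    """
    p = 1.0 / (1.0 + np.exp(-fg_logits))   # sigmoid -> foreground probability
    h = binary_entropy(p)
    keep = (p >= fg_thresh) | (h >= h_thresh)
    return tokens[keep], np.flatnonzero(keep)

tokens = np.arange(8, dtype=float).reshape(4, 2)
logits = np.array([4.0, -4.0, 0.1, -0.2])  # confident fg, confident bg, two uncertain
kept, idx = filter_background(tokens, logits)
print(idx.tolist())  # prints: [0, 2, 3]
```

Keeping uncertain tokens is a conservative choice: it trades a few extra background tokens for a lower risk of discarding sparse target points, which matters when a distant object is covered by only a handful of LiDAR returns.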