
FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking

Sifan Zhou, Jiahao Nie, Ziyu Zhao, Yichao Cao, Xiaobo Lu

TL;DR

FocusTrack is a novel one-stage tracking framework that unifies motion-semantics co-modeling through two core innovations: Inter-frame Motion Modeling (IMM) and Focus-and-Suppress Attention.

Abstract

In 3D point cloud object tracking, motion-centric methods have emerged as a promising avenue due to their superior performance in modeling inter-frame motion. However, existing two-stage motion-based approaches suffer from fundamental limitations: (1) error accumulation due to decoupled optimization caused by explicit foreground segmentation prior to motion estimation, and (2) computational bottlenecks from sequential processing. To address these challenges, we propose FocusTrack, a novel one-stage tracking framework that unifies motion-semantics co-modeling through two core innovations: Inter-frame Motion Modeling (IMM) and Focus-and-Suppress Attention. The IMM module employs a temporal-difference siamese encoder to capture global motion patterns between adjacent frames. The Focus-and-Suppress Attention enhances foreground semantics via motion-salient feature gating and suppresses background noise based on the temporal-aware motion context from IMM, without explicit segmentation. Based on these two designs, FocusTrack enables end-to-end training with a compact one-stage pipeline. Extensive experiments on prominent 3D tracking benchmarks, such as KITTI, nuScenes, and Waymo, demonstrate that FocusTrack achieves new SOTA performance while running at a high speed of 105 FPS.
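The motion-gating idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the sketch assumes features live on a fixed grid of cells shared by both frames, uses a plain linear map as a stand-in for the siamese encoder, and invents the names `shared_encoder`, `motion_gates`, `focus_and_suppress`, and the `sharpness` parameter purely for illustration.

```python
import math


def shared_encoder(cells, weights):
    """Apply one linear map to every cell feature.

    Using the same weights for both frames is what makes the encoder
    "siamese" in this toy setting (an assumption, not the paper's backbone).
    """
    return [[sum(w * x for w, x in zip(row, cell)) for row in weights]
            for cell in cells]


def motion_gates(prev_feats, curr_feats, sharpness=4.0):
    """Turn per-cell temporal differences into soft gates in (0, 1).

    Cells whose features changed more than average get gates near 1,
    static cells get gates near 0. `sharpness` is a hypothetical knob.
    """
    mags = [math.sqrt(sum((c - p) ** 2 for c, p in zip(cf, pf)))
            for cf, pf in zip(curr_feats, prev_feats)]
    mean_mag = sum(mags) / len(mags)
    return [1.0 / (1.0 + math.exp(-sharpness * (m - mean_mag))) for m in mags]


def focus_and_suppress(curr_feats, gates):
    """Scale each cell by its gate: moving cells are kept, static ones damped."""
    return [[g * v for v in cell] for cell, g in zip(curr_feats, gates)]


# Three grid cells with 2-D features; only the last cell changes between frames.
identity = [[1.0, 0.0], [0.0, 1.0]]
prev_frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
curr_frame = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0]]

prev_feats = shared_encoder(prev_frame, identity)
curr_feats = shared_encoder(curr_frame, identity)
gates = motion_gates(prev_feats, curr_feats)
gated = focus_and_suppress(curr_feats, gates)
```

In this toy case the gate for the changed cell is close to 1 and the gates for the two static cells are close to 0, so no separate foreground segmentation step is needed to decide which cells to emphasize. A learned model would replace the hand-set difference magnitude and sigmoid with trained layers.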

Paper Structure (40 sections, 10 equations, 6 figures, 8 tables, 1 algorithm)


Figures (6)

  • Figure 1: Comparison between two-stage motion-based tracking methods (a) and our one-stage motion-based tracking method (b). The two-stage M2Track series [m2track, m2track++] models motion relations for tracking in a two-stage manner. In contrast, our FocusTrack explores unified motion-foreground modeling for one-stage tracking.
  • Figure 2: Comparison with state-of-the-art methods. We visualize mean precision across all categories on the KITTI dataset [kitti] with respect to running speed (FPS). FocusTrack sets a new SOTA in terms of both accuracy and speed.
  • Figure 3: One-stage focus-and-suppress framework (FocusTrack): The framework consists of a backbone that captures inter-frame differences for motion modeling, followed by a Focus-and-Suppress attention mechanism designed to suppress background noise while enhancing foreground semantics based on motion weights.
  • Figure 4: The BEV feature visualization of FocusTrack.
  • Figure 5: Visualization of tracking results compared with the state-of-the-art motion-based method M2Track [m2track].
  • ...and 1 more figure