Offline-Poly: A Polyhedral Framework For Offline 3D Multi-Object Tracking

Xiaoyu Li, Yitao Wu, Xian Wu, Haolin Zhuo, Lijun Zhao, Lining Sun

TL;DR

Offline-Poly is a general offline 3D MOT method with a tracking-centric design. It introduces a standardized paradigm, Tracking-by-Tracking (TBT), which takes arbitrary off-the-shelf tracking outputs as its only input and produces offline-refined tracklets.

Abstract

Offline 3D multi-object tracking (MOT) is a critical component of the 4D auto-labeling (4DAL) process. It enhances pseudo-labels generated by high-performance detectors by incorporating temporal context. However, existing offline 3D MOT approaches are direct extensions of online frameworks and fail to fully exploit the advantages of the offline setting. Moreover, these methods often depend on fixed upstream components and customized architectures, which limits their adaptability. To address these limitations, we propose Offline-Poly, a general offline 3D MOT method based on a tracking-centric design. We introduce a standardized paradigm termed Tracking-by-Tracking (TBT), which operates exclusively on arbitrary off-the-shelf tracking outputs and produces offline-refined tracklets. This formulation decouples the offline tracker from specific upstream detectors or trackers. Under the TBT paradigm, Offline-Poly accepts one or more coarse tracking results and processes them through a structured pipeline comprising pre-processing, hierarchical matching and fusion, and tracklet refinement. Each module is designed to capitalize on the two fundamental properties of offline tracking: resource unconstrainedness, which permits global optimization beyond real-time limits, and future observability, which enables tracklet reasoning over the full temporal horizon. Offline-Poly first eliminates short-term ghost tracklets and re-identifies fragmented segments using global scene context. It then constructs scene-level similarity to associate tracklets across multiple input sources. Finally, Offline-Poly refines tracklets by jointly leveraging local and global motion patterns. On nuScenes, Offline-Poly achieves state-of-the-art performance with 77.6% AMOTA; on KITTI, it achieves leading results with 83.00% HOTA. Comprehensive experiments further validate the flexibility, generalizability, and modular effectiveness of Offline-Poly.
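To make the TBT input and the first pre-processing idea concrete, the following is a minimal sketch of dropping short-lived tracklets from an existing tracker's output. It is not the authors' implementation: the record layout `(frame, track_id, box)`, the helper names `group_by_track` and `drop_short_tracklets`, and the length threshold `min_len` are all assumptions made for illustration, and the abstract does not state what criterion Offline-Poly actually uses to identify ghost tracklets.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption, not the paper's format):
# (frame_index, track_id, box), with box reduced to an (x, y, z) center.
Box = Tuple[float, float, float]
Record = Tuple[int, int, Box]
Tracklets = Dict[int, List[Tuple[int, Box]]]


def group_by_track(records: List[Record]) -> Tracklets:
    """Group per-frame tracker output into tracklets keyed by track id."""
    tracklets: Dict[int, List[Tuple[int, Box]]] = defaultdict(list)
    for frame, track_id, box in records:
        tracklets[track_id].append((frame, box))
    # Offline setting: the whole sequence is available, so sort each
    # tracklet by frame once instead of processing frames as they arrive.
    for observations in tracklets.values():
        observations.sort(key=lambda obs: obs[0])
    return dict(tracklets)


def drop_short_tracklets(tracklets: Tracklets, min_len: int = 3) -> Tracklets:
    """Keep only tracklets with at least `min_len` observations.

    A length threshold is a simple stand-in for ghost-tracklet removal;
    `min_len` is an illustrative value, not one taken from the paper.
    """
    return {tid: obs for tid, obs in tracklets.items() if len(obs) >= min_len}


# Toy upstream tracking result: track 1 persists, track 2 appears once.
records = [
    (0, 1, (0.0, 0.0, 0.0)),
    (1, 1, (1.0, 0.0, 0.0)),
    (2, 1, (2.0, 0.0, 0.0)),
    (1, 2, (9.0, 9.0, 0.0)),
]
tracklets = group_by_track(records)
kept = drop_short_tracklets(tracklets, min_len=3)
```

Because the filter reads only tracker output (ids, frames, boxes), it does not depend on which detector or tracker produced the input, which is the decoupling the TBT paradigm describes.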

Paper Structure (19 sections, 10 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: The comparison between online and offline 3D MOT. Both methods produce object trajectories, but offline 3D MOT can exploit the complete scene information and is not constrained by real-time requirements.
  • Figure 2: The pipeline of our proposed method. Offline-Poly refines coarse tracking results into optimized trajectories through four stages: pre-processing, matching, fusion, and refinement. The detailed architecture is provided in \ref{Overall_Architecture}.
  • Figure 3: The pipeline of single tracker matching and fusion for tracklets without overlapping lifecycles (STWO). Offline-Poly introduces a motion-based matching and fusion framework that re-identifies tracklet fragments belonging to the same object through iterative frame-by-frame processing.
  • Figure 4: The pipeline of single tracker matching and fusion for tracklets with overlapping lifecycles (STW). It matches tracklets with high geometric similarity, then reorganizes and disentangles the distinct objects embedded within a single tracklet. We present the process for the $i$-th cluster.
  • Figure 5: The pipeline of hierarchical matching and fusion (multiple trackers). Offline-Poly integrates cross-tracker observations of the same object by constructing scene-level tracklet similarity, producing more complete and consistent tracklet representations.
  • ...and 3 more figures