SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking

Sandro Papais, Robert Ren, Steven Waslander

TL;DR

SWTrack tackles robust 3D multi-object tracking in dense, dynamic environments by solving data association over a temporal sliding window. It casts the problem as a sparse, lifted-edge graph optimization and uses a four-term log-likelihood to score track hypotheses, enabling online pruning and an LP-relaxed solution that preserves integral optima. The method demonstrates improved AMOTA on NuScenes compared with state-of-the-art single-frame trackers, particularly for difficult classes, and shows real-time performance with modest window lengths. This approach advances real-time, high-accuracy 3D MOT by effectively leveraging temporal context and multi-hypothesis reasoning across frames.

Abstract

Modern robotic systems must operate in dense dynamic environments, which requires highly accurate real-time track identification and estimation. For 3D multi-object tracking, recent approaches process a single measurement frame recursively with greedy association and are prone to errors in ambiguous association decisions. Our method, Sliding Window Tracker (SWTrack), yields more accurate association and state estimation by batch processing many frames of sensor data while being capable of running online in real-time. The most probable track associations are identified by evaluating all possible track hypotheses across the temporal sliding window. A novel graph optimization approach is formulated to solve the multidimensional assignment problem with lifted graph edges introduced to account for missed detections and graph sparsity enforced to retain real-time efficiency. We evaluate our SWTrack implementation on the NuScenes autonomous driving dataset to demonstrate improved tracking performance.

Paper Structure (10 sections, 24 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: An example illustration for multiple hypothesis tracking with noisy observations from 3 objects. Observations of objects are shown as blue boxes, association decision variables are blue lines, tracked objects are red boxes, and track histories are red lines. Our sliding window tracker solves an association problem over many frames and can rectify decisions made in past frames with more recent information. The association is solved by optimizing over a multidimensional graph structure to obtain 3D trajectories.
  • Figure 2: Sliding window tracker architecture. (a) Measurement data is passed through a 3D encoder to extract feature embeddings and generate detections. (b) Detected objects are used to expand the sparse graph nodes and find new edges with lifted edges shown as blue lines. (c) Track hypotheses and likelihoods are drawn from the graph and solved by linear programming. (d) Tracks are assigned IDs and filtered to obtain the tracking solution.
  • Figure 3: Qualitative tracking comparison with a horizon of two frames (top) and four frames (bottom) on the NuScenes validation set. In the first column, a group of 5 pedestrians is being tracked. In the second column, 2 seconds later, the tracks are lost due to occlusion. In the third column, a further 2 seconds later, the objects reappear and the sliding window tracker is able to recover the full track history while the single-frame tracker creates new tracks.
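
To make steps (b) and (c) of Figure 2 more concrete, the following is a minimal toy sketch, not the authors' implementation. It builds a sparse graph over a three-frame window, adds lifted edges that skip one frame to cover a missed detection, enumerates every track hypothesis, and selects the best non-conflicting set. Several things here are assumptions made for illustration: the 1D integer detections, the gating threshold, the integer scoring function (which stands in for the paper's four-term log-likelihood), and all function names. The final selection uses exhaustive search in place of the linear program described in the paper, which is only practical at this toy size.

```python
from itertools import combinations

# Hypothetical toy data: frame index -> 1D positions of detections.
# The object near position 10 is not detected in frame 1.
detections = {
    0: [0, 10],
    1: [1],
    2: [2, 12],
}

GATE_PER_FRAME = 2  # assumed gating distance per frame step (keeps the graph sparse)
MAX_SKIP = 1        # lifted edges may skip at most this many frames


def build_edges(dets):
    """Map each node (frame, index) to the later nodes it may link to."""
    edges = {}
    for t in sorted(dets):
        for i, x in enumerate(dets[t]):
            edges[(t, i)] = []
            # dt == 1 is a regular edge; dt > 1 is a lifted edge over skipped frames.
            for dt in range(1, MAX_SKIP + 2):
                for j, y in enumerate(dets.get(t + dt, [])):
                    if abs(y - x) <= GATE_PER_FRAME * dt:
                        edges[(t, i)].append((t + dt, j))
    return edges


def enumerate_hypotheses(edges):
    """List every path in the graph, including single-node paths."""
    hyps = []

    def extend(path):
        hyps.append(tuple(path))
        for nxt in edges[path[-1]]:
            extend(path + [nxt])

    for node in edges:
        extend([node])
    return hyps


def score(hyp, dets):
    """Assumed toy score: detection reward minus birth, motion, and skip costs."""
    total = 10 * len(hyp) - 5  # reward per detection, one-off track birth cost
    for (t0, i), (t1, j) in zip(hyp, hyp[1:]):
        total -= abs(dets[t1][j] - dets[t0][i])  # motion cost
        total -= 2 * (t1 - t0 - 1)               # cost per skipped frame
    return total


def select_tracks(hyps, dets):
    """Exhaustively pick the best set of hypotheses that share no detection."""
    best_score, best_set = 0, ()
    for k in range(1, len(hyps) + 1):
        for subset in combinations(hyps, k):
            nodes = [n for h in subset for n in h]
            if len(nodes) != len(set(nodes)):
                continue  # some detection is claimed by two hypotheses
            total = sum(score(h, dets) for h in subset)
            if total > best_score:
                best_score, best_set = total, subset
    return best_score, best_set


edges = build_edges(detections)
hyps = enumerate_hypotheses(edges)
total, tracks = select_tracks(hyps, detections)
for track in tracks:
    print(track, score(track, detections))
```

In this toy window the selected set contains two tracks: one that follows the first object through all three frames, and one that bridges the missed detection in frame 1 with a lifted edge from frame 0 to frame 2. A tracker restricted to adjacent-frame edges would instead have to start a new track for the second object in frame 2, which mirrors the behaviour contrasted in Figure 3.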