PlugTrack: Multi-Perceptive Motion Analysis for Adaptive Fusion in Multi-Object Tracking

Seungjae Kim, SeungJoon Lee, MyeongAh Cho

TL;DR

PlugTrack adaptively fuses Kalman filter and data-driven motion predictions in MOT, using multi-perceptive motion analysis to generate the blending factors. The authors present it as the first framework to bridge classical and modern motion prediction paradigms in this way.

Abstract

Multi-object tracking (MOT) predominantly follows the tracking-by-detection paradigm, where Kalman filters serve as the standard motion predictor due to computational efficiency but inherently fail on non-linear motion patterns. Conversely, recent data-driven motion predictors capture complex non-linear dynamics but suffer from limited domain generalization and computational overhead. Through extensive analysis, we reveal that even in datasets dominated by non-linear motion, the Kalman filter outperforms data-driven predictors in up to 34% of cases, demonstrating that real-world tracking scenarios inherently involve both linear and non-linear patterns. To leverage this complementarity, we propose PlugTrack, a novel framework that adaptively fuses the Kalman filter and data-driven motion predictors through multi-perceptive motion understanding. Our approach employs multi-perceptive motion analysis to generate adaptive blending factors. PlugTrack achieves significant performance gains on MOT17/MOT20 and state-of-the-art results on DanceTrack without modifying existing motion predictors. To the best of our knowledge, PlugTrack is the first framework to bridge classical and modern motion prediction paradigms through adaptive fusion in MOT.
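
As a rough illustration of the fusion idea in the abstract, the sketch below blends a Kalman filter prediction with a data-driven prediction using per-component blending factors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `alpha_blend`, the convention that alpha weights the Kalman prediction, and the clipping to [0, 1] are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def alpha_blend(
    kf_pred: Sequence[float],
    dd_pred: Sequence[float],
    alpha: Sequence[float],
) -> List[float]:
    """Per-component convex combination of two motion predictions.

    Assumption (not from the paper): alpha weights the Kalman filter
    prediction and (1 - alpha) weights the data-driven prediction.
    """
    if not (len(kf_pred) == len(dd_pred) == len(alpha)):
        raise ValueError("all inputs must have the same length")
    blended = []
    for k, d, a in zip(kf_pred, dd_pred, alpha):
        a = min(1.0, max(0.0, a))  # keep the factor inside [0, 1]
        blended.append(a * k + (1.0 - a) * d)
    return blended


# Hypothetical box centers predicted by each model, with made-up factors.
kf_center = [100.0, 50.0]
dd_center = [110.0, 40.0]
fused = alpha_blend(kf_center, dd_center, [0.5, 0.5])
```

With a factor of 1 the output equals the Kalman prediction, and with 0 it equals the data-driven prediction, so neither base predictor has to be modified for the fusion to work.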

Paper Structure

This paper contains 26 sections, 22 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Qualitative comparison of motion prediction on the DanceTrack dataset (frames 508-511). Top row: PlugTrack results using TrackSSM and DiffMOT as base predictors. Bottom row: Standalone TrackSSM and DiffMOT results. PlugTrack adaptively fuses Kalman filter and data-driven predictions to better handle both linear and non-linear motions, achieving IoU gains of up to +10.6. This demonstrates its core novelty: a lightweight, plug-in mechanism that dynamically integrates complementary motion cues to outperform individual predictors in complex scenarios.
  • Figure 2: Comparison of motion predictor performance on DanceTrack and MOT17, showing the number of tracklets where each predictor (Kalman filter, DiffMOT, TrackSSM) achieves the highest IoU score with ground truth.
  • Figure 3: Overview of the PlugTrack architecture. Our framework consists of two main components: (1) a Contextual Motion Encoder (CME) that analyzes motion from multiple perspectives through three specialized modules to generate a multi-perceptive motion feature, and (2) an Adaptive Blending Generator (ABG) that produces adaptive blending factors for alpha blending. During training, Monte Carlo Alpha Search (MCAS) generates pseudo ground truth blending factors by evaluating multiple candidates with added Gaussian noise. During inference, the learned ABG directly predicts optimal blending factors for real-time adaptive fusion of Kalman filter and data-driven motion predictor outputs.
  • Figure 4: Qualitative comparison on the DanceTrack dataset showing tracking results across frames 475-490. (a) Kalman filter maintains ID consistency. (b) DiffMOT suffers from ID switching at frame 485. (c) PlugTrack (Ours) successfully maintains ID consistency through adaptive blending ($\alpha_x$=0.874, $\alpha_y$=0.413). Boxes of the same color indicate the same tracked identity.
  • Figure S1: Network architectures of (a) CME components (PDM, UQM, and Fusion Encoder) and (b) ABG.
  • ...and 1 more figure
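
The Figure 3 caption describes Monte Carlo Alpha Search (MCAS) only at a high level: candidate blending factors with added Gaussian noise are evaluated to produce pseudo ground truth. The sketch below shows one way such a search could look. It is an assumption-laden illustration rather than the paper's procedure: the `(x1, y1, x2, y2)` box format, IoU as the scoring function, the starting point `center`, the noise scale `sigma`, the candidate count, and every function name are hypothetical.

```python
import random
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def blend(kf: Box, dd: Box, ax: float, ay: float) -> Box:
    """Blend x coordinates with ax and y coordinates with ay (assumed layout)."""
    return (
        ax * kf[0] + (1 - ax) * dd[0],
        ay * kf[1] + (1 - ay) * dd[1],
        ax * kf[2] + (1 - ax) * dd[2],
        ay * kf[3] + (1 - ay) * dd[3],
    )


def monte_carlo_alpha_search(
    kf: Box,
    dd: Box,
    gt: Box,
    center: Tuple[float, float] = (0.5, 0.5),  # hypothetical starting point
    sigma: float = 0.3,                        # hypothetical noise scale
    n_candidates: int = 64,                    # hypothetical sample count
    seed: int = 0,
) -> Tuple[Tuple[float, float], float]:
    """Return the sampled (alpha_x, alpha_y) whose blended box best matches gt."""
    rng = random.Random(seed)

    def clip(v: float) -> float:
        return min(1.0, max(0.0, v))

    best_alpha = center
    best_score = iou(blend(kf, dd, *center), gt)
    for _ in range(n_candidates):
        # Perturb the starting factors with Gaussian noise, then clip to [0, 1].
        ax = clip(center[0] + rng.gauss(0.0, sigma))
        ay = clip(center[1] + rng.gauss(0.0, sigma))
        score = iou(blend(kf, dd, ax, ay), gt)
        if score > best_score:
            best_alpha, best_score = (ax, ay), score
    return best_alpha, best_score
```

In this sketch the returned pair would serve as a training target for a network playing the role of the ABG, which then predicts the factors directly at inference time, when no ground truth box is available.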