MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model

Changcheng Xiao; Qiong Cao; Zhigang Luo; Long Lan

TL;DR

MambaTrack addresses the challenge of multi-object tracking under complex, nonlinear motion by introducing a data-driven Mamba-based motion predictor (MTP) and a tracklet patching module (TPM) to maintain trajectories under occlusion. The method operates online within a tracking-by-detection framework, using a bi-Mamba encoding layer to learn motion patterns from historical bounding-box trajectories and an autoregressive TPM to re-establish lost tracks. Across DanceTrack and SportsMOT, MambaTrack delivers state-of-the-art performance, highlighting the efficacy of learned motion models over traditional Kalman-filter approaches, especially in visually similar, fast-moving scenarios. The work proposes a practical baseline for motion-based MOT with near-linear inference complexity and demonstrates substantial improvements in trajectory consistency and association accuracy, making it valuable for real-time applications.

Abstract

Tracking-by-detection has been the prevailing paradigm in multi-object tracking (MOT). These methods typically rely on the Kalman filter to estimate the future locations of objects, assuming linear object motion. However, they fall short when tracking objects that exhibit nonlinear and diverse motion in scenarios like dancing and sports. In addition, learning-based motion predictors have received limited attention in MOT. To address these challenges, we explore data-driven motion prediction methods. Inspired by the promise of state space models (SSMs), such as Mamba, for long-term sequence modeling with near-linear complexity, we introduce a Mamba-based motion model named Mamba moTion Predictor (MTP). MTP is designed to model the complex motion patterns of objects such as dancers and athletes. Specifically, MTP takes the spatial-temporal location dynamics of objects as input, captures the motion pattern using a bi-Mamba encoding layer, and predicts the next motion. In real-world scenarios, objects may be missed due to occlusion or motion blur, leading to premature termination of their trajectories. To tackle this challenge, we further extend the application of MTP: we employ it in an autoregressive way, using its own predictions as inputs to compensate for missing observations, which yields more consistent trajectories. Our proposed tracker, MambaTrack, achieves strong performance on benchmarks such as DanceTrack and SportsMOT, which are characterized by complex motion and severe occlusion.
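The autoregressive use of a motion predictor described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the box format `(x1, y1, x2, y2)`, the `window` length, the function names, and especially `constant_velocity`, which is a trivial stand-in and not the paper's MTP. The point is only the feedback loop, where each prediction is appended to the history and used as input for the next step.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); an assumed box format


def constant_velocity(history: Sequence[Box]) -> Box:
    """Stand-in predictor (NOT the paper's MTP): repeat the last frame-to-frame shift."""
    if len(history) < 2:
        return history[-1]
    prev, last = history[-2], history[-1]
    return tuple(l + (l - p) for p, l in zip(prev, last))


def autoregressive_rollout(
    history: Sequence[Box],
    predict_next: Callable[[Sequence[Box]], Box],
    missing_frames: int,
    window: int = 10,  # hypothetical history length fed to the predictor
) -> List[Box]:
    """Extend a tracklet over missed frames by feeding predictions back as inputs."""
    extended = list(history)
    predictions = []
    for _ in range(missing_frames):
        next_box = predict_next(extended[-window:])
        predictions.append(next_box)
        extended.append(next_box)  # the prediction stands in for the missed detection
    return predictions


# Usage: a tracklet observed for two frames, then missed for three frames.
track = [(0.0, 0.0, 10.0, 20.0), (2.0, 1.0, 12.0, 21.0)]
patched = autoregressive_rollout(track, constant_velocity, missing_frames=3)
```

Any learned predictor with the same call shape could be passed in place of `constant_velocity`; the rollout loop itself does not depend on how the next box is computed.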

Paper Structure (18 sections, 9 equations, 4 figures, 7 tables, 1 algorithm)

Introduction
Related work
Tracking-by-detection methods
Motion models
State Space Models
Preliminaries
The proposed method
Notation
Overview
Mamba Motion Predictor
Tracklet patching module
Inference
Experiments
Datasets and Metrics
Implementation Details
...and 3 more sections

Figures (4)

Figure 1: Overall architecture of the proposed method. First, the proposed Mamba Motion Predictor (MTP) predicts the bounding boxes $\hat{\mathcal{B}}_t$ of active tracklets in the subsequent frame. These predictions are then matched with the detection results $\mathcal{B}_t$ of the current frame $t$ based on Intersection-over-Union (IoU) similarity. Subsequently, the Tracklet Patching Module (TPM) predicts the bounding boxes $\hat{\mathbf{P}}$ of lost tracklets through autoregression and pairs them with the remaining detections $\mathbf{B}_u$. Finally, the results of the two matching steps are combined to derive the tracking results $\mathbb{T}$. Bounding boxes of different colors represent objects of different identities.
Figure 2: Overview of the proposed Mamba Motion Predictor (MTP).
Figure 3: In TPM, we utilize MTP in an autoregressive manner to extend the lost tracklets, providing an opportunity for their trajectories to be re-established in future frames.
Figure 4: Qualitative results on DanceTrack.
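
The two matching steps in the Figure 1 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the caption only says that matching is IoU-based, so the greedy assignment, the `thresh=0.3` cutoff, the `(x1, y1, x2, y2)` box format, and all function names are hypothetical choices. Track ids are assumed to be unique across active and lost tracklets.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); an assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    preds: Dict[int, Box], dets: Sequence[Box], free: List[int], thresh: float
) -> Dict[int, int]:
    """Pair track ids with detection indices by descending IoU (a simplification).

    Matched detection indices are removed from `free` in place.
    """
    pairs = sorted(
        ((iou(p, dets[j]), tid, j) for tid, p in preds.items() for j in free),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    for score, tid, j in pairs:
        if score < thresh:
            break  # remaining pairs overlap too little to be matched
        if tid not in matches and j in free:
            matches[tid] = j
            free.remove(j)
    return matches


def associate(active_preds, lost_preds, dets, thresh=0.3):
    """Step 1: active tracklets vs. all detections. Step 2: lost tracklets vs. leftovers."""
    free = list(range(len(dets)))
    matches = greedy_match(active_preds, dets, free, thresh)
    matches.update(greedy_match(lost_preds, dets, free, thresh))
    return matches, free  # `free` now holds the unmatched detection indices


# Usage: one active tracklet, one lost tracklet, three detections.
active = {1: (0.0, 0.0, 10.0, 10.0)}
lost = {2: (50.0, 50.0, 60.0, 60.0)}
detections = [(1.0, 1.0, 11.0, 11.0), (51.0, 50.0, 61.0, 60.0), (200.0, 200.0, 210.0, 210.0)]
matches, unmatched = associate(active, lost, detections)
```

In this example the active tracklet takes the first detection, the lost tracklet is re-established on the second, and the third detection is left over as a candidate for a new track.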
