
TrackSSM: A General Motion Predictor by State-Space Model

Bin Hu, Run Luo, Zelin Liu, Cheng Wang, Wenyu Liu

TL;DR

TrackSSM introduces a data-driven, encoder-decoder motion model for multi-object tracking that leverages a Mamba-based trajectory encoder to extract flow information and a cascaded Flow-SSM-based flow decoder to predict temporal bounding-box positions. A Step-by-Step Linear (S$^2$L) training strategy decomposes nonlinear trajectory updates into a sequence of simpler regression steps, improving recall of lost trajectories and handling diverse motion patterns. Empirical results on MOT17, DanceTrack, and SportsMOT show that TrackSSM achieves competitive or superior performance with significantly lower computational overhead, enabling real-time inference with lightweight detectors. Overall, the approach demonstrates that data-driven state-space modeling can provide robust, scalable motion prediction across varied MOT scenarios, offering a potential universal motion predictor for tracking systems.

Abstract

Temporal motion modeling has always been a key component of multiple object tracking (MOT): it ensures smooth trajectory movement and provides accurate positional information that improves association precision. However, current motion models struggle to be both efficient and effective across different application scenarios. To this end, we propose TrackSSM, a unified encoder-decoder motion framework inspired by the recently popular state space models (SSMs), which uses a data-dependent state space model to model the temporal motion of trajectories. Specifically, we propose Flow-SSM, a module that uses the position and motion information from historical trajectories to guide the temporal state transition of object bounding boxes. Based on Flow-SSM, we design a flow decoder. It consists of a cascaded motion decoding module built on Flow-SSM, which uses the encoded flow information to complete the temporal position prediction of trajectories. Additionally, we propose a Step-by-Step Linear (S$^2$L) training strategy. By linearly interpolating between the object's positions in the previous frame and the current frame, we construct pseudo labels for step-by-step linear training, so that the trajectory flow information can better guide the object bounding box through its temporal transitions. TrackSSM uses a simple Mamba-Block to build a motion encoder for historical trajectories; together with the flow decoder, it forms a temporal motion model with an encoder-decoder structure. TrackSSM is applicable to various tracking scenarios and achieves excellent tracking performance across multiple benchmarks, further extending the potential of SSM-like temporal motion models in multi-object tracking tasks. Code and models are publicly available at https://github.com/Xavier-Lin/TrackSSM.
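To make the pseudo-label construction in the S$^2$L strategy concrete, the following is a minimal sketch of linear interpolation between a box in the previous frame and the same box in the current frame. It is not taken from the TrackSSM code base: the function name `s2l_pseudo_labels`, the `(cx, cy, w, h)` box layout, and the choice of one pseudo label per decoding step with evenly spaced interpolation weights are all assumptions made for illustration.

```python
from typing import List, Sequence


def s2l_pseudo_labels(
    prev_box: Sequence[float],
    curr_box: Sequence[float],
    num_steps: int,
) -> List[List[float]]:
    """Linearly interpolate from prev_box to curr_box in num_steps steps.

    Hypothetical helper: boxes are assumed to be (cx, cy, w, h), and
    num_steps is assumed to match the number of cascaded decoding steps.
    Returns one intermediate target per step; the last equals curr_box.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if len(prev_box) != len(curr_box):
        raise ValueError("boxes must have the same number of coordinates")

    labels = []
    for step in range(1, num_steps + 1):
        alpha = step / num_steps  # evenly spaced weights (assumption)
        labels.append([p + alpha * (c - p) for p, c in zip(prev_box, curr_box)])
    return labels


# Example: a box that moves 6 pixels right and 6 pixels down between frames.
prev_box = [10.0, 20.0, 4.0, 8.0]
curr_box = [16.0, 26.0, 4.0, 8.0]
labels = s2l_pseudo_labels(prev_box, curr_box, num_steps=3)
for step, label in enumerate(labels, start=1):
    print(step, label)
```

Under these assumptions, each step's target lies on the straight line between the two boxes, so every step only has to regress a small, roughly linear displacement rather than the full frame-to-frame update.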

Paper Structure

This paper contains 24 sections, 5 equations, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: The overall tracking framework with the TrackSSM motion model. TrackSSM consists of the Mamba encoder and the flow decoder, and performs temporal predictions on trajectories. The legend is in the box on the right side.
  • Figure 2: The overall structure of the flow decoder.
  • Figure 3: The structure of each flow decoder layer.