MM-Tracker: Motion Mamba with Margin Loss for UAV-platform Multiple Object Tracking

Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

TL;DR

MM-Tracker tackles UAV-based multi-object tracking by modeling both local object motion and global camera motion, and by handling motion blur. It introduces Motion Mamba, a lightweight module that fuses local cross-correlation with bi-directional global scanning by vertical and horizontal state-space models to predict a motion map from the detection features of two consecutive frames. It also proposes Motion Margin Loss, which sets motion-aware decision boundaries to improve detection of fast-moving, motion-blurred objects, using ground-truth motion maps derived from optical flow. Together, Motion Mamba and Motion Margin Loss yield state-of-the-art MOTA and IDF1 on VisDrone and UAVDT with fast inference. The work combines efficient global motion modeling with motion-aware detector training for real-time tracking from moving platforms.
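
The two-direction scanning idea can be illustrated with a toy example. The sketch below is not the paper's Motion Mamba block: it assumes a fixed scalar linear recurrence (h_t = a*h_{t-1} + b*x_t) in place of Mamba's input-dependent state-space parameters, scans each axis in one direction only, and uses hypothetical names (`linear_scan`, `two_direction_block`). It only shows how a vertical and a horizontal scan, summed with a shortcut, spread information from one location along its column and row.

```python
import numpy as np

def linear_scan(x, a=0.9, b=1.0, c=1.0, axis=0):
    """Run h_t = a*h_{t-1} + b*x_t, y_t = c*h_t along one axis of x.

    Assumption: fixed scalars a, b, c stand in for learned,
    input-dependent state-space parameters.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]   # state update carries earlier positions forward
        out[t] = c * h
    return np.moveaxis(out, 0, axis)

def two_direction_block(feat):
    """feat: (H, W, C). Vertical scan + horizontal scan, plus a shortcut."""
    v = linear_scan(feat, axis=0)  # top-to-bottom scan
    h = linear_scan(feat, axis=1)  # left-to-right scan
    return feat + v + h            # summed scans plus shortcut connection

# A single impulse at the top-left corner of a 4x5 single-channel map.
feat = np.zeros((4, 5, 1))
feat[0, 0, 0] = 1.0
out = two_direction_block(feat)
print(out[..., 0].round(3))
```

In this toy setting the impulse reaches every cell of its column and row with geometrically decaying weight, while cells off that row and column stay at zero after a single block.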

Abstract

Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces both local object motion and global camera motion. Motion blur also increases the difficulty of detecting large moving objects. Previous UAV motion modeling approaches either focus only on local motion or ignore motion blurring effects, thus limiting their tracking performance and speed. To address these issues, we propose the Motion Mamba module, which explores both local and global motion features through cross-correlation and bi-directional Mamba modules for better motion modeling. To address the detection difficulties caused by motion blur, we also design motion margin loss to effectively improve the detection accuracy of motion-blurred objects. Based on the Motion Mamba module and motion margin loss, our proposed MM-Tracker surpasses the state of the art on two widely used open-source UAV-MOT datasets. Code will be available.
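
To make the local-motion side concrete, here is a minimal sketch of a local cross-correlation volume between two feature maps, in the style of a generic correlation layer. It is an illustration under assumptions rather than the paper's implementation: the function name `local_correlation`, the `radius` parameter, and the zero padding are all hypothetical choices.

```python
import numpy as np

def local_correlation(f_prev, f_curr, radius=1):
    """Correlate each location of f_prev with a local window of f_curr.

    f_prev, f_curr: (H, W, C) feature maps.
    Returns (corr, offsets) where corr has shape (H, W, (2*radius+1)**2) and
    corr[y, x, k] = <f_prev[y, x], f_curr[y + dy, x + dx]> for offsets[k] = (dy, dx).
    Out-of-bounds neighbours are zero-padded (an assumption).
    """
    H, W, _ = f_prev.shape
    padded = np.pad(f_curr, ((radius, radius), (radius, radius), (0, 0)))
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    corr = np.empty((H, W, len(offsets)))
    for k, (dy, dx) in enumerate(offsets):
        # shifted[y, x] equals f_curr[y + dy, x + dx]
        shifted = padded[radius + dy: radius + dy + H,
                         radius + dx: radius + dx + W]
        corr[..., k] = (f_prev * shifted).sum(axis=-1)
    return corr, offsets

# One distinctive feature vector that moves one pixel to the right.
f_prev = np.zeros((5, 6, 2)); f_prev[2, 2] = [1.0, 2.0]
f_near = np.zeros((5, 6, 2)); f_near[2, 3] = [1.0, 2.0]
corr, offsets = local_correlation(f_prev, f_near, radius=1)
best_offset = offsets[int(np.argmax(corr[2, 2]))]

# The same feature displaced by three pixels falls outside the window.
f_far = np.zeros((5, 6, 2)); f_far[2, 5] = [1.0, 2.0]
corr_far, _ = local_correlation(f_prev, f_far, radius=1)
```

With a small window the correlation recovers the one-pixel shift, but the three-pixel shift produces no response at all, which is the kind of large displacement that global camera motion can cause.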

Paper Structure (15 sections, 5 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Advantages of our MM-Tracker. Left: under global camera motion between frames $t$ and $t+1$, our global scan matches the object that the local correlation misses. Right: with our motion margin loss, the detection score of the motion-blurred object increases.
  • Figure 2: Overall architecture of MM-Tracker. Multi-scale detection features are first extracted by a detection backbone (DetBackbone) and fed into the detection head (DetHead), which outputs the object bounding box, score, and category. The object score is optimized using the proposed MMLoss. The detection features are also fed into the proposed Motion Mamba module (MM), which captures the difference between the detection features of the two frames and predicts the motion map. The motion map is then used to predict where each object from the previous frame lies in the current frame, and the predicted positions are matched with the positions detected in the current frame to generate new object trajectories.
  • Figure 3: Structure of our Motion Mamba block. The vertical state space model (V-SSM) and the horizontal state space model (H-SSM) scan the feature map in two directions. The scanned results are added to realize global feature interaction. A shortcut connection is adopted to accelerate training.
  • Figure 4: Ground-truth motion map generation procedure.
  • Figure 5: Motion margin function curves of MMLoss.
  • ...and 4 more figures
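
The association step described in the Figure 2 caption (shift previous-frame objects using the motion map, then match them with current detections) can be sketched as follows. This is a simplified illustration under assumptions, not the authors' code: the motion map is assumed to store a per-pixel (dx, dy) displacement, each box is shifted by the motion sampled at its center, and matching is a greedy IoU assignment. The names `propagate_boxes`, `iou`, and `greedy_match`, and the 0.3 threshold, are hypothetical.

```python
import numpy as np

def propagate_boxes(boxes, motion_map):
    """Shift each (x1, y1, x2, y2) box by the (dx, dy) motion at its center."""
    boxes = np.asarray(boxes, dtype=float)
    H, W, _ = motion_map.shape
    cx = np.clip(((boxes[:, 0] + boxes[:, 2]) / 2).astype(int), 0, W - 1)
    cy = np.clip(((boxes[:, 1] + boxes[:, 3]) / 2).astype(int), 0, H - 1)
    d = motion_map[cy, cx]                      # (N, 2) displacements
    return boxes + np.concatenate([d, d], axis=1)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(pred, dets, thresh=0.3):
    """Pair predicted and detected boxes in descending IoU order (an assumption)."""
    pairs = sorted(((iou(p, d), i, j)
                    for i, p in enumerate(pred)
                    for j, d in enumerate(dets)), reverse=True)
    used_p, used_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < thresh:
            break
        if i in used_p or j in used_d:
            continue
        matches.append((i, j))
        used_p.add(i)
        used_d.add(j)
    return matches

# Toy scene: the camera pans so that everything shifts 10 px to the right.
motion_map = np.zeros((20, 40, 2))
motion_map[..., 0] = 10.0
prev_boxes = np.array([[2, 2, 8, 8], [20, 5, 26, 11]], dtype=float)
curr_dets = np.array([[30, 5, 36, 11], [12, 2, 18, 8]], dtype=float)

with_motion = greedy_match(propagate_boxes(prev_boxes, motion_map), curr_dets)
without_motion = greedy_match(prev_boxes, curr_dets)
```

In this toy scene, matching the raw previous-frame boxes finds no pairs because the camera shift removes all overlap, while boxes propagated with the motion map line up with their detections.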