Mamba-FETrack: Frame-Event Tracking via State Space Model

Ju Huang; Shiao Wang; Shuai Wang; Zhe Wu; Xiao Wang; Bo Jiang

TL;DR

This paper tackles RGB-Event visual tracking by replacing Transformer backbones with a Mamba framework based on State Space Models (SSMs), which reduces computation and memory while maintaining strong accuracy. It introduces Mamba-FETrack, which uses modality-specific Mamba backbones for the RGB and Event streams and a FusionMamba module that strengthens cross-modal interaction. A tracking head, trained with focal, L1, and GIoU losses, follows the fusion module. On FE108 and FELT, the tracker achieves SR/PR scores that match or exceed those of the ViT-S based OSTrack, while using far fewer FLOPs and parameters and running at 24 FPS. The work shows that Mamba is a practical basis for efficient multi-modal tracking and outlines future directions toward richer Event representations and more advanced Mamba designs.
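The TL;DR names three training losses but does not say how they are combined. Below is a minimal, stdlib-only Python sketch under stated assumptions. Boxes are (x1, y1, x2, y2) tuples. The focal term is assumed to be the CornerNet/CenterNet-style penalty-reduced focal loss on a flattened center score map; which focal variant the paper uses is an assumption here. The weights `w_giou=2.0` and `w_l1=5.0` and all function names are hypothetical placeholders, not values or code from the paper.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both inputs; its empty area is the GIoU penalty.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose if enclose > 0 else iou


def l1_loss(pred, target):
    """Mean absolute error over the four box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def focal_loss(scores, labels, alpha=2.0, beta=4.0):
    """Penalty-reduced focal loss (assumed variant) on a flattened score map.

    scores: predicted probabilities; labels: Gaussian targets in [0, 1]
    that equal exactly 1.0 at the target center.
    """
    eps = 1e-6
    total, num_pos = 0.0, 0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1.0:  # positive (center) location
            total += (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:  # negatives near the center are down-weighted by (1 - y)^beta
            total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return -total / max(num_pos, 1)


def tracking_loss(scores, labels, pred_box, gt_box, w_giou=2.0, w_l1=5.0):
    """Hypothetical weighted sum of the focal, GIoU, and L1 terms."""
    return (focal_loss(scores, labels)
            + w_giou * (1.0 - giou(pred_box, gt_box))
            + w_l1 * l1_loss(pred_box, gt_box))
```

For example, `giou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7 - 2/9 = -5/63, about -0.079. The enclosing-box penalty keeps the GIoU term informative even when the predicted and ground-truth boxes barely overlap or are disjoint, where plain IoU would be flat at zero.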

Abstract

RGB-Event based tracking is an emerging research topic. It focuses on how to effectively integrate heterogeneous multi-modal data: synchronously exposed video frames and an asynchronous pulse Event stream. Existing works typically employ Transformer-based networks to handle these modalities, and they achieve decent accuracy on multiple datasets through input-level or feature-level fusion. However, the self-attention mechanism, whose cost grows quadratically with the number of tokens, makes these trackers expensive in both memory and computation. This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM). It achieves high tracking performance while substantially reducing computational cost. Specifically, we adopt two modality-specific Mamba backbone networks to extract features from the RGB frames and the Event streams. We then use a Mamba network to boost interactive learning between the RGB and Event features. The fused features are fed into the tracking head for target object localization. Extensive experiments on the FELT and FE108 datasets validate the efficiency and effectiveness of the proposed tracker. Specifically, our Mamba-based tracker achieves 43.5/55.6 on the SR/PR metrics, while the ViT-S based tracker (OSTrack) obtains 40.0/50.9. The GPU memory cost is 13.98 GB for our tracker and 15.44 GB for the ViT-S based tracker, a reduction of about 9.5%. Our tracker requires 59G FLOPs and 7M parameters, compared with 1076G FLOPs and 60M parameters for the ViT-S based OSTrack. This amounts to reductions of about 94.5% and 88.3%, respectively. We hope this work brings new insights to the tracking field and promotes the application of the Mamba architecture in tracking. The source code of this work will be released at https://github.com/Event-AHU/Mamba_FETrack.
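The percentage reductions quoted in the abstract follow from the reported absolute numbers as (baseline - ours) / baseline. The short Python check below recomputes them. The dictionary only restates the abstract's figures, reading FLOPs as GFLOPs and parameters as millions, and the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(ours, baseline):
    """Reduction of `ours` relative to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Figures quoted in the abstract: (Mamba-FETrack, ViT-S based OSTrack)
reported = {
    "GPU memory (GB)": (13.98, 15.44),
    "FLOPs (G)": (59.0, 1076.0),
    "Parameters (M)": (7.0, 60.0),
}

for name, (ours, base) in reported.items():
    print(f"{name}: {ours} vs {base} -> {relative_reduction(ours, base):.1f}% lower")
```

Running it prints reductions of 9.5%, 94.5%, and 88.3%, which match the values stated in the abstract.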

Paper Structure (18 sections, 13 equations, 7 figures, 7 tables)

Introduction
Related Work
Frame-Event Tracking
State Space Model and Mamba
Preliminary: SSMs and Mamba
Our Proposed Approach
Overview
Network Architecture
Tracking Head and Loss Function
Experiment
Dataset and Evaluation Metric
Implementation Details
Comparison with Other SOTA Algorithms
Ablation Study
Comparison on Tracking Speed, Parameters, and FLOPs
...and 3 more sections

Figures (7)

Figure 1: Comparison of the parameters, FLOPs, and accuracy of our proposed Mamba-FETrack and other strong RGB-Event trackers.
Figure 2: An overview of our proposed Frame-Event tracking via state space model.
Figure 3: Ablation study on data inputs (FELT dataset).
Figure 3: Tracking results under each challenging factor on the FELT SOT dataset.
Figure 4: Tracking results of our Mamba-FETrack and other state-of-the-art trackers.
...and 2 more figures