MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking

Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

TL;DR

MambaTrack tackles night UAV tracking by introducing dual enhancements: a lightweight Mamba-based low-light enhancer (MLLE) and a Cross-modal Mamba Network (CMM) that fuses vision and language cues. The method delivers robust tracking under poor illumination with high efficiency, thanks to linear-complexity Mamba backbones and a lightweight language-augmented search. Key contributions include the MLLE module, the CMM network, and a new vision-language night UAV tracking task via annotated language prompts, validated on five challenging datasets with state-of-the-art performance and substantial memory and speed gains. The practical impact is improved nighttime UAV tracking capability with lower resource demands, enabling more reliable real-time operation in dark environments.

Abstract

Night unmanned aerial vehicle (UAV) tracking is impeded by poor illumination: previous daylight-optimized methods perform suboptimally in low-light conditions, which limits the utility of UAV applications. To this end, we propose an efficient Mamba-based tracker that leverages dual enhancement techniques to boost night UAV tracking. The Mamba-based low-light enhancer, equipped with an illumination estimator and a damage restorer, achieves global image enhancement while preserving the details and structure of low-light images. Additionally, we advance a cross-modal Mamba network to achieve efficient interactive learning between vision and language modalities. Extensive experiments show that our method achieves advanced performance and significantly improved computation and memory efficiency. For instance, our method is 2.8× faster than CiteTracker and reduces GPU memory usage by 50.2%. Our code is available at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.
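
The efficiency claim rests on the linear complexity of Mamba-style state space models. As a rough illustration only, the sketch below runs a scalar linear recurrence, h_t = a·h_{t-1} + b·x_t and y_t = c·h_t, in a single pass over the sequence, and compares it with an equivalent quadratic-time formulation. The function names and the fixed scalars `a`, `b`, `c` are assumptions made for this example. Mamba itself uses learned, input-dependent parameters and a hardware-aware scan, and none of this is taken from the MambaTrack code.

```python
from typing import List, Sequence


def ssm_scan(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Linear-time scan: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (hypothetical helper)."""
    h = 0.0  # hidden state, carried from one step to the next
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # one constant-cost update per token
        ys.append(c * h)
    return ys


def ssm_as_convolution(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Same outputs computed in quadratic time, by summing over all past inputs."""
    ys = []
    for t in range(len(x)):
        # y_t = sum_{s<=t} c * a^(t-s) * b * x_s
        ys.append(sum(c * (a ** (t - s)) * b * x[s] for s in range(t + 1)))
    return ys


if __name__ == "__main__":
    seq = [1.0, 0.0, 0.0, 2.0]
    print(ssm_scan(seq, a=0.5, b=1.0, c=2.0))
    print(ssm_as_convolution(seq, a=0.5, b=1.0, c=2.0))
```

Both functions return the same sequence. The first touches each token once, while the second revisits every earlier token at each step, which is the kind of cost growth that attention-based trackers incur on long token sequences.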

Paper Structure

This paper contains 12 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Performance and efficiency comparisons between the proposed MambaTrack and two SOTA trackers (i.e., JointNLT [zhou2023joint] and CiteTracker [li2023citetracker]) on UAVDark135 [li2022all].
  • Figure 2: Overview of MambaTrack. It comprises visual and language branches (left), a cross-modal mamba network (middle), and a tracking head (right). The visual branch mainly contains a mamba-based low-light enhancer and a visual mamba encoder for image enhancement and encoding, respectively. The language branch includes a tokenizer and a language mamba encoder. Then, we adopt a cross-modal mamba network for multimodal enhancement learning. Finally, the language-enhanced search embeddings are fed into the tracking head to predict the target. For simplicity, linear projections are omitted here.
  • Figure 3: Comparison with SOTA trackers on UAVDark70, NAT2021, NAT2021L, and DarkTrack2021 using AUC scores.
  • Figure 4: Comparison with SOTA trackers on UAVDark135 using mACC scores. Best viewed in color.
  • Figure 5: Visualization of the proposed two components (i.e., MLLE and CMM). The images are enhanced for visualization except for the initial frame. Best viewed by zooming in.