Fully Spiking Neural Networks with Target Awareness for Energy-Efficient UAV Tracking

Pengzhi Zhong, Jiwei Mo, Dan Zeng, Feixiang He, Shuiwang Li

Abstract

Spiking Neural Networks (SNNs), characterized by their event-driven computation and low power consumption, have shown great potential for energy-efficient visual tracking on unmanned aerial vehicles (UAVs). However, existing efficient SNN-based trackers heavily rely on costly event cameras, limiting their deployment on UAVs. To address this limitation, we propose STATrack, an efficient fully spiking neural network framework for UAV visual tracking using RGB inputs only. To the best of our knowledge, this work is the first to investigate spiking neural networks for UAV visual tracking tasks. To mitigate the weakening of target features by background tokens, we propose adaptively maximizing the mutual information between templates and features. Extensive experiments on four widely used UAV tracking benchmarks demonstrate that STATrack achieves competitive tracking performance while maintaining low energy consumption.

Paper Structure

This paper contains 15 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Comparison of accuracy and energy consumption on the UAV123 dataset. STATrack achieves 66.9% AUC with only 5.7 mJ inference energy, demonstrating a superior accuracy--energy trade-off.
  • Figure 2: Overview of the proposed framework. The architecture consists of an efficient SNN-based Transformer backbone for joint template–search feature learning, followed by a spiking head. (Bottom-left) Illustration of the proposed Adaptive Mutual Information Maximization (AMIM) module. (Bottom-right) Detailed structure of the adaptive dynamic weighting strategy used in AMIM. Note that $\{Z\}$ denotes a batch of input samples, while $\{Z'\}$ represents a randomly shuffled batch of $Z$.
  • Figure 3: Qualitative comparison of STATrack with six state-of-the-art UAV trackers on four video sequences from DTB70, UAV123, VisDrone2018, and UAVDT, namely Animal3, Car7, S1607, and uav0000088_00000_s.
  • Figure 4: For each example, we visualize target regions from different frames of the same video sequence (top row), followed by the corresponding feature maps generated by STATrack without AMIM (middle row) and with the proposed AMIM (bottom row).
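The shuffled-batch construction in the Figure 2 caption ($\{Z\}$ vs. a shuffled $\{Z'\}$) is the standard way to build joint versus product-of-marginals samples for a neural mutual-information estimator. The sketch below is not the authors' AMIM module; it is a minimal illustration of the Donsker-Varadhan lower bound on mutual information, assuming a toy bilinear critic and synthetic template/feature embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

def dv_lower_bound(scores_joint, scores_marginal):
    """Donsker-Varadhan bound: I(T; F) >= E_joint[critic] - log E_marg[exp(critic)]."""
    return scores_joint.mean() - np.log(np.exp(scores_marginal).mean())

def bilinear_critic(t, f, W):
    """Toy bilinear critic scoring each template/feature pair in a batch."""
    return np.einsum('bi,ij,bj->b', t, W, f)

# Synthetic embeddings: features correlated with their templates (joint pairs),
# and a shuffled batch (cf. {Z'} in Fig. 2) standing in for the marginals.
T = rng.standard_normal((64, 8))
F = T + 0.1 * rng.standard_normal((64, 8))  # joint: same-sample pairs
F_shuf = F[rng.permutation(64)]             # marginal: mismatched pairs

W = np.eye(8)  # identity weight rewards aligned template/feature pairs
mi_est = dv_lower_bound(bilinear_critic(T, F, W),
                        bilinear_critic(T, F_shuf, W))
```

In a tracker this bound would be maximized as an auxiliary loss, with the critic (and here the hypothetical weight `W`) learned jointly with the backbone, so that target tokens retain information about the template despite background clutter.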