SpikeTrack: A Spike-driven Framework for Efficient Visual Tracking

Qiuyang Zhang, Jiujun Cheng, Qichao Mao, Cong Liu, Yu Fang, Yuhong Li, Mengying Ge, Shangce Gao

TL;DR

To our knowledge, SpikeTrack is the first spike-driven framework to make RGB tracking both accurate and energy efficient. It employs a novel asymmetric design that uses asymmetric timestep expansion and unidirectional information flow, harnessing spatiotemporal dynamics while cutting computation.

Abstract

Spiking Neural Networks (SNNs) promise energy-efficient vision, but applying them to RGB visual tracking remains difficult: existing SNN tracking frameworks either do not fully align with spike-driven computation or do not fully leverage neurons' spatiotemporal dynamics, leading to a trade-off between efficiency and accuracy. To address this, we introduce SpikeTrack, a spike-driven framework for energy-efficient RGB object tracking. SpikeTrack employs a novel asymmetric design that uses asymmetric timestep expansion and unidirectional information flow, harnessing spatiotemporal dynamics while cutting computation. To ensure effective unidirectional information transfer between branches, we design a memory-retrieval module inspired by neural inference mechanisms. This module recurrently queries a compact memory initialized by the template to retrieve target cues and sharpen target perception over time. Extensive experiments demonstrate that SpikeTrack achieves state-of-the-art performance among SNN-based trackers and remains competitive with advanced ANN trackers. Notably, it surpasses TransT on the LaSOT dataset while consuming only 1/26 of its energy. To our knowledge, SpikeTrack is the first spike-driven framework to make RGB tracking both accurate and energy efficient. The code and models are available at https://github.com/faicaiwawa/SpikeTrack.
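The unidirectional flow described in the abstract and in the Figure 3 caption (the template branch runs once, its features are cached as memory, and the search branch queries that memory repeatedly) can be illustrated with a toy control-flow sketch. This is not the authors' implementation: the class `CachedMemoryTracker`, the `encode` callable, the `num_loops` parameter, and the arithmetic inside the retrieval loop are all hypothetical placeholders chosen only to show which branch runs when.

```python
from typing import Callable, List, Optional, Sequence

Features = List[float]


class CachedMemoryTracker:
    """Toy control flow only; every name here is a hypothetical stand-in."""

    def __init__(
        self,
        encode: Callable[[Sequence[float]], Features],
        num_loops: int = 2,
    ) -> None:
        self.encode = encode        # stand-in for the weight-sharing backbone
        self.num_loops = num_loops  # stand-in for the number of retrieval loops
        self.memory: Optional[Features] = None
        self.template_passes = 0    # counts how often the template branch runs

    def initialize(self, template: Sequence[float]) -> None:
        """Run the template branch once and cache its features as memory."""
        self.memory = self.encode(template)
        self.template_passes += 1

    def track(self, search: Sequence[float]) -> Features:
        """Run only the search branch; it reads the memory but never writes it."""
        if self.memory is None:
            raise RuntimeError("call initialize() before track()")
        query = self.encode(search)
        for _ in range(self.num_loops):
            # Placeholder retrieval step (an assumption, not the paper's module):
            # amplify query entries where the cached memory is active.
            query = [q + q * m for q, m in zip(query, self.memory)]
        return query


# Usage: one template pass, then any number of search frames.
tracker = CachedMemoryTracker(encode=lambda x: [float(v) for v in x], num_loops=2)
tracker.initialize([1.0, 0.0])
first = tracker.track([1.0, 1.0])
second = tracker.track([0.5, 2.0])
```

In this sketch the cost of the template branch is paid once per initialization or update, while each new frame only pays for the search branch and the retrieval loop; information flows from template to search and never back.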

Paper Structure (21 sections, 14 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Energy–accuracy trade-off on LaSOT. SpikeTrack achieves lower energy consumption than efficient ANN trackers while matching the accuracy of precision-oriented methods.
  • Figure 2: Structure comparison between one-stream tracking SNN (top) and our asymmetric tracking SNN (bottom). L represents the number of blocks in the backbone.
  • Figure 3: Overview of SpikeTrack. The network consists of three components: a weight-sharing siamese backbone, a memory retrieval module for information transfer, and a prediction head. We use asymmetric timestep inputs and unidirectional information flow. During inference, template branch features are converted and cached as memory. The search branch queries this memory to extract target cues. The template branch runs once per initialization or update.
  • Figure 4: Implementation details of the Memory Retrieval Module. The purple legend (bottom left) illustrates the recurrent, looped connectivity structure in the brain. For simplicity of illustration, the temporal spiking across timesteps is omitted.
  • Figure 5: Influence of the number of retrieval loops in MRM.
  • ...and 5 more figures