TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification

Zhenyuan Xiao, Huanran Hu, Guili Xu, Junwei He

TL;DR

TAME addresses the challenge of detecting and localizing compact UAVs using audio alone by introducing a Temporal Spectral Mamba backbone and a Temporal Feature Enhancement module to jointly learn temporal and spectral audio cues. The approach leverages a selective state-space model to efficiently capture propagation dynamics, with a dual-task detection head for 3D trajectory estimation and UAV classification. On the MMAUD dataset, TAME achieves state-of-the-art performance in both trajectory accuracy and classification, particularly under adverse lighting, while remaining suitable for mobile or wearable deployments due to its audio-only design. The work provides open-source code and highlights the potential of audio-driven anti-UAV systems, while acknowledging limitations in position accuracy and data demands, and outlining future directions for unsupervised and multi-modal enhancements.

Abstract

The increasing prevalence of compact UAVs has introduced significant risks to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we present TAME, the Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification. This anti-UAV detection model leverages a parallel selective state-space model to capture and learn the temporal and spectral features of audio simultaneously, effectively analyzing sound propagation. To further enhance temporal features, we introduce a Temporal Feature Enhancement Module, which integrates spectral features into temporal data using residual cross-attention. This enhanced temporal information is then employed for precise 3D trajectory estimation and classification. Our model sets a new standard of performance on the MMAUD benchmarks, demonstrating superior accuracy and effectiveness. The code and trained models are publicly available on GitHub at https://github.com/AmazingDay1/TAME.

Paper Structure

This paper contains 12 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Motivation of our proposed solution.
  • Figure 2: Proposed TAME Architecture for audio-only UAV detection.
  • Figure 3: Temporal Feature Enhancement Module.
  • Figure 4: Test set trajectory estimation: Red curves represent ground truth, blue curves show predicted trajectories.
  • Figure 5: The confusion matrix for the classification results.