TRIDENT: Tri-modal Real-time Intrusion Detection Engine for New Targets

Ildi Alla, Selma Yahia, Valeria Loscri

TL;DR

TRIDENT tackles real-time UAV detection under real-world conditions by fusing synchronized audio, visual, and RF data. It introduces two fusion strategies, Late Fusion and Gated Multimodal Unit (GMU) Fusion, and a synchronized data augmentation pipeline that improves robustness when all modalities degrade simultaneously. The framework reports strong real-world accuracy (up to 96.89% in clean data) and robustness in noisy conditions (83.26% with synchronized test-time noise), while maintaining low latency (about 6 ms per detection) and low energy consumption on edge hardware. A diverse urban/non-urban dataset is released to support reproducibility and future research in robust counter-UAV sensing.

Abstract

The increasing availability of drones and their potential for malicious activities pose significant privacy and security risks, necessitating fast and reliable detection in real-world environments. However, existing drone detection systems often struggle in real-world settings due to environmental noise and sensor limitations. This paper introduces TRIDENT, a tri-modal drone detection framework that integrates synchronized audio, visual, and RF data to enhance robustness and reduce dependence on individual sensors. TRIDENT introduces two fusion strategies (Late Fusion and GMU Fusion) to improve multi-modal integration while maintaining efficiency. The framework incorporates domain-specific feature extraction techniques alongside a specialized data augmentation pipeline that simulates real-world sensor degradation to improve generalization. A diverse multi-sensor dataset is collected in urban and non-urban environments under varying lighting conditions to support comprehensive evaluation. Experimental results show that TRIDENT achieves 98.8% accuracy in real-world recordings and 83.26% in a more complex setting (augmented data), outperforming unimodal and dual-modal baselines. Moreover, TRIDENT operates in real time, detecting drones in just 6.09 ms while consuming only 75.27 mJ per detection, making it highly efficient for resource-constrained devices. The dataset and code have been released to ensure reproducibility (https://github.com/TRIDENT-2025/TRIDENT).
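
To make the difference between the two fusion strategies concrete, the sketch below contrasts decision-level late fusion with a gated multimodal unit. It is an illustration under stated assumptions, not the code from the TRIDENT repository. The function names (`late_fusion`, `gmu_fusion`), the weighted-average rule for late fusion, the per-modality sigmoid gates, and the toy weights are all hypothetical. The GMU follows one common multi-modality formulation, in which each modality gets a tanh projection and a gate computed from the concatenated inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def late_fusion(probs, weights=None):
    """Decision-level fusion: weighted average of per-modality drone probabilities.

    Assumption: each modality (audio, visual, RF) already has its own classifier
    that outputs a probability; the averaging rule here is illustrative.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def gmu_fusion(features, proj, gates):
    """Feature-level fusion with a gated multimodal unit (illustrative form).

    features: one feature vector per modality
    proj[i]:  matrix projecting features[i] to the shared hidden size
    gates[i]: matrix mapping the concatenation of all features to that size
    """
    concat = [v for f in features for v in f]
    hidden_dim = len(proj[0])
    fused = [0.0] * hidden_dim
    for f, w_h, w_z in zip(features, proj, gates):
        h = [math.tanh(v) for v in matvec(w_h, f)]    # modality representation
        z = [sigmoid(v) for v in matvec(w_z, concat)]  # how much to trust it
        for j in range(hidden_dim):
            fused[j] += z[j] * h[j]
    return fused


# Toy usage with made-up numbers: three modalities, one feature each.
audio_p, visual_p, rf_p = 0.9, 0.6, 0.3
score = late_fusion([audio_p, visual_p, rf_p])

feats = [[1.0], [1.0], [1.0]]
proj = [[[1.0]], [[1.0]], [[1.0]]]
gates = [[[0.0, 0.0, 0.0]]] * 3  # zero gate weights give z = 0.5 everywhere
fused = gmu_fusion(feats, proj, gates)
```

In late fusion a degraded sensor can only be down-weighted by a fixed factor, whereas the gate values in the GMU depend on the inputs themselves, so in principle the unit can learn to suppress a modality when its features look unreliable.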

Paper Structure

This paper contains 33 sections, 10 equations, 16 figures, 17 tables, and 2 algorithms.

Figures (16)

  • Figure 1: Measurement locations during data collection.
  • Figure 2: Experimental setup for synchronizing sensor data.
  • Figure 3: Data acquisition method from drones.
  • Figure 4: Audio feature extraction process.
  • Figure 5: T-F representation of drone sound without/with background noise and volume scaling effects.
  • ...and 11 more figures