HyMAD: A Hybrid Multi-Activity Detection Approach for Border Surveillance and Monitoring
Sriram Srinivasan, Srinivasan Aruchamy, Siva Ram Krisha Vadali
TL;DR
HyMAD tackles the challenge of multi-label seismic event detection under overlapping human, animal, and vehicle activities for border surveillance. It introduces a hybrid architecture that fuses learnable frequency features from SincNet with RNN-based temporal encoding, using self-attention per modality and cross-attention fusion to disentangle concurrent events. The approach demonstrates competitive performance on real-field seismic data and shows strong generalization to complex overlaps while offering a modular framework for extension. This work advances seismic signal analysis toward robust, real-time monitoring in security applications.
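To make "learnable frequency features from SincNet" concrete, the following is a minimal sketch of a single SincNet-style band-pass filter, not the authors' implementation. In a SincNet front end each filter is defined by just two cutoff frequencies, which are the quantities a network would learn; here they are fixed by hand. The function names, the 1000 Hz sample rate, the 20-80 Hz band, and the filter length are all hypothetical values chosen for illustration.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the removable singularity at 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f_low, f_high, length, sample_rate):
    """Return Hamming-windowed band-pass filter taps for [f_low, f_high] Hz.

    The ideal band-pass response is the difference of two low-pass sinc
    filters; only f_low and f_high define its shape, which is why a
    SincNet-style layer needs two parameters per filter.
    """
    if not 0.0 <= f_low < f_high <= sample_rate / 2.0:
        raise ValueError("need 0 <= f_low < f_high <= Nyquist")
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be odd and at least 3")

    f1 = f_low / sample_rate   # normalized cutoffs in cycles per sample
    f2 = f_high / sample_rate
    half = length // 2
    taps = []
    for i in range(length):
        n = i - half  # centered time index, so the filter is symmetric
        ideal = (2.0 * f2 * sinc(2.0 * math.pi * f2 * n)
                 - 2.0 * f1 * sinc(2.0 * math.pi * f1 * n))
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq, sample_rate):
    """Magnitude of the filter's frequency response at freq (Hz)."""
    w = 2.0 * math.pi * freq / sample_rate
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


# Hypothetical settings: 1000 Hz sampling, 20-80 Hz pass band, 201 taps.
taps = sinc_bandpass(20.0, 80.0, 201, 1000.0)
in_band = gain(taps, 50.0, 1000.0)    # inside the pass band
out_band = gain(taps, 300.0, 1000.0)  # well outside the pass band
```

Under these assumptions, a frequency inside the band passes almost unchanged while one far outside it is strongly attenuated. A bank of such filters with trainable cutoffs would produce the spectral features that the temporal encoder and attention modules then consume.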
Abstract
Seismic sensing has emerged as a promising solution for border surveillance and monitoring. Seismic sensors are small and often buried underground, which makes them difficult for intruders to detect, avoid, or vandalize and significantly enhances their effectiveness compared to highly visible cameras or fences. However, accurately detecting and distinguishing between activities that occur simultaneously, such as human intrusions, animal movements, and vehicle rumbling, remains a major challenge due to the complex and noisy nature of seismic signals. Correctly identifying simultaneous activities is critical because failing to separate them can lead to misclassification, missed detections, and an incomplete understanding of the situation, thereby reducing the reliability of surveillance systems. To tackle this problem, we propose HyMAD (Hybrid Multi-Activity Detection), a deep neural architecture based on spatio-temporal feature fusion. The framework integrates spectral features extracted with SincNet and temporal dependencies modeled by a recurrent neural network (RNN). In addition, HyMAD employs self-attention layers to strengthen intra-modal representations and a cross-modal fusion module to achieve robust multi-label classification of seismic events. We evaluate our approach on a dataset constructed from real-world field recordings collected in the context of border surveillance and monitoring, demonstrating its ability to generalize to complex, simultaneous activity scenarios involving humans, animals, and vehicles. Our method achieves competitive performance and offers a modular framework for extending seismic-based activity recognition in real-world security applications.
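The multi-label formulation mentioned above can be illustrated with a small sketch. It assumes, as is common for multi-label classifiers but not stated in the abstract, that the model emits one logit per activity and that each logit is passed through an independent sigmoid and compared to a threshold. The label order, the 0.5 threshold, the function names, and the example logits are hypothetical.

```python
import math

# Hypothetical label order for the three activity classes.
LABELS = ("human", "animal", "vehicle")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detect_activities(logits, threshold=0.5):
    """Map one logit per class to the list of activities judged present.

    Each class is scored independently, so any subset of labels
    (including none or all) can be reported for the same signal window.
    A softmax head, by contrast, would force the classes to compete.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    active = [label for label in LABELS if probs[label] >= threshold]
    return active, probs


# Hypothetical logits for a window in which a person walks near a vehicle.
active, probs = detect_activities([2.0, -1.5, 0.7])
```

With these example logits, both "human" and "vehicle" exceed the threshold while "animal" does not, which is the kind of overlapping detection a single-label classifier cannot express.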
