WiFi CSI Based Temporal Activity Detection via Dual Pyramid Network
Zhendong Liu, Le Zhang, Bing Li, Yingjie Zhou, Zhenghua Chen, Ce Zhu
TL;DR
This work tackles WiFi CSI-based temporal activity detection in untrimmed, multi-activity streams by introducing DPWiT, a dual-pyramid framework that learns frequency-aware representations through a Temporal Signal Semantic Encoder (TSSE) and a Local Sensitive Response Encoder (LSRE), fused via Cross-attention Pyramid Fusion. A novel Signed Mask-Attention mechanism within the TSSE emphasizes informative regions while suppressing noise, and the LSRE captures regional fluctuations in a learning-free manner, enabling robust localization without heavy supervision. The authors provide a real-world WiFi CSI dataset with 2,114 annotated activity instances across 553 untrimmed samples and demonstrate state-of-the-art performance against strong baselines, supported by comprehensive ablations and qualitative analyses. The approach advances wireless Temporal Activity Detection by effectively combining high- and low-frequency cues, achieving accurate timing and semantic understanding suitable for privacy-preserving real-world monitoring.
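To make the "learning-free" idea concrete, the following is a minimal sketch of a multi-scale fluctuation pyramid with no trainable parameters. It is not the authors' LSRE: the use of a sliding-window standard deviation as the fluctuation measure, the pairwise-average downsampling, and the names `local_fluctuation` and `fluctuation_pyramid` are all assumptions made for illustration.

```python
import numpy as np


def local_fluctuation(x, window):
    """Sliding-window standard deviation along time.

    x has shape (T, C): T time steps, C CSI channels (assumed layout).
    Returns an array of shape (T, C). No parameters are learned.
    """
    T = x.shape[0]
    pad = window // 2
    # Repeat edge values so every time step has a full window.
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    # Shape: (T_padded - window + 1, C, window)
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=0)
    return windows.std(axis=-1)[:T]


def fluctuation_pyramid(x, window=9, levels=3):
    """Build a pyramid of fluctuation maps at successively halved resolutions."""
    feats = []
    current = x
    for _ in range(levels):
        feats.append(local_fluctuation(current, window))
        # Halve the temporal resolution by averaging adjacent pairs
        # (hypothetical choice; drops a trailing odd sample).
        T = current.shape[0] - current.shape[0] % 2
        current = current[:T].reshape(T // 2, 2, -1).mean(axis=1)
    return feats
```

Flat stretches of the signal produce values near zero, while stretches with rapid variation produce large responses, which is the kind of cue a localization head could use to find activity boundaries.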
Abstract
We address the challenge of WiFi-based temporal activity detection and propose an efficient Dual Pyramid Network that integrates Temporal Signal Semantic Encoders and Local Sensitive Response Encoders. The Temporal Signal Semantic Encoder splits feature learning into high- and low-frequency components, using a novel Signed Mask-Attention mechanism to emphasize important regions and downplay unimportant ones; the resulting features are fused using ContraNorm. The Local Sensitive Response Encoder captures signal fluctuations without any learning. The two feature pyramids are then combined using a new cross-attention fusion mechanism. We also introduce a dataset with 2,114 activity segments across 553 WiFi CSI samples, each lasting around 85 seconds. Extensive experiments show that our method outperforms strong baselines.
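As a rough illustration of fusing two feature pyramids with cross-attention, the sketch below applies standard single-head scaled dot-product attention at each pyramid level, with queries taken from the semantic branch and keys and values from the local-response branch. This is a generic construction, not the paper's fusion module: the query/key/value assignment, the residual connection, the equal temporal length per level, and the names `cross_attention_fuse` and `fuse_pyramids` are assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(semantic, local, w_q, w_k, w_v):
    """Fuse one pyramid level.

    semantic, local: (T, d_model) features from the two branches (assumed shapes).
    w_q, w_k: (d_model, d) projections; w_v: (d_model, d_model).
    Returns the fused features (T, d_model) and the attention map (T, T).
    """
    q = semantic @ w_q  # queries from the semantic branch
    k = local @ w_k     # keys from the local-response branch
    v = local @ w_v     # values from the local-response branch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection keeps the semantic features intact (assumption).
    return semantic + attn @ v, attn


def fuse_pyramids(sem_levels, loc_levels, weights):
    """Apply cross-attention fusion independently at every pyramid level."""
    return [
        cross_attention_fuse(s, l, *w)[0]
        for s, l, w in zip(sem_levels, loc_levels, weights)
    ]
```

In a trained model the projection matrices would be learned; here they are plain arrays passed in by the caller, so the sketch only shows the data flow between the two pyramids.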
