We Can Hear You with mmWave Radar! An End-to-End Eavesdropping System
Dachao Han, Teng Huang, Han Ding, Cui Zhao, Fei Wang, Ge Wang, Wei Xi
TL;DR
mmSpeech exposes a speech privacy risk: it reconstructs intelligible speech end to end from the surface vibrations induced by loudspeaker playback and sensed with mmWave radar, even through walls and without prior knowledge of the speaker. It identifies PET film as an optimal vibrating medium and tunes the radar sampling rate to capture speech content below 4 kHz, then uses a GAN-based network with spectrum denoising and multi-resolution Mel losses to refine the signal. The system achieves state-of-the-art quality (e.g., FWSegSNR ≈ 9.43 dB, MCD ≈ 5.18, MEL ≈ 2.09 on seen data) and generalizes to unseen speakers and conditions. Synthetic data generation and selective fine-tuning of an ASR encoder significantly improve transcription accuracy. The work highlights a practical privacy threat, suggests defense directions (damping, noise injection, strategic placement), and provides a comprehensive evaluation framework for mmWave-based vibration eavesdropping in through-wall scenarios.
Abstract
With the rise of voice-enabled technologies, loudspeaker playback has become widespread, posing increasing risks to speech privacy. Traditional eavesdropping methods often require invasive access or line-of-sight, limiting their practicality. In this paper, we present mmSpeech, an end-to-end mmWave-based eavesdropping system that reconstructs intelligible speech solely from vibration signals induced by loudspeaker playback, even through walls and without prior knowledge of the speaker. To achieve this, we reveal an optimal combination of vibrating material and radar sampling rate for capturing high-quality vibrations using narrowband mmWave signals. We then design a deep neural network that reconstructs intelligible speech from the estimated noisy spectrograms. To further support downstream speech understanding, we introduce a synthetic training pipeline and selectively fine-tune the encoder of a pre-trained ASR model. We implement mmSpeech with a commercial mmWave radar and validate its performance through extensive experiments. Results show that mmSpeech achieves state-of-the-art speech quality and generalizes well across unseen speakers and various conditions.
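To make the sensing principle concrete, the following is a minimal sketch of how a surface vibration could be recovered from the phase of a radar return, and why capturing speech content below 4 kHz implies a slow-time sampling rate of at least 8 kHz (the Nyquist criterion). It is not the authors' pipeline. The 77 GHz carrier, the 1 m base range, the 10 µm test tone, the phase sign convention, and all function names are assumptions chosen for illustration only.

```python
import cmath
import math

C = 3.0e8  # speed of light in m/s


def min_chirp_rate(max_audio_hz):
    """Nyquist bound: sample at least twice the highest frequency of interest."""
    return 2.0 * max_audio_hz


def simulate_phases(displacements, wavelength, base_range=1.0):
    """Wrapped phase of an ideal return, assuming phase = 4*pi*R/lambda."""
    return [
        cmath.phase(cmath.exp(1j * 4.0 * math.pi * (base_range + d) / wavelength))
        for d in displacements
    ]


def phase_to_displacement(phases, wavelength):
    """Unwrap the phase and convert it to displacement relative to sample 0."""
    out = []
    offset = 0.0
    prev = phases[0]
    for p in phases:
        delta = p - prev
        if delta > math.pi:        # wrapped downward across the -pi/+pi boundary
            offset -= 2.0 * math.pi
        elif delta < -math.pi:     # wrapped upward across the +pi/-pi boundary
            offset += 2.0 * math.pi
        out.append((p + offset - phases[0]) * wavelength / (4.0 * math.pi))
        prev = p
    return out


# Assumed 77 GHz carrier; the abstract only states "a commercial mmWave radar".
wavelength = C / 77e9
fs = min_chirp_rate(4000.0)  # 8 kHz slow-time rate for sub-4 kHz content

# Hypothetical test signal: a 1 kHz tone with 10 micrometre amplitude.
true_d = [10e-6 * math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(64)]
est_d = phase_to_displacement(simulate_phases(true_d, wavelength), wavelength)
max_err = max(abs(e - (t - true_d[0])) for e, t in zip(est_d, true_d))
```

In this idealized, noise-free setting the recovered displacement matches the simulated one to within floating-point error. Real captures are far noisier, which is the gap the reconstruction network described above is meant to close.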
