Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege
Peng Huang, Yao Wei, Peng Cheng, Zhongjie Ba, Li Lu, Feng Lin, Yang Wang, Kui Ren
TL;DR
This paper tackles the privacy risk of voice eavesdropping by smart devices and proposes InfoMasker, a phoneme-based informational masking system that jams microphones while permitting authorized content recovery. The core idea is to construct a noise signal from phoneme sequences that mimics the target speech in phonetic structure and timing, creating strong informational masking that resists denoising and hinders human comprehension. The system comprises a registration-driven noise generator, real-time ultrasonic jamming using a transmitter array with pre-compensation, and a transformer-based denoising module for authorized recovery. Experimental results across multiple languages, devices, and real-world office scenarios show that the recognition accuracy of jammed recordings drops below 50% under all tested speech recognition systems, while authorized users can still recover the content, indicating practical privacy-preserving potential in controlled environments.
Abstract
The widespread deployment of smart devices raises concerns about being eavesdropped on. To enhance voice privacy, recent studies exploit the nonlinearity of microphones to jam audio recorders with inaudible ultrasound. However, existing solutions rely solely on energetic masking. Their simple noise leads to several problems, such as high energy requirements and easy removal by speech enhancement techniques. Moreover, most of these solutions do not support authorized recording, which restricts their usage scenarios. In this paper, we design an efficient yet robust system that can jam microphones while preserving authorized recording. Specifically, we propose a novel phoneme-based noise built on the idea of informational masking, which can distract both machines and humans and is resistant to denoising techniques. In addition, we optimize the noise transmission strategy for broader coverage and implement a hardware prototype of our system. Experimental results show that our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems, which is much better than existing solutions.
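To make the microphone nonlinearity mentioned above concrete, the following is a minimal simulation sketch, not the system's implementation. It assumes the common quadratic approximation of a microphone's response, y = a1*x + a2*x^2, and shows that an amplitude-modulated ultrasonic carrier, which has no energy in the audible band, produces an audible component after passing through that nonlinearity. The coefficients `a1` and `a2`, the 40 kHz carrier, the 192 kHz sampling rate, and the 1 kHz tone standing in for the jamming noise are all illustrative assumptions.

```python
import numpy as np

def am_modulate(baseband, fs, carrier_hz, depth=0.8):
    """Amplitude-modulate a baseband signal onto an ultrasonic carrier."""
    t = np.arange(len(baseband)) / fs
    return (1.0 + depth * baseband) * np.cos(2 * np.pi * carrier_hz * t)

def nonlinear_microphone(x, a1=1.0, a2=0.5):
    """Assumed quadratic model of the microphone front end (hypothetical coefficients)."""
    return a1 * x + a2 * x ** 2

def band_power(signal, fs, lo_hz, hi_hz):
    """Spectral power of `signal` between lo_hz and hi_hz (inclusive)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    return spectrum[mask].sum() / len(signal)

fs = 192_000                                # sampling rate high enough to represent ultrasound
t = np.arange(fs // 10) / fs                # 0.1 s of samples
tone = np.sin(2 * np.pi * 1_000 * t)        # 1 kHz tone standing in for the jamming noise

ultrasound = am_modulate(tone, fs, carrier_hz=40_000)
recorded = nonlinear_microphone(ultrasound)

# Power in a rough speech band before and after the nonlinearity.
audible_before = band_power(ultrasound, fs, 100, 8_000)
audible_after = band_power(recorded, fs, 100, 8_000)
print(f"audible-band power in the air:   {audible_before:.3e}")
print(f"audible-band power after the mic: {audible_after:.3e}")
```

The squared term is what demodulates the envelope: expanding (1 + m*s)^2 * cos^2(w_c*t) yields a baseband term proportional to m*s, so the transmitted signal stays inaudible in the air while the recording contains the noise.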
