How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines
Ailin Liu, Pepijn Vunderink, Jose Vargas Quiros, Chirag Raman, Hayley Hung
TL;DR
Problem: assessing how well low-frequency speech recordings protect verbal privacy in real-world settings while still enabling analysis of social dynamics. Approach: the authors down-sample speech from several datasets, measure FER, WER, and eSTOI, and test privacy risks via bandwidth-extension (BWE) attacks using neural BWE models trained on 16 kHz (VCTK) and REWIND data. Key findings: practical privacy-preserving thresholds lie around 800 Hz for VAD and 2000 Hz for blocking intelligible content; bandwidth extension can recover some information (primarily stop-words), but human intelligibility remains limited; privacy is not absolute against advanced attacks. Significance: the findings guide the design of privacy-conscious wearables and motivate robust defenses and attack-aware evaluation for real-world speech privacy.
Abstract
Low-frequency audio has been proposed as a promising privacy-preserving modality to study social dynamics in real-world settings. To this end, researchers have developed wearable devices that can record audio at frequencies as low as 1250 Hz to mitigate the automatic extraction of the verbal content of speech that may contain private details. This paper investigates the validity of this hypothesis, examining the degree to which low-frequency speech ensures verbal privacy. It includes simulating a potential privacy attack in various noise environments. Further, it explores the trade-off between the performance of voice activity detection, which is fundamental for understanding social behavior, and privacy-preservation. The evaluation incorporates subjective human intelligibility and automatic speech recognition performance, comprehensively analyzing the delicate balance between effective social behavior analysis and preserving verbal privacy.
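Automatic speech recognition performance of the kind evaluated here is commonly summarized as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate`, the lowercase-and-split normalization, and the example transcripts are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: transcripts are normalized only by lowercasing and
    whitespace splitting; real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, invented for this example only.
    reference = "we should meet after the poster session"
    hypothesis = "we should the session"
    print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

A WER near 0 means the verbal content is recovered almost completely, while a WER near or above 1 means very little of the reference transcript survives. WER can exceed 1 when the hypothesis contains many inserted words.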
