How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines

Ailin Liu; Pepijn Vunderink; Jose Vargas Quiros; Chirag Raman; Hayley Hung

TL;DR

Problem: assessing verbal privacy in real-world settings when recording low-frequency speech audio that still supports analysis of social dynamics. Approach: the authors down-sample speech audio across datasets and measure FER, WER, and eSTOI, then test privacy risks with bandwidth-extension (BWE) attacks using neural BWE models trained on 16 kHz (VCTK) and REWIND data. Key findings: practical privacy-preserving thresholds lie around 800 Hz for VAD and 2000 Hz for blocking intelligible content; bandwidth extension can recover some information (primarily stop-words), but human intelligibility remains limited; privacy is not absolute against advanced attacks. Significance: the findings guide the design of privacy-conscious wearables and motivate robust defenses and attack-aware evaluation for real-world speech privacy.
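As an illustration of the WER metric mentioned above, the following is a minimal sketch of word error rate computed as word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for this example; the paper's actual scoring and text normalization may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, which is an assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# A transcript that keeps only the stop-words still scores a high WER.
print(word_error_rate("the cat sat on the mat", "the on the"))
```

A transcript that recovers only stop-words, as in the call above, leaves every content word counted as a deletion, which is why WER alone does not say which words were recovered.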

Abstract

Low-frequency audio has been proposed as a promising privacy-preserving modality to study social dynamics in real-world settings. To this end, researchers have developed wearable devices that can record audio at frequencies as low as 1250 Hz to mitigate the automatic extraction of the verbal content of speech that may contain private details. This paper investigates the validity of this hypothesis, examining the degree to which low-frequency speech ensures verbal privacy. It includes simulating a potential privacy attack in various noise environments. Further, it explores the trade-off between the performance of voice activity detection, which is fundamental for understanding social behavior, and privacy-preservation. The evaluation incorporates subjective human intelligibility and automatic speech recognition performance, comprehensively analyzing the delicate balance between effective social behavior analysis and preserving verbal privacy.
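To make the idea of low-frequency audio concrete, here is a minimal sketch of simulating a low sample rate from a higher-rate recording: low-pass filter below the new Nyquist frequency, then keep every `factor`-th sample. It assumes the Hz values quoted here are sampling rates (the figure captions call them sample rates), so a 2000 Hz recording can only carry content below 1000 Hz. The helper names, the Hamming-windowed sinc filter, the 0.45 cutoff ratio, and the restriction to integer factors are illustrative choices, not the authors' pipeline; a non-integer ratio such as 16 kHz to 1250 Hz would need rational resampling instead.

```python
import math


def lowpass_kernel(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR kernel with unit DC gain."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(signal, sample_rate_hz, factor, num_taps=101):
    """Low-pass filter, then keep every `factor`-th sample (integer factor only)."""
    new_rate = sample_rate_hz / factor
    # Cut off slightly below the new Nyquist frequency (0.5 * new_rate).
    kernel = lowpass_kernel(0.45 * new_rate, sample_rate_hz, num_taps)
    half = num_taps // 2
    out = []
    for start in range(0, len(signal), factor):
        acc = 0.0
        for k, tap in enumerate(kernel):
            idx = start + k - half
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += tap * signal[idx]
        out.append(acc)
    return out, new_rate


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    sr = 16000
    low_tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
    high_tone = [math.sin(2 * math.pi * 3000 * n / sr) for n in range(sr)]
    for name, tone in (("200 Hz", low_tone), ("3000 Hz", high_tone)):
        reduced, rate = decimate(tone, sr, factor=8)
        print(name, "->", rate, "Hz, RMS", round(rms(reduced), 3))
```

In this sketch the 200 Hz tone passes through the 16 kHz to 2 kHz reduction largely unchanged, while the 3000 Hz tone, which lies above the new Nyquist frequency, is strongly attenuated by the filter.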

Paper Structure (12 sections, 6 figures, 2 tables)

This paper contains 12 sections, 6 figures, 2 tables.

Introduction
Related work
Analysis of Low-Frequency Audio
Datasets
An analysis of low-frequency speech audio
Voice activity detection
Speech intelligibility
Analysis of bandwidth-extended low-frequency speech
Simulating an attack via Bandwidth Extension
Machine intelligibility
Human intelligibility
Conclusion

Figures (6)

Figure 1: Overview of the study. From datasets with and without mingle setting (Section 3.1), we process the audio samples into low-frequency speech audio (Section 3.2) and bandwidth-extended low-frequency speech audio (Section 3.3).
Figure 2: Performance (mean and standard deviation) of rVAD at different sample rates, compared with the original sample rates.
Figure 3: Performance of Whisper at different frequencies, compared with the ground-truth transcripts, and of eSTOI speech intelligibility prediction at different frequencies, compared with the original speech signals.
Figure 4: ASR performance with and without BWE on Pop-glass and VCTK audio at sample rates of 800, 1250, and 2000 Hz, compared with the ground-truth transcripts.
Figure 5: Mean and standard deviation of Q1 and Q2
