Privacy in Speech Technology
Tom Bäckström
TL;DR
Privacy in speech technology addresses the inherent privacy risks of speech signals, detailing a comprehensive threat model, attack surfaces, and attacker architectures. It surveys a spectrum of protections, from information isolation and secure processing to privacy-preserving architectures and acoustic interventions, along with methods to evaluate privacy and utility objectively and subjectively. The work also integrates perception and psychology, user-interface design, and a legal-societal lens, arguing for privacy-by-design and provable protections to build trustworthy speech systems. It concludes with urgent research directions, including consent mechanisms for streaming, streaming-appropriate privacy metrics, multi-user privacy, and robust disentanglement strategies to minimize leakage while preserving utility.
Abstract
Speech technology for communication, accessing information, and services has rapidly improved in quality. It is convenient and appealing because speech is the primary mode of communication for humans. Such technology, however, also presents proven threats to privacy. Because speech is a tool for communication, it inherently contains private information. Importantly, it also carries a wealth of side information, such as information related to health, emotions, affiliations, and relationships, all of which is private. Exposing such private information can lead to serious threats such as price gouging, harassment, extortion, and stalking. This paper is a tutorial on privacy issues related to speech technology: modeling the threats, approaches for protecting users' privacy, measuring the performance of privacy-protecting methods, the perception of privacy, and societal and legal consequences. In addition to a tutorial overview, it also presents directions for further development where improvements are most urgently needed.
