Privacy-Preserving End-to-End Spoken Language Understanding

Yinggui Wang, Wei Huang, Le Yang

TL;DR

This work tackles privacy leakage in end-to-end SLU for IoT devices by introducing a multi-task framework that uses hidden-layer separation to isolate SLU information from ASR and identity recognition signals. It couples this architectural separation with joint adversarial training, resulting in variants such as SH-PPSLU and H-PPSLU (and their adversarially refined forms) that maintain SLU performance while suppressing attacker success in inferring sensitive attributes. Evaluations across LibriSpeech, VoxCeleb1, FSC, SLURP, and TED-LiUM show attacker accuracy approaching random guessing without substantial degradation to SLU accuracy, outperforming existing privacy-preserving baselines. The proposed approach offers a practical, deployable strategy for privacy-preserving spoken language understanding in resource-constrained IoT environments.

Abstract

Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech can contain a great deal of user-sensitive information, such as gender, identity, and sensitive content, and new types of security and privacy breaches have emerged as a result. Users do not want to expose their sensitive personal information to malicious attacks by untrusted third parties. The SLU system therefore needs to ensure that a potential attacker cannot deduce users' sensitive attributes, without greatly compromising SLU accuracy. To address this challenge, this paper proposes a novel multi-task privacy-preserving SLU model that defends against both automatic speech recognition (ASR) and identity recognition (IR) attacks. The model uses a hidden-layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, and the other two types of information are removed to obtain a privacy-secure hidden layer. To achieve a good balance between efficiency and privacy, we introduce a new model pre-training mechanism, joint adversarial training, to further enhance user privacy. Experiments on two SLU datasets show that the proposed method can reduce the accuracy of both ASR and IR attacks to close to that of a random guess, while leaving SLU performance largely unaffected.
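
The hidden-layer separation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the separation is learned through multi-task training, whereas the code below only shows the slicing step, in which a hidden vector is partitioned into SLU, ASR, and IR segments and only the SLU segment is kept. The function names, the segment order, and the segment sizes are all hypothetical.

```python
from typing import List, Tuple


def split_hidden(
    hidden: List[float], slu_dim: int, asr_dim: int
) -> Tuple[List[float], List[float], List[float]]:
    """Partition a hidden vector into (SLU, ASR, IR) segments.

    Assumed layout (hypothetical): the first `slu_dim` units carry SLU
    information, the next `asr_dim` units carry ASR information, and the
    remaining units carry IR information.
    """
    if slu_dim < 0 or asr_dim < 0 or slu_dim + asr_dim > len(hidden):
        raise ValueError("segment sizes do not fit the hidden dimension")
    slu = hidden[:slu_dim]
    asr = hidden[slu_dim:slu_dim + asr_dim]
    ir = hidden[slu_dim + asr_dim:]
    return slu, asr, ir


def privacy_secure_hidden(
    hidden: List[float], slu_dim: int, asr_dim: int
) -> List[float]:
    """Keep only the SLU segment; the ASR and IR segments are dropped."""
    slu, _asr, _ir = split_hidden(hidden, slu_dim, asr_dim)
    return slu


if __name__ == "__main__":
    h = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    # Only the SLU portion would be exposed to downstream components.
    print(privacy_secure_hidden(h, slu_dim=2, asr_dim=3))
```

Dropping the ASR and IR segments only protects privacy if training has actually concentrated those signals there, which is the role the abstract assigns to the multi-task setup and to joint adversarial training.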


Paper Structure

This paper contains 19 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) Speech Control System, (b) End-to-End SLU, (c) Potential Malicious Inference Attack.
  • Figure 2: Framework of SLU privacy protection.
  • Figure 3: Diagram of the SH-PPSLU model.
  • Figure 4: Diagram of the H-PPSLU model.
  • Figure 5: Two testing/attacking scenarios.