Speech privacy-preserving methods using secret key for convolutional neural network models and their robustness evaluation
Shoko Niwa, Sayaka Shiota, Hitoshi Kiya
TL;DR
This work tackles privacy in cloud-based CNN speech processing by introducing secret-key encryption for both speech data and the first-layer kernel of the model. Three encryption primitives (Shuffling $K_s$, Flipping $K_f$, and ROM $K_r$) enable encrypted queries to be processed without decryption when the keys match, with performance degrading under non-matching keys. ROM, in particular, provides a large key space and robust privacy even for small block sizes, supported by extensive experiments across ASV, ASR, and ASC and robustness tests including phase reconstruction and key-space attacks. The approach offers a practical path to privacy-preserving speech services in untrusted cloud environments and sets the stage for extending secret-key encrypted inference to other deep-learning models.
Abstract
In this paper, we propose privacy-preserving methods with a secret key for convolutional neural network (CNN)-based models in speech processing tasks. In environments where untrusted third parties, such as cloud servers, provide CNN-based systems, ensuring the privacy of speech queries becomes essential. We propose encryption methods for speech queries using secret keys, together with a model structure that accepts encrypted queries without decryption. Our approach introduces three types of secret keys: Shuffling, Flipping, and random orthogonal matrix (ROM). In experiments, we demonstrate that when the proposed methods are used with the correct key, identification performance does not degrade. Conversely, when an incorrect key is used, performance decreases significantly. In particular, with ROM, we show that even with a relatively small key space, high privacy-preserving performance can be maintained across many speech processing tasks. Furthermore, we demonstrate the difficulty of recovering the original speech from encrypted queries in various robustness evaluations.
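The reason a matching key can leave performance intact can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes that the first-layer kernel and one block of the query have the same length, so that a single filter response reduces to a dot product. All function and variable names below are hypothetical. Shuffling (a permutation), Flipping (random sign changes), and ROM (a random orthogonal matrix) are all orthogonal transforms, so applying the same key to both the kernel and the block preserves their dot product, while mismatched keys generally do not.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def make_shuffle_key(n, rng):
    # Hypothetical Shuffling key: a random permutation of block positions.
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def make_flip_key(n, rng):
    # Hypothetical Flipping key: a random sign for each block position.
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def make_rom_key(n, rng):
    # Hypothetical ROM key: orthonormal rows built by Gram-Schmidt
    # on random Gaussian vectors.
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:
            proj = dot(v, q)
            v = [a - proj * b for a, b in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows


def encrypt(vec, kind, key):
    # Apply one of the three key types to a block (or to a kernel).
    if kind == "shuffle":
        return [vec[i] for i in key]
    if kind == "flip":
        return [s * v for s, v in zip(key, vec)]
    if kind == "rom":
        return [dot(row, vec) for row in key]
    raise ValueError(kind)


rng = random.Random(0)
n = 8  # assumed block size, chosen only for illustration
x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one block of a speech query
w = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one first-layer kernel
plain = dot(w, x)

makers = {"shuffle": make_shuffle_key, "flip": make_flip_key, "rom": make_rom_key}
matched = {}
for kind, make in makers.items():
    key = make(n, rng)
    # Same key on kernel and query: the filter response is unchanged.
    matched[kind] = dot(encrypt(w, kind, key), encrypt(x, kind, key))

# Different ROM keys on kernel and query: the response no longer matches.
model_key = make_rom_key(n, rng)
query_key = make_rom_key(n, rng)
mismatched = dot(encrypt(w, "rom", model_key), encrypt(x, "rom", query_key))

print("plain:", plain)
print("matched keys:", matched)
print("mismatched ROM keys:", mismatched)
```

With matching keys, the three responses agree with the unencrypted one up to floating-point error. With two independently drawn ROM keys, the response is a different value, which is consistent with the performance drop reported for incorrect keys.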
