Your Microphone Array Retains Your Identity: A Robust Voice Liveness Detection System for Smart Speakers
Yan Meng, Jiachun Li, Matthew Pillari, Arjun Deopujari, Liam Brennan, Hafsah Shamsie, Haojin Zhu, Yuan Tian
TL;DR
This work addresses voice spoofing attacks on smart speakers by introducing ARRAYID, a passive liveness detector that needs no additional sensors and exploits the spatial diversity of the speaker's microphone array. Grounded in a theoretical sound-propagation model, it defines an array fingerprint that remains robust to environmental changes and user movement, and combines it with spectrogram-based and LPCC features in a lightweight neural classifier. On the MALD dataset (32,780 samples, 14 spoofing devices) and the public ReMASC Core dataset, ARRAYID achieves up to 99.84% accuracy and stays resilient to changes in distance, direction, and noise, as well as to advanced spoofing such as modulated attacks, outperforming mono-channel and two-channel baselines. The work also contributes a practical detection pipeline, a concise feature set, fast inference, and a new dataset to advance liveness detection research in real-world smart-home scenarios.
Abstract
Although smart speakers play an essential role in smart home systems, they are vulnerable to voice spoofing attacks. Passive liveness detection, which relies only on the collected audio rather than on additionally deployed sensors to distinguish live-human voices from replayed ones, has drawn increasing attention. However, its performance degrades under varying environmental factors, and it strictly requires fixed user gestures. In this study, we propose a novel liveness feature, the array fingerprint, which uses the microphone array already built into the smart speaker to determine the identity of the collected audio. Our theoretical analysis demonstrates that, by leveraging the circular layout of the microphones, the array fingerprint is more robust to environmental changes and user movement than existing schemes. To leverage this fingerprint, we propose ARRAYID, a lightweight passive detection scheme, and design a series of features that work together with the array fingerprint. Our evaluation on a dataset containing 32,780 audio samples and 14 spoofing devices shows that ARRAYID achieves an accuracy of 99.84%, which is superior to existing passive liveness detection schemes.
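The abstract does not define the array fingerprint, so the following is only a rough illustration of what "differences across the microphones of an array" can look like in code. It computes, for each frequency bin, the standard deviation of the magnitude spectrum across channels. This cross-channel spread is an assumed stand-in chosen for illustration, not ARRAYID's actual feature; the function names and the toy four-microphone signal are hypothetical.

```python
import cmath
import math
from statistics import pstdev


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N//2 (adequate for short toy frames)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def cross_channel_spread(channels):
    """Per-bin standard deviation of magnitude across microphones.

    Assumption: this spread is an illustrative proxy for inter-microphone
    differences; the source does not give the array fingerprint's definition.
    """
    spectra = [magnitude_spectrum(ch) for ch in channels]
    # zip(*spectra) groups the values of one frequency bin over all channels.
    return [pstdev(bin_values) for bin_values in zip(*spectra)]


# Toy input: four microphones record the same tone (2 cycles per frame)
# with different, made-up gains.
n = 16
gains = [1.0, 0.8, 0.6, 0.9]
tone = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
channels = [[g * s for s in tone] for g in gains]

spread = cross_channel_spread(channels)
peak_bin = max(range(len(spread)), key=spread.__getitem__)
```

In this toy input the spread is concentrated in the bin that holds the tone, and it vanishes when every channel is identical. A real pipeline would work on framed multi-channel recordings and use an FFT library instead of the naive DFT.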
