MagLive: Robust Voice Liveness Detection on Smartphones Using Magnetic Pattern Changes
Xiping Sun, Jing Chen, Cong Wu, Kun He, Haozhe Xu, Yebo Feng, Ruiying Du, Xianhao Chen
TL;DR
MagLive addresses the vulnerability of smartphone voice authentication to replay spoofing by leveraging the magnetic pattern changes produced during speech. It introduces a magnetometer-based liveness detector that uses a TF-CNN-SAF feature extractor and supervised contrastive learning to produce user-, device-, and content-irrelevant representations. The approach achieves an average balanced accuracy (BAC) of 99.01% and an equal error rate (EER) of 0.77% across diverse devices, environments, and attack scenarios, while requiring no active sensing or extra hardware. This work demonstrates a practical, on-device defense that strengthens voice authentication on smartphones with minimal user burden.
Abstract
Voice authentication has been widely used on smartphones. However, it remains vulnerable to spoofing attacks, in which an attacker replays recorded voice samples of authentic humans through loudspeakers to bypass the voice authentication system. In this paper, we present MagLive, a robust voice liveness detection scheme designed for smartphones to mitigate such spoofing attacks. MagLive performs liveness detection by leveraging the differences in the magnetic pattern changes that different speakers (i.e., humans or loudspeakers) generate when speaking; these changes are captured by the smartphone's built-in magnetometer. To extract effective and robust magnetic features, MagLive utilizes a TF-CNN-SAF model as the feature extractor, which combines a time-frequency convolutional neural network (TF-CNN) with a self-attention-based fusion (SAF) model. Supervised contrastive learning is then employed to achieve user-irrelevance, device-irrelevance, and content-irrelevance. MagLive imposes no additional burden on users and does not rely on active sensing or specialized hardware. We conducted comprehensive experiments under various settings to evaluate the security and robustness of MagLive. Our results demonstrate that MagLive effectively distinguishes between humans and attackers (i.e., loudspeakers), achieving an average balanced accuracy (BAC) of 99.01% and an equal error rate (EER) of 0.77%.
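For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how BAC and EER are commonly computed for a binary liveness detector. It is not taken from the MagLive implementation: the function names, the convention that a higher score means "more likely a live human", and the toy data are all assumptions made for illustration.

```python
import numpy as np

def balanced_accuracy(labels, preds):
    """Mean of the true-positive rate and the true-negative rate.

    labels/preds: 1 (or True) = live human, 0 (or False) = loudspeaker replay.
    """
    labels = np.asarray(labels, dtype=bool)
    preds = np.asarray(preds, dtype=bool)
    tpr = np.mean(preds[labels])      # live samples correctly accepted
    tnr = np.mean(~preds[~labels])    # replayed samples correctly rejected
    return 0.5 * (tpr + tnr)

def equal_error_rate(labels, scores):
    """Error rate at the threshold where false acceptance ~= false rejection.

    Assumes a higher score means "more likely live" (illustrative convention).
    Sweeps every observed score as a threshold and returns the average of
    FAR and FRR at the threshold where the two are closest.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    candidates = np.concatenate([np.unique(scores), [np.inf]])
    best_gap, eer = np.inf, 1.0
    for t in candidates:
        accept = scores >= t
        far = np.mean(accept[~labels])   # replays wrongly accepted
        frr = np.mean(~accept[labels])   # live humans wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, 0.5 * (far + frr)
    return eer

if __name__ == "__main__":
    # Toy data (hypothetical): four live samples followed by four replays.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
    print("BAC:", balanced_accuracy(y, np.asarray(s) >= 0.5))
    print("EER:", equal_error_rate(y, s))
```

BAC weights the human and loudspeaker classes equally, so it stays informative when the test set contains many more attack samples than genuine ones. EER summarizes the trade-off between false acceptance and false rejection in a single threshold-independent number, and lower is better.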
