CSI2Dig: Recovering Digit Content from Smartphone Loudspeakers Using Channel State Information

Yangyang Gu; Xianglong Li; Haolin Wu; Jing Chen; Kun He; Ruiying Du; Cong Wu

TL;DR

CSI2Dig recovers digit sequences played through smartphone loudspeakers by exploiting perturbations that electromagnetic interference (EMI) induces in WiFi CSI. The approach pairs a two-branch denoising autoencoder, which amplifies the EMI effect on CSI, with TS-Net, which extracts temporal and spatial CSI features to classify digits. Evaluation across multiple devices, distances, and conditions reports Top-5 accuracy of up to 72.97% in favorable setups and about 63% on average across all distances, and reveals practical attack limitations such as sensitivity to motion and obstruction. The work also discusses countermeasures (shielding, component layout, and CSI-filtering defenses) and the trade-offs of the current attack, highlighting the need for robust defenses in real-world deployments.
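
For readers unfamiliar with the Top-5 metric quoted above, the following is a minimal sketch of how a Top-k accuracy is commonly computed from per-sample class scores. It is not taken from the paper: the function name `top_k_accuracy`, the score matrix, and the labels are hypothetical, and the exact evaluation protocol used by CSI2Dig is not described in this summary.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: array of shape (n_samples, n_classes), higher means more likely.
    labels: array of shape (n_samples,) holding the true class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores per row; order inside the top k is irrelevant.
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    # A sample counts as a hit if its true label appears anywhere in its top k.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Illustrative (made-up) scores for 3 samples over 4 classes.
example_scores = [
    [0.10, 0.60, 0.30, 0.00],
    [0.50, 0.20, 0.25, 0.05],
    [0.05, 0.10, 0.15, 0.70],
]
example_labels = [2, 3, 3]
print(top_k_accuracy(example_scores, example_labels, k=2))
```

A Top-k score is always at least as high as the Top-1 score on the same predictions, so the two should not be compared directly.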

Abstract

Eavesdropping on sounds emitted by mobile device loudspeakers can capture sensitive digital information, such as SMS verification codes, credit card numbers, and withdrawal passwords, which poses significant security risks. Existing schemes either require expensive specialized equipment, rely on spyware, or are limited to close-range signal acquisition. In this paper, we propose a scheme, CSI2Dig, for recovering digit content from Channel State Information (CSI) when digits are played through a smartphone loudspeaker. We observe that the electromagnetic interference caused by the audio signals from the loudspeaker affects the WiFi signals emitted by the phone's WiFi antenna. Building upon contrastive learning and denoising autoencoders, we develop a two-branch autoencoder network designed to amplify the impact of this electromagnetic interference on CSI. For feature extraction, we introduce the TS-Net, a model that captures relevant features from both the temporal and spatial dimensions of the CSI data. We evaluate our scheme across various devices, distances, volumes, and other settings. Experimental results demonstrate that our scheme can achieve an accuracy of 72.97%.
