Stealing AI Model Weights Through Covert Communication Channels

Valentin Barbaza; Alan Rodrigo Diaz-Rizo; Hassan Aboushady; Spyridon Raptis; Haralampos-G. Stratigopoulos

TL;DR

The paper addresses the risk of AI model theft from edge devices by introducing a covert RF channel, enabled by a hardware Trojan (HT), that leaks model weights during normal operation. The attack has two phases: in the first, an HT enables a covert channel embedded in the PHY preamble of Wi‑Fi frames; in the second, an attacker within wireless range intercepts those frames and reconstructs the entire weight matrix, regardless of model type or accelerator. Hardware demonstrations on LeNet-5, MobileNetV3-Large, IBM DVS128, and YOLO11n quantify leakage time, bit error rate (BER), and how well repetition/voting schemes preserve baseline accuracy, with extraction taking from seconds to hours depending on model size and channel conditions. The work underscores the practical security implications for edge AI and discusses post-silicon defenses, highlighting how difficult HTs and covert channels are to detect and the need for robust mitigation in real-world deployments.

Abstract

AI models are often regarded as valuable intellectual property due to the high cost of their development, the competitive advantage they provide, and the proprietary techniques involved in their creation. As a result, AI model stealing attacks pose a serious concern for AI model providers. In this work, we present a novel attack targeting wireless devices equipped with AI hardware accelerators. The attack unfolds in two phases. In the first phase, the victim's device is compromised with a hardware Trojan (HT) designed to covertly leak model weights through a hidden communication channel, without the victim realizing it. In the second phase, the adversary uses a nearby wireless device to intercept the victim's transmission frames during normal operation and incrementally reconstruct the complete weight matrix. The proposed attack is agnostic to both the AI model architecture and the hardware accelerator used. We validate our approach through a hardware-based demonstration involving four diverse AI models of varying types and sizes. We detail the design of the HT and the covert channel, highlighting their stealthy nature. Additionally, we analyze the impact of bit error rates on the reception and propose an error mitigation technique. The effectiveness of the attack is evaluated based on the accuracy of the reconstructed models with stolen weights and the time required to extract them. Finally, we explore potential defense mechanisms.
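The error mitigation mentioned above is described in the TL;DR as a repetition/voting scheme. The exact scheme is not specified in this summary, so the following is only a minimal sketch of the general idea, assuming a plain repetition code with bitwise majority voting over an odd number of received copies and independent bit errors. The function names (`majority_vote`, `flip_bits`, `bit_error_rate`) and the simulated channel are hypothetical illustrations, not the authors' implementation.

```python
import random


def majority_vote(copies):
    """Bitwise majority vote over an odd number of equal-length bit lists."""
    if len(copies) % 2 == 0:
        raise ValueError("use an odd number of copies to avoid ties")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    threshold = len(copies) // 2
    # zip(*copies) groups the i-th bit of every copy together.
    return [1 if sum(bits) > threshold else 0 for bits in zip(*copies)]


def flip_bits(bits, ber, rng):
    """Simulated noisy channel (assumption): flip each bit independently with probability ber."""
    return [b ^ 1 if rng.random() < ber else b for b in bits]


def bit_error_rate(sent, received):
    """Fraction of positions where the received bit differs from the sent bit."""
    return sum(s != r for s, r in zip(sent, received)) / len(sent)


if __name__ == "__main__":
    rng = random.Random(0)
    payload = [rng.randint(0, 1) for _ in range(10_000)]
    # Receive the same payload five times over the simulated channel.
    received = [flip_bits(payload, ber=0.05, rng=rng) for _ in range(5)]
    print("single-copy BER:", bit_error_rate(payload, received[0]))
    print("voted BER:      ", bit_error_rate(payload, majority_vote(received)))
```

Under the independence assumption, a three-copy vote fails on a given bit only when at least two copies are wrong, so a raw bit error probability p becomes 3p^2(1 - p) + p^3 after voting. The cost is that every bit is sent several times, which lengthens the extraction time proportionally.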
