A Model Stealing Attack Against Multi-Exit Networks

Li Pan, Lv Peizhuo, Chen Kai, Zhang Shengzhi, Cai Yuling, Xiang Fan

TL;DR

This work addresses the challenge of stealing both the utility and the exit strategy of multi-exit networks, which use early exits to improve efficiency. It introduces a KDE-based Estimation Stage to infer exit timing from runtimes, a Training Stage that optimizes a performance loss and a strategy loss to align outputs and exit behavior, and an Output Strategy Search that selects thresholds maximizing agreement with the victim. The approach shows that an attacker can closely replicate both the accuracy and the computational efficiency of the victim across diverse datasets and backbone architectures, largely preserving the multi-exit advantages in the extracted model. The results highlight the practical risks of deploying multi-exit networks in black-box settings and suggest a need for defenses against output-strategy leakage and runtime-based side channels.

Abstract

Compared to traditional neural networks with a single output channel, a multi-exit network has multiple exits that allow for early outputs from the model's intermediate layers, thus significantly improving computational efficiency while maintaining similar main task accuracy. Existing model stealing attacks steal only the model's utility and fail to capture its output strategy, i.e., the set of thresholds that determines from which exit a result is output. As a result, the extracted model is significantly less computationally efficient and loses the advantage of multi-exit networks. In this paper, we propose the first model stealing attack against multi-exit networks that extracts both the model utility and the output strategy. We employ Kernel Density Estimation to analyze the target model's output strategy and use a performance loss and a strategy loss to guide the training of the extracted model. Furthermore, we design a novel output strategy search algorithm to maximize the consistency between the output behaviors of the victim model and the extracted model. In experiments across multiple multi-exit networks and benchmark datasets, our method consistently achieves the accuracy and efficiency closest to those of the victim models.
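
To make the notion of an output strategy concrete, the following is a minimal Python sketch of threshold-based exit selection. It is an illustration and not the paper's implementation. It assumes that each exit's confidence is its maximum softmax probability, which is a common choice that the abstract does not confirm. The names `softmax`, `choose_exit`, `exit_logits`, and `thresholds` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_exit(exit_logits, thresholds):
    """Return (exit_index, predicted_class) for a single input.

    exit_logits: one logit vector per exit, ordered from shallowest to deepest.
    thresholds:  one confidence threshold per exit except the last
                 (assumption: confidence = max softmax probability).

    For simplicity the logits of all exits are passed in precomputed; a real
    multi-exit network would stop computing once an exit is taken.
    """
    for i, logits in enumerate(exit_logits[:-1]):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= thresholds[i]:
            # Confident enough: output early from exit i.
            return i, probs.index(conf)
    # No early exit fired, so fall through to the final exit.
    final = softmax(exit_logits[-1])
    return len(exit_logits) - 1, final.index(max(final))


# Hypothetical logits for a 3-exit network on one input.
example_logits = [[0.1, 0.2, 0.3], [0.0, 5.0, 0.0], [9.0, 0.0, 0.0]]
exit_index, predicted_class = choose_exit(example_logits, [0.9, 0.9])
```

Under this sketch, raising a threshold pushes more inputs to deeper exits, so two models with identical per-exit predictions can still differ in computational cost. This is why an attack that copies only the predictions does not reproduce the victim's efficiency.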

Paper Structure (11 sections, 4 equations, 1 figure, 3 tables)