An Early Experience with Confidential Computing Architecture for On-Device Model Protection

Sina Abdollahi, Mohammad Maheri, Sandra Siby, Marios Kogias, Hamed Haddadi

TL;DR

The paper investigates Arm Confidential Compute Architecture (CCA) as a platform for protecting on-device ML models by running them inside realm VMs. It defines a deployment framework, analyzes the sources of overhead, and measures the privacy benefit using a membership inference attack, reporting at most 22% inference overhead and an 8.3% reduction in the adversary's success rate. The evaluation runs on an emulated platform (FVP) rather than real hardware, with models executing inside attested realms; the results suggest that confidential on-device inference is viable, and the authors release their code to support early adoption. Limitations include the reliance on emulation and the need for hardware support for accelerator integration and for full end-to-end privacy guarantees.

Abstract

Deploying machine learning (ML) models on user devices can improve privacy (by keeping data local) and reduce inference latency. Trusted Execution Environments (TEEs) are a practical solution for protecting proprietary models, yet existing TEE solutions have architectural constraints that hinder on-device model deployment. Arm Confidential Computing Architecture (CCA), a new Arm extension, addresses several of these limitations and shows promise as a secure platform for on-device ML. In this paper, we evaluate the performance-privacy trade-offs of deploying models within CCA, highlighting its potential to enable confidential and efficient ML applications. Our evaluations show that CCA incurs an overhead of at most 22% when running models of different sizes and applications, including image classification, voice recognition, and chat assistants. This performance overhead comes with privacy benefits; for example, our framework protects the model against a membership inference attack, reducing the adversary's success rate by 8.3%. To support further research and early adoption, we make our code and methodology publicly available.

Paper Structure

This paper contains 17 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Arm CCA software architecture. The hypervisor allocates resources to realms but cannot access those resources, due to isolation boundaries between the realm and the normal world.
  • Figure 2: Overview of the steps required for running an ML model on the client edge device. We show a simplified view of the normal and realm worlds within the client. The client's steps are (1) obtaining the realm image from the verifier, (2) creating and activating a realm VM, (3) establishing a connection with the provider, (4) realm attestation, (5) obtaining the model from the provider, (6) announcing model readiness to the normal world, (7) running inference, and (8) performing model updates.
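
The eight client steps listed in the Figure 2 caption can be read as an ordered flow. The following is a minimal Python sketch of that ordering only, not the authors' implementation: the names `Step`, `SETUP_STEPS`, and `ClientFlow` are hypothetical, and the rule that steps 1 to 6 run once in order while steps 7 and 8 may repeat afterwards is an assumption, since the caption does not say which steps repeat.

```python
from enum import IntEnum


class Step(IntEnum):
    """Client steps as numbered in the Figure 2 caption (names are hypothetical)."""
    OBTAIN_REALM_IMAGE = 1         # from the verifier
    CREATE_AND_ACTIVATE_REALM = 2
    CONNECT_TO_PROVIDER = 3
    ATTEST_REALM = 4
    OBTAIN_MODEL = 5               # from the provider
    ANNOUNCE_MODEL_READY = 6       # realm tells the normal world
    RUN_INFERENCE = 7
    UPDATE_MODEL = 8


# Assumption: steps 1-6 are one-time setup and must occur in this order.
SETUP_STEPS = [s for s in Step if s <= Step.ANNOUNCE_MODEL_READY]


class ClientFlow:
    """Records completed steps and rejects any step taken out of order."""

    def __init__(self):
        self.history = []

    def advance(self, step):
        done = len(self.history)
        if done < len(SETUP_STEPS):
            # Still in setup: only the next setup step is allowed.
            expected = SETUP_STEPS[done]
            if step != expected:
                raise RuntimeError(f"expected {expected.name}, got {step.name}")
        elif step not in (Step.RUN_INFERENCE, Step.UPDATE_MODEL):
            # Assumption: after setup, only inference and updates may repeat.
            raise RuntimeError(f"{step.name} is a one-time setup step")
        self.history.append(step)
```

In this sketch, calling `advance(Step.OBTAIN_MODEL)` before `advance(Step.ATTEST_REALM)` raises an error, which mirrors the caption's ordering: the model is obtained from the provider only after realm attestation.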