Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads

Aritra Dhar; Clément Thorens; Lara Magdalena Lazier; Lukas Cavigelli

TL;DR

Ascend-CC introduces a confidential computing architecture for discrete NPUs that operates without a CPU TEE, protecting data, model parameters, and operator binaries from an untrusted host. It relies on memory lock invariants, AES-GCM-based in-device encryption, and model/task attestation anchored by a hardware root of trust and measured boot, enabling end-to-end confidentiality for LLM workloads. Implemented on the Huawei Ascend 910A, Ascend-CC demonstrates minimal overhead in LLM inference for large models (e.g., Llama2/Llama3) with no changes to existing AI software stacks, validating its practicality for cloud GenAI scenarios. The approach generalizes to other task-based accelerators and offers a scalable path to confidential computing across multi-party AI deployments.

Abstract

Generative AI based on large language models (LLMs) is dominated by cloud workloads. Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs. The AI models and the data are often highly sensitive and come from mutually distrusting parties. Existing CPU-based TEEs such as Intel SGX or AMD SEV do not provide sufficient protection. Device-centric TEEs like Nvidia-CC only address tightly coupled CPU-GPU systems, with a proprietary solution that requires a TEE on the host CPU side. Existing academic proposals, on the other hand, are tailored toward specific CPU-TEE platforms. To address this gap, we propose Ascend-CC, a confidential computing architecture based on discrete NPU devices that requires no trust in the host system. Ascend-CC provides strong security through encryption that protects not only the data but also the model parameters and operator binaries. Ascend-CC uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation provides strong model integrity guarantees. Our implementation and evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead and requires no changes to the AI software stack.
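
The attestation idea in the abstract (integrity guarantees over the model and operator binaries) can be illustrated with a generic hash-chain measurement. This is a minimal sketch, not the paper's protocol: the function names `extend`, `measure`, and `verify`, the choice of SHA-256, and the sample byte strings are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Sequence


def extend(register: bytes, component: bytes) -> bytes:
    """Fold one component into the running measurement (hash-chain extend)."""
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()


def measure(components: Sequence[bytes]) -> bytes:
    """Measure an ordered list of components, starting from an all-zero register."""
    register = bytes(32)
    for component in components:
        register = extend(register, component)
    return register


def verify(components: Sequence[bytes], expected: bytes) -> bool:
    """Compare a fresh measurement against a reference value in constant time."""
    return hmac.compare_digest(measure(components), expected)


# Hypothetical task contents: an operator binary followed by model weights.
operator_binary = b"matmul-operator-binary"
model_weights = b"layer0-weights"
reference = measure([operator_binary, model_weights])
```

Because each step hashes the previous register together with the new component, changing any component, or the order in which components are loaded, yields a different final measurement. A verifier holding `reference` can therefore detect a tampered operator or model.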

Paper Structure (20 sections, 14 figures, 1 table)

Introduction
NPU Background
Motivation and Attacker Model
Security Challenges and Requirements
Basic Building Blocks for Confidential Computing on the Ascend NPU
Model and Data Encryption
AI CPU-based custom operator
Executing AI CPU operator with model
Enforcing Memory Lock Invariants
Model and Task Attestation
Firmware and Runtime Integrity
Ascend-CC
Security Analysis
Implementation and Evaluation
Ascend-CC Implementation
...and 5 more sections

Figures (14)

Figure 1: High-level architecture of the Ascend 910A SoC, along with the virtual memory it shares with a 64-bit host CPU.
Figure 2: Memory footprint of Llama-3-8B and Llama-2-13B on the Ascend 910A NPU with 32GB HBM.
Figure 3: Example code for matrix multiplication on the Ascend NPU.
Figure 4: An example matrix multiplication task and memory layout on NPU, corresponding to the code snippet in Figure 3.
Figure 5: Parallel cryptographic operation on model and data to hide the latency introduced by the AES-GCM operator running on the AI-CPU cores. The AI core executes the AI-related operations, such as the layer computation during an inference pass.
...and 9 more figures
