Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads
Aritra Dhar, Clément Thorens, Lara Magdalena Lazier, Lukas Cavigelli
TL;DR
Ascend-CC introduces a confidential computing architecture for discrete NPUs that operates without a CPU TEE, protecting data, model parameters, and operator binaries from an untrusted host. It relies on memory lock invariants, AES-GCM-based in-device encryption, and model/task attestation anchored by a hardware root of trust and measured boot, enabling end-to-end confidentiality for LLM workloads. Implemented on the Huawei Ascend 910A, Ascend-CC demonstrates minimal overhead in LLM inference for large models (e.g., Llama2/Llama3) with no changes to existing AI software stacks, validating its practicality for cloud GenAI scenarios. The approach generalizes to other task-based accelerators and offers a scalable path to confidential computing across multi-party AI deployments.
Abstract
Cloud workloads dominate generative AI based on large language models (LLMs). Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs. The AI models and the data are often highly sensitive and come from mutually distrusting parties. Existing CPU-based TEEs such as Intel SGX or AMD SEV do not provide sufficient protection. Device-centric TEEs like Nvidia-CC address only tightly coupled CPU-GPU systems, using a proprietary solution that requires a TEE on the host CPU side. Existing academic proposals, in turn, are tailored toward specific CPU-TEE platforms. To address this gap, we propose Ascend-CC, a confidential computing architecture based on discrete NPU devices that requires no trust in the host system. Ascend-CC provides strong security by encrypting data and models, protecting not only the data but also the model parameters and operator binaries. Ascend-CC uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation provides strong model integrity guarantees. Our Ascend-CC implementation and evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead with no changes to the AI software stack.
