NVBleed: Covert and Side-Channel Attacks on NVIDIA Multi-GPU Interconnect

Yicheng Zhang; Ravan Nazaraliyev; Sankha Baran Dutta; Andres Marquez; Kevin Barker; Nael Abu-Ghazaleh

TL;DR

This work reveals covert- and side-channel leakage through NVIDIA NVLink in multi-GPU systems. By reverse-engineering how NVLink operates, it uncovers two primary leakage sources: timing variations caused by contention and accessible performance counters. It introduces two intra-VM covert channels (ContenLink and LeakyCounterLink) and two intra-VM side-channel attacks (application fingerprinting and Blender 3D character identification), achieving up to 70.59 Kbps of covert-channel bandwidth and F1 scores of up to 97.78% (application fingerprinting) and 91.56% (Blender character identification). It further demonstrates cross-VM leakage on Google Cloud Platform, enabling a cross-VM side-channel attack on Blender with F1 scores above 88%, which shows that the risk persists even when victim and attacker reside in separate VMs. The paper discusses mitigations, including restricting counter access and reducing clock resolution, and argues that communication-based leakage may also exist in other multi-accelerator interconnects, warranting broader defenses and further research.
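To make the idea of a contention-based covert channel concrete, here is a minimal sketch, assuming a simple on-off keying scheme: the sender saturates the NVLink link during a "1" bit window and stays idle during a "0" bit window, while the receiver times its own transfers over the same link. The function names (`decode_bits`, `calibrate_threshold`, `bit_error_rate`), the midpoint threshold, and the latency numbers are hypothetical illustrations, not the paper's ContenLink implementation; real samples would come from timing GPU-to-GPU transfers.

```python
from statistics import mean


def decode_bits(latencies, samples_per_bit, threshold):
    """Decode on-off keyed bits from receiver latency samples (hypothetical scheme).

    Each consecutive group of `samples_per_bit` samples covers one bit window.
    A window whose mean latency exceeds `threshold` is read as 1 (link contended
    by the sender); otherwise it is read as 0 (link idle).
    """
    bits = []
    # Ignore a trailing partial window that does not cover a full bit period.
    usable = len(latencies) - len(latencies) % samples_per_bit
    for start in range(0, usable, samples_per_bit):
        window = latencies[start:start + samples_per_bit]
        bits.append(1 if mean(window) > threshold else 0)
    return bits


def calibrate_threshold(idle_samples, busy_samples):
    """Place the decision threshold halfway between idle and contended mean latency."""
    return (mean(idle_samples) + mean(busy_samples)) / 2


def bit_error_rate(sent, received):
    """Fraction of positions where the decoded bit differs from the sent bit."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


# Usage with made-up latency values (arbitrary units), two samples per bit.
threshold = calibrate_threshold([10.0, 11.0, 10.5], [20.0, 21.0, 19.5])
received = decode_bits([10.0, 11.0, 20.0, 21.0, 10.5, 19.0], 2, threshold)
print(received)                             # → [0, 1, 0]
print(bit_error_rate([0, 1, 1], received))  # third window is too noisy, so 1 of 3 bits is wrong
```

The midpoint threshold is the simplest possible decision rule; a noisier link would call for more samples per bit or a calibrated, error-corrected encoding, which trades bandwidth for a lower error rate.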

Abstract

Multi-GPU systems are becoming increasingly important in high-performance computing (HPC) and cloud infrastructure, providing acceleration for data-intensive applications, including machine learning workloads. These systems consist of multiple GPUs interconnected through high-speed links such as NVIDIA's NVLink. In this work, we explore whether the interconnect in such systems can offer a novel source of leakage, enabling new forms of covert- and side-channel attacks. Specifically, we reverse engineer the operation of NVLink and identify two primary sources of leakage: timing variations due to contention and accessible performance counters that disclose communication patterns. The leakage is visible remotely and even across VM instances in the cloud, enabling potentially dangerous attacks. Building on these observations, we develop two types of covert-channel attacks across two GPUs, achieving a bandwidth of over 70 Kbps with an error rate of 4.78% for the contention channel. We develop two end-to-end cross-GPU side-channel attacks: application fingerprinting (covering 18 high-performance computing and deep learning applications) and 3D graphics character identification within Blender, a multi-GPU rendering application. These attacks are highly effective, achieving F1 scores of up to 97.78% and 91.56%, respectively. Surprisingly, we also find that leakage occurs across virtual machines on the Google Cloud Platform (GCP), and we demonstrate a side-channel attack on Blender there that achieves F1 scores exceeding 88%. Finally, we explore potential defenses, such as managing access to performance counters and reducing the resolution of the clock, to mitigate the two sources of leakage.
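A raw bit rate and an error rate can be combined into a single figure by treating the channel as a binary symmetric channel, whose capacity per transmitted bit is 1 - H(p), where H is the binary entropy function. The sketch below applies that standard formula to the numbers quoted above, assuming that the 70.59 Kbps rate and the 4.78% error rate describe the same configuration and that bit errors are independent and equally likely in both directions. This is an illustrative calculation, not necessarily the metric the paper itself reports.

```python
import math


def binary_entropy(p):
    """Shannon entropy H(p), in bits, of a binary event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_effective_capacity(raw_kbps, error_rate):
    """Raw bit rate scaled by the binary symmetric channel capacity 1 - H(p).

    Assumes independent, symmetric bit flips with probability `error_rate`.
    """
    return raw_kbps * (1 - binary_entropy(error_rate))


# Figures quoted for the contention channel (assumed to belong together).
print(f"{bsc_effective_capacity(70.59, 0.0478):.1f} Kbps")  # → 51.0 Kbps
```

Under these assumptions, a 4.78% error rate costs roughly 28% of the raw bandwidth (H(0.0478) is about 0.277), leaving an error-free capacity of about 51 Kbps.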
