Platform Architecture for Tight Coupling of High-Performance Computing with Quantum Processors

Shane A. Caldwell, Moein Khazraee, Elena Agostini, Tom Lassiter, Corey Simpson, Omri Kahalon, Mrudula Kanuri, Jin-Sung Kim, Sam Stanwyck, Muyuan Li, Jan Olle, Christopher Chamberland, Ben Howe, Bruno Schmitt, Justin G. Lietz, Alex McCaskey, Jun Ye, Ang Li, Alicia B. Magann, Corey I. Ostrove, Kenneth Rudinger, Robin Blume-Kohout, Kevin Young, Nathan E. Miller, Yilun Xu, Gang Huang, Irfan Siddiqi, John Lange, Christopher Zimmer, Travis Humble

TL;DR

NVQLink presents a practical architecture that tightly couples HPC resources to QPU control systems to support online workloads like QEC, achieving sub-4 $\mu$s round-trips over a RoCE network and enabling real-time device callbacks via CUDA-Q. It introduces a robust programming model with device_call and device_ptr, a trait-based runtime, and an open compilation/execution flow that adapts across high- and low-latency regimes, including VPPU and PQPU simulation tools for offline development. The approach addresses QEC throughput and reaction-time requirements, highlights lattice-surgery-based scalable fault-tolerant execution, and discusses calibration and QCVV workloads that benefit from tight CPU/GPU co-processing. Together with the development tools and open specification, NVQLink aims to accelerate the path to fault-tolerant quantum computing by providing a scalable, vendor-agnostic platform for real-time quantum-classical co-processing.

Abstract

We propose an architecture, called NVQLink, for connecting high-performance computing (HPC) resources to the control system of a quantum processing unit (QPU) to accelerate workloads necessary to the operation of the QPU. We aim to support every physical modality of QPU and every type of QPU system controller (QSC). The HPC resource is optimized for real-time (latency-bounded) processing on tasks with latency tolerances of tens of microseconds. The network connecting the HPC and QSC is implemented on commercially available Ethernet and can be adopted relatively easily by QPU and QSC builders, and we report a round-trip latency measurement of 3.96 microseconds (max) with prospects of further optimization. We describe an extension to the CUDA-Q programming model and runtime architecture to support real-time callbacks and data marshaling between the HPC and QSC. By doing so, NVQLink extends heterogeneous, kernel-based programming to the QSC, allowing the programmer to address CPU, GPU, and FPGA subsystems in the QSC, all in the same C++ program, avoiding the use of a performance-limiting HTTP interface. We provide a pattern for QSC builders to integrate with this architecture by making use of multi-level intermediate representation dialects and progressive lowering to encapsulate QSC code.
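To make the latency figures in the abstract concrete, the following is a minimal sketch of the budget arithmetic they imply. Only the 3.96 microsecond maximum round-trip latency is taken from the abstract; the function names and the example tolerances of 10, 20, and 50 microseconds are illustrative assumptions chosen to span "tens of microseconds", not values reported by the authors.

```python
# Illustrative latency-budget arithmetic (not from the paper).
# Only RTT_MAX_US is a reported figure; everything else is an assumption.

RTT_MAX_US = 3.96  # reported maximum round-trip latency, in microseconds


def compute_budget_us(tolerance_us: float, rtt_us: float = RTT_MAX_US) -> float:
    """Return the time left for HPC-side processing after one network round trip."""
    if tolerance_us < rtt_us:
        raise ValueError("latency tolerance is smaller than the round-trip latency")
    return tolerance_us - rtt_us


def network_fraction(tolerance_us: float, rtt_us: float = RTT_MAX_US) -> float:
    """Return the fraction of the latency tolerance consumed by the round trip."""
    return rtt_us / tolerance_us


if __name__ == "__main__":
    # Hypothetical task tolerances in microseconds.
    for tolerance in (10.0, 20.0, 50.0):
        budget = compute_budget_us(tolerance)
        share = network_fraction(tolerance)
        print(f"tolerance={tolerance:5.1f} us  compute budget={budget:6.2f} us  "
              f"network share={share:.1%}")
```

Under these assumed tolerances, the round trip consumes a minority of the budget, which is the sense in which a sub-4 microsecond interconnect leaves room for real-time processing on the HPC side.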

Paper Structure

This paper contains 34 sections, 2 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Machine model of the Logical QPU. The NVQLink architecture comprises the Real-time Host (RTH) and QPU Control System (QSC) connected by a low-latency, scalable Real-time Interconnect joining them into a network capable of handling the runtime workloads of a Fault Tolerant Quantum Computer. The RTH contains traditional HPC compute resources (CPUs and GPUs and perhaps other specialized hardware), while the QSC contains the Pulse Processing Units controlling the QPU. These compute resources also comprise the key memory and storage resources for the application to consider and orchestrate, including GPU memory, RTH system main memory, etc. The programming model for this system is built to recognize all CPUs, GPUs, and resources within the QSC, including CPUs, PPUs, and other specialized FPGA resources, as targetable Devices and to enable Real-time Callback functions (fn) among them to support distributed processing and data marshaling. To support this, a small and optional Network Interface (NI) is provided that enables unilateral and private adoption by the QSC builder. This construction affords flexibility in the value chain of Physical QPU, QSC, and runtime protocols such as QEC Decoding and Online Calibration: each of these components is provided by a third party who may build or integrate some or all of the system and who may share their implementation of each component or keep it proprietary at their discretion. The purpose of this architecture is to support such flexibility while enabling every implementation to achieve state-of-the-art HPC performance at minimal cost and time to solution.
  • Figure 2: Proof of Concept setup
  • Figure 3: Proof of Concept flow
  • Figure 4: Observing slightly increased warm-up latency at the beginning of a run
  • Figure 5: Steady state latency during the same run as Fig. 4
  • ...and 4 more figures

Theorems & Definitions (2)

  • definition 1: Quantum Kernel
  • definition 2: Device