Analyzing Machine Learning Performance in a Hybrid Quantum Computing and HPC Environment

Samuel T. Bieberich, Michael A. Sandoval

TL;DR

The paper investigates the feasibility and performance of hybrid quantum-classical machine learning within an HPC setting, using PennyLane and IBMQ simulators on Andes and Frontier to run a ground-up QML workflow integrated with PyTorch. It reports substantial speedups from Frontier's GPUs and provides a nuanced analysis of how performance scales with dataset size, qubit count, and distribution across CPUs/GPUs, while identifying bottlenecks in simulator backends and inter-device communication. The work yields practical insights into co-locating quantum simulators with HPC resources and informs design considerations for scalable hybrid QC/HPC pipelines. Together, the findings motivate further intensive scaling studies and simulator optimizations to realize the full potential of hybrid QC/HPC in the near term.
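The TL;DR describes a workflow in which a quantum simulator sits inside a classical training loop. The paper's actual circuits and model are not given here, so what follows is only a minimal standard-library sketch of that general pattern, under stated assumptions. It uses a hypothetical one-qubit model whose expectation value is computed in closed form, standing in for a PennyLane or IBMQ simulator call. Gradients come from the parameter-shift rule, and plain gradient descent takes the place of a PyTorch optimizer. All function names are illustrative.

```python
import math


# Hypothetical one-qubit model: RY(angle) applied to |0>, then measure Pauli-Z.
# Its exact expectation value is cos(angle), so we compute it in closed form
# instead of calling a real simulator backend.
def expval_z(angle: float) -> float:
    """Stand-in for a simulator call: <Z> after RY(angle)|0>."""
    return math.cos(angle)


def parameter_shift(angle: float) -> float:
    """Gradient d<Z>/d(angle) from two shifted 'circuit' evaluations."""
    shift = math.pi / 2
    return (expval_z(angle + shift) - expval_z(angle - shift)) / 2


def train(data, epochs: int = 30, lr: float = 0.1):
    """Classical gradient descent on (w, b), with the circuit in the loop."""
    w, b = 0.5, 0.1  # classical trainable parameters (arbitrary start)
    for _ in range(epochs):
        for x, y in data:  # labels y are -1 or +1
            angle = w * x + b          # classical pre-processing
            pred = expval_z(angle)     # "quantum" forward pass
            # Squared-error loss (pred - y)**2, chained through the circuit.
            grad_angle = 2 * (pred - y) * parameter_shift(angle)
            w -= lr * grad_angle * x   # d(angle)/dw = x
            b -= lr * grad_angle       # d(angle)/db = 1
    return w, b


if __name__ == "__main__":
    data = [(0.0, 1.0), (math.pi, -1.0)]
    w, b = train(data)
    for x, y in data:
        print(f"x={x:.3f}  label={y:+.0f}  prediction={expval_z(w * x + b):+.3f}")
```

In a real hybrid pipeline, `expval_z` would be replaced by a call into a simulator backend. That call is where choices such as simulator, qubit count, and CPU or GPU placement would affect runtime, while the surrounding loop stays classical.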

Abstract

We explored the possible benefits of integrating quantum simulators into a "hybrid" quantum machine learning (QML) workflow that uses both classical and quantum computations in a high-performance computing (HPC) environment. Here, we used two Oak Ridge Leadership Computing Facility HPC systems, Andes (a commodity-type Linux cluster) and Frontier (an HPE Cray EX supercomputer), along with quantum computing simulators from PennyLane and IBMQ, to evaluate a hybrid QML program using a "ground up" approach. Using 1 GPU on Frontier, we found ~56% and ~77% speedups compared with Frontier's CPU and a local, non-HPC system, respectively. When we analyzed performance on a larger dataset using multiple threads, the Frontier GPUs ran ~92% and ~48% faster than the Andes and Frontier CPUs, respectively. More impressively, this represents a ~226% speedup over a local, non-HPC system's runtime using the same simulator and number of threads. We hope that this proof of concept will motivate more intensive hybrid QC/HPC scaling studies in the future.
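The abstract reports gains as percentages but does not spell out the formula behind them. Two conventions are common. The first measures the share of the baseline runtime that was saved, which can never exceed 100%. The second measures how many times faster the new run is, minus one, which is unbounded. A value such as ~226% is therefore only possible under the second convention. The sketch below shows both. The runtimes are hypothetical, chosen only to illustrate the arithmetic, and are not measurements from the paper.

```python
def runtime_reduction_pct(t_baseline: float, t_new: float) -> float:
    """Share of the baseline runtime saved, in percent (never exceeds 100)."""
    return (t_baseline - t_new) / t_baseline * 100


def speedup_pct(t_baseline: float, t_new: float) -> float:
    """Relative speedup in percent, (t_baseline / t_new - 1) * 100 (unbounded)."""
    return (t_baseline / t_new - 1) * 100


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, NOT measurements from the paper.
    baseline, accelerated = 326.0, 100.0
    print(f"runtime reduction: {runtime_reduction_pct(baseline, accelerated):.1f}%")
    print(f"speedup:           {speedup_pct(baseline, accelerated):.1f}%")
```

For the same pair of runtimes, this prints a 69.3% runtime reduction but a 226.0% speedup. When comparing percentages across systems, it is worth confirming which convention each figure uses.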

Paper Structure

This paper contains 7 sections, 8 figures.

Figures (8)

  • Figure 1: Changes in runtime due to the backend used and training method (Section II).
  • Figure 2: Accuracy (blue, left axis) and runtime (orange, right axis) versus simulator used. (4 qubits, 30 epochs on Andes)
  • Figure 3: Accuracy (blue, left axis) and CPU runtime (orange, right axis) versus qubits. (30 epochs on Andes CPU)
  • Figure 4: Accuracy (blue, left axis) and CPU runtime (orange, right axis) versus total epochs. (4 qubits on Andes CPU)
  • Figure 5: Qubit runtime by system. Execution time for N = 1 (solid) and N = 8 (dashed) threads across all systems.
  • ...and 3 more figures
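Figures 3 and 5 track runtime as the qubit count grows. One general reason, independent of this paper's specific measurements, is that a dense state-vector simulator stores 2^n complex amplitudes for n qubits. Memory therefore doubles with each added qubit, and so does the work per gate, since each gate application touches every amplitude. The sketch assumes double-precision complex amplitudes (16 bytes each). Whether the authors' runtimes are dominated by this term is not stated here.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a dense state vector: 2**n amplitudes of complex128 (16 bytes)."""
    return (2 ** n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (4, 8, 16, 24, 32):
        print(f"{n:2d} qubits -> {statevector_bytes(n):,} bytes")
```

At 4 qubits the state fits in 256 bytes, while at 32 qubits it needs 64 GiB. This exponential growth is why simulator backend, precision, and device memory become the limiting factors well before the qubit counts used in small QML demonstrations grow large.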