DistributedEstimator: Distributed Training of Quantum Neural Networks via Circuit Cutting

Prabhjot Singh, Adel N. Toosi, Rajkumar Buyya

TL;DR

The measurements show that cutting introduces substantial end-to-end overheads that grow with the number of cuts, and that reconstruction constitutes a dominant fraction of per-query time, bounding achievable speed-up under increased parallelism.

Abstract

Circuit cutting decomposes a large quantum circuit into a collection of smaller subcircuits. The outputs of these subcircuits are then classically reconstructed to recover the original expectation values. While prior work characterises cutting overhead largely in terms of subcircuit counts and sampling complexity, its end-to-end impact on iterative, estimator-driven training pipelines remains insufficiently measured from a systems perspective. In this paper, we propose a cut-aware estimator execution pipeline that treats circuit cutting as a staged distributed workload and instruments each estimator query into partitioning, subexperiment generation, parallel execution, and classical reconstruction phases. Using logged runtime traces and learning outcomes on two binary classification workloads (Iris and MNIST), we quantify cutting overheads, scaling limits, and sensitivity to injected stragglers, and we evaluate whether accuracy and robustness are preserved under matched training budgets. Our measurements show that cutting introduces substantial end-to-end overheads that grow with the number of cuts, and that reconstruction constitutes a dominant fraction of per-query time, bounding achievable speed-up under increased parallelism. Despite these systems costs, test accuracy and robustness are preserved in the measured regimes, with configuration-dependent improvements observed in some cut settings. These results indicate that practical scaling of circuit cutting for learning workloads hinges on reducing and overlapping reconstruction and on scheduling policies that account for barrier-dominated critical paths.
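
The claim that reconstruction bounds achievable speed-up can be illustrated with an Amdahl-style estimate. The sketch below is not taken from the paper. It assumes that only the parallel execution phase scales with the number of workers, and that the remaining phases (lumped here into a hypothetical `serial_fraction`) do not. The function name and the example fraction are illustrative assumptions, not measured values.

```python
def speedup_bound(serial_fraction: float, workers: int) -> float:
    """Amdahl-style speed-up estimate for one estimator query.

    serial_fraction: share of single-worker query time that does not
        parallelise (assumed here to be dominated by reconstruction).
    workers: number of workers executing subexperiments in parallel.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical fraction chosen for illustration only.
    for w in (1, 2, 4, 8, 16):
        print(w, round(speedup_bound(0.5, w), 3))
```

Under this assumption, a query that spends half its time in non-parallel work cannot exceed a speed-up of 2 regardless of worker count, and 16 workers give roughly 1.88. This is why reducing or overlapping reconstruction matters more than adding workers in such a regime.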

Paper Structure

This paper contains 13 sections, 3 equations, 8 figures, 1 table, 2 algorithms.

Figures (8)

  • Figure 1: Independent subcircuits generated by cutting a circuit.
  • Figure 2: Training pipeline with a cut-aware distributed estimator. Circuit cutting expands each estimator query in the QNN forward/gradient evaluation into parallel subexperiments followed by classical reconstruction.
  • Figure 3: Per-query expansion under circuit cutting. Subexperiments execute across $w$ workers; reconstruction forms a barrier and is sensitive to straggler delays.
  • Figure 4: RQ1: End-to-end training time under clean execution. Left: Iris ($\textit{maxiter}=60$). Right: MNIST (10 epochs).
  • Figure 5: RQ2: Scaling behaviour under clean execution. Bars report speed-up at 16 workers relative to 1 worker for matched pairs. Left: Iris ($\textit{maxiter}=10$). Right: MNIST (5 epochs).
  • ...and 3 more figures
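
The barrier behaviour described in the Figure 3 caption can be sketched with a simple timing model. This is a minimal illustration under stated assumptions, not the paper's implementation: subexperiments are assumed to take equal time and to be spread as evenly as possible over $w$ workers, reconstruction starts only once every worker has finished, and a single straggler adds a fixed delay. The function `query_time` and all of its parameters are hypothetical.

```python
def query_time(n_subexperiments: int, workers: int, t_sub: float,
               t_recon: float, straggler_delay: float = 0.0) -> float:
    """Modelled time of one cut estimator query with a reconstruction barrier."""
    if n_subexperiments < 0 or workers < 1:
        raise ValueError("need n_subexperiments >= 0 and workers >= 1")
    base, extra = divmod(n_subexperiments, workers)
    # Even split: the first `extra` workers take one additional subexperiment.
    finish = [(base + (1 if i < extra else 0)) * t_sub for i in range(workers)]
    # Assumption: the injected straggler is the most heavily loaded worker.
    finish[0] += straggler_delay
    # Barrier: reconstruction waits for the slowest worker.
    return max(finish) + t_recon


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(query_time(16, 4, t_sub=1.0, t_recon=2.0))
    print(query_time(16, 4, t_sub=1.0, t_recon=2.0, straggler_delay=3.0))
```

In this model a delay on one worker is added in full to the query time, because the barrier places the slowest worker on the critical path. Increasing `workers` shrinks only the `max(finish)` term and leaves `t_recon` unchanged.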