TorchQuantumDistributed

Oliver Knitter, Jonathan Mei, Masako Yamada, Martin Roetteler

TL;DR

TorchQuantumDistributed (tqd) tackles the scalability gap in differentiable quantum statevector simulation by distributing the state across multiple accelerators within PyTorch. It combines a distributed statevector sharding strategy, a universal gate set, differentiable shot noise handling (exact and approximate), and invertible backpropagation for memory-efficient training. The paper details practical implementation aspects such as bookkeeping, tensor dimension order, sharding, shot-noise strategies, and backprop, supported by profiling results on large HPC hardware that demonstrate favorable scaling. These contributions enable scalable QML experimentation and can be extended with circuit cutting and integration into future research pipelines.

Abstract

TorchQuantumDistributed (tqd) is a PyTorch-based [Paszke et al., 2019] library for accelerator-agnostic differentiable quantum state vector simulation at scale. This enables studying the behavior of learnable parameterized near-term and fault-tolerant quantum circuits with high qubit counts.

Paper Structure

This paper contains 27 sections, 8 equations, 4 figures, 1 table, 2 algorithms.

Figures (4)

  • Figure 1: An example dimensional arrangement for a TQD distributed tensor representing a nine-qubit statevector, with two sharded qubits. TQD always reserves the first dimension for batching and the last dimension for the real and imaginary parts. Sharded qubits correspond to the dimensions preceding the final one, and at least two unsharded qubits are always kept ungrouped.
  • Figure 2: Left: exact sampling breaks differentiability. Right: approximate sampling uses the reparameterization trick to maintain differentiability.
  • Figure 3: The primary building block for the ansatz (left) consists of unitaries with a ladder structure across the width of the circuit. Each block $U_i$ is defined as a combination of a controlled NOT and a single-qubit rotation $R_Y$ of angle $\theta_i$ (right).
  • Figure 4: Basic "strong" and "weak" scaling tests of tqd, applying the ansatz from Figure 3 to qubit counts between 18 and 28. We vary the number of accelerators between 1 and 1024. We perform benchmarking by collecting the walltime, total NCCL all-to-all communication time, and total memory usage recorded by a single GPU for one forward-backward pass through the ansatz.
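
The ladder ansatz described in Figure 3 can be illustrated with a small single-process sketch. This is not tqd code and uses none of its API: the function names, the qubit ordering (qubit 0 as the most significant bit), the gate order inside each block (controlled NOT first, then $R_Y$ on the target), and the choice of neighbouring qubit pairs are all assumptions made for illustration. Because both gates are real matrices, a plain list of real amplitudes is enough here.

```python
import math


def apply_ry(state, qubit, theta, n_qubits):
    """Apply R_Y(theta) to one qubit of a real statevector (qubit 0 = most significant bit)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (n_qubits - 1 - qubit)
    out = list(state)
    for i in range(len(state)):
        if i & bit == 0:
            a0, a1 = state[i], state[i | bit]
            out[i] = c * a0 - s * a1        # row 0 of [[c, -s], [s, c]]
            out[i | bit] = s * a0 + c * a1  # row 1 of [[c, -s], [s, c]]
    return out


def apply_cnot(state, control, target, n_qubits):
    """Swap the target-bit amplitudes wherever the control bit is 1."""
    cbit = 1 << (n_qubits - 1 - control)
    tbit = 1 << (n_qubits - 1 - target)
    out = list(state)
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out


def ladder_layer(state, thetas, n_qubits):
    """Assumed block U_i: CNOT(i, i+1) followed by R_Y(theta_i) on qubit i+1."""
    for i, theta in enumerate(thetas):
        state = apply_cnot(state, i, i + 1, n_qubits)
        state = apply_ry(state, i + 1, theta, n_qubits)
    return state


def ladder_layer_inverse(state, thetas, n_qubits):
    """Undo ladder_layer by applying the inverse gates in reverse order."""
    for i in reversed(range(len(thetas))):
        state = apply_ry(state, i + 1, -thetas[i], n_qubits)
        state = apply_cnot(state, i, i + 1, n_qubits)
    return state
```

Calling `ladder_layer([1.0] + [0.0] * 7, [0.3, 1.1], 3)` applies one ladder to the three-qubit all-zeros state, and `ladder_layer_inverse` with the same angles recovers the input up to floating-point error. That exact reversibility of unitary gates is presumably the property that invertible backpropagation exploits to avoid storing intermediate states, although the sketch does not reproduce how tqd implements it.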