Fast and memory-efficient classical simulation of quantum machine learning via forward and backward gate fusion

Yoshiaki Kawase

TL;DR

The proposed method drastically accelerates classical simulation of quantum machine learning. This supports research on quantum machine learning and variational quantum algorithms, for example by verifying algorithms on large datasets or by investigating learning theories of deep quantum circuits, such as barren plateaus.

Abstract

Real quantum devices have been increasingly used in recent years for research aimed at quantum advantage or quantum utility. Even so, executing deep quantum circuits or performing quantum machine learning with large-scale data remains challenging on current noisy intermediate-scale quantum devices, which makes classical simulation essential for quantum machine learning research. Classical simulation, however, often suffers from the cost of gradient calculations, which require enormous memory or computational time. To address these problems, we propose a method that fuses multiple consecutive gates in each of the forward and backward paths, improving throughput by minimizing global memory accesses. With this method, we achieved approximately $20$ times higher throughput for a Hardware-Efficient Ansatz with $12$ or more qubits, and over $30$ times higher throughput on a mid-range consumer GPU with limited memory bandwidth. Combining the proposed method with gradient checkpointing drastically reduces memory usage, making it possible to train a large-scale quantum machine learning model (a $20$-qubit, $1,000$-layer model with $60,000$ parameters) on $1,000$ samples in approximately $20$ minutes. This implies that the model can be trained on large datasets consisting of tens of thousands of samples, such as MNIST or CIFAR-10, within a realistic time frame (e.g., $20$ hours per epoch). In this way, the proposed method drastically accelerates classical simulation of quantum machine learning, making a significant contribution to research on quantum machine learning and variational quantum algorithms, for example by verifying algorithms on large datasets or by investigating learning theories of deep quantum circuits, such as barren plateaus.
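
The idea of fusing consecutive gates can be illustrated with a small state-vector example. The sketch below is not the paper's GPU implementation: it is a pure-Python illustration, under the assumption that the fused gates are single-qubit rotations acting on the same qubit, so that they can be multiplied into one $2 \times 2$ matrix and applied in a single pass over the state vector. All function names (`apply_gate`, `fuse`, and so on) are hypothetical and chosen for this illustration only.

```python
import cmath
import math


def rx(theta):
    """Rx rotation as a 2x2 matrix (nested tuples)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -1j * s), (-1j * s, c))


def ry(theta):
    """Ry rotation as a 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -s), (s, c))


def rz(theta):
    """Rz rotation as a 2x2 matrix."""
    return ((cmath.exp(-0.5j * theta), 0), (0, cmath.exp(0.5j * theta)))


def matmul2(a, b):
    """Return the 2x2 matrix product a @ b."""
    return tuple(
        tuple(a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2))
        for i in range(2)
    )


def apply_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` (bit `target` of the index).

    Each call reads and writes every amplitude once, i.e. one full
    pass over the state vector.
    """
    out = list(state)
    stride = 1 << target
    for i0 in range(len(state)):
        if i0 & stride:
            continue  # visit each amplitude pair once, from its lower index
        i1 = i0 | stride
        a0, a1 = state[i0], state[i1]
        out[i0] = gate[0][0] * a0 + gate[0][1] * a1
        out[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return out


def fuse(gates):
    """Multiply gates, listed in application order, into one 2x2 matrix."""
    fused = ((1, 0), (0, 1))
    for g in gates:
        fused = matmul2(g, fused)  # later gates multiply from the left
    return fused


n_qubits = 3
state = [0j] * (1 << n_qubits)
state[0] = 1 + 0j  # start in |000>
gates = [rx(0.3), ry(1.1), rz(-0.7)]  # arbitrary illustrative angles

# Unfused: three passes over the state vector.
unfused = state
for g in gates:
    unfused = apply_gate(unfused, g, target=1)

# Fused: one pass with the pre-multiplied matrix.
fused_state = apply_gate(state, fuse(gates), target=1)

max_diff = max(abs(x - y) for x, y in zip(unfused, fused_state))
```

Both paths produce the same state up to floating-point rounding, but the fused path touches the state vector once instead of three times. This reduction in memory traffic is the effect the abstract attributes to gate fusion.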

Paper Structure (21 sections, 13 equations, 11 figures, 2 tables)

Introduction
Preliminaries
State vector simulation and objective function
Adjoint method
Methods
Applying a quantum gate
Gate fusion in the forward path
Gate fusion in the backward path
Theoretical memory analysis with gradient checkpointing
Numerical experiments
The effect of gate fusion
Throughput for a Hardware Efficient Ansatz
Classical simulation of $1,000$ layers of HEA using gradient checkpointing
Preliminary experiment to determine checkpoint block size $b$
Numerical experiment for a practical scale QML benchmark
...and 6 more sections

Figures (11)

Figure 1: Conventional method and proposed method
Figure 2: Stored state vectors when combining our proposed method with gradient checkpointing
Figure 3: A quantum circuit consisting of $m$ consecutive Rx, Ry, and Rz gates used for the benchmark of verifying the effect of our proposed method
Figure 4: (a) Execution time per fused gate and (b) Total peak memory usage when applying $m$ consecutive single-qubit gates on each qubit during the forward path
Figure 5: (a) Execution time per fused gate and (b) Total peak memory usage when applying $m$ consecutive single-qubit gates on each qubit at the backward path
...and 6 more figures
