Towards provably efficient quantum algorithms for large-scale machine-learning models

Junyu Liu; Minzhao Liu; Jin-Peng Liu; Ziyu Ye; Yunfei Wang; Yuri Alexeev; Jens Eisert; Liang Jiang

TL;DR

The paper investigates fault-tolerant quantum algorithms that accelerate the training of large-scale machine-learning models. It applies quantum Carleman linearization to convert stochastic gradient descent into a linear quantum dynamical system solvable with an HHL-type solver, which enables provable speedups under sparsity and dissipation conditions. It formalizes two regimes, yielding runtimes of $O\left(T \times \text{polylog}\left(n,1/\epsilon\right)\right)$ for fully dissipative sparse models and $O\left(T^2 \times \text{polylog}\left(n,1/\epsilon\right)\right)$ for almost dissipative cases, with the quantum-system size $m=\log_2(n)$. Numerical experiments at scales up to $n\sim 10^8$ parameters demonstrate dissipation-dominated behavior and an early-stage quantum enhancement after pruning, supported by a sparse download/upload workflow and QRAM-assisted variants. The work provides theoretical guarantees and concrete numerical evidence suggesting potential quantum advantages for selected large-scale ML training tasks, while acknowledging that the speedups depend on structural properties and may not generalize to all problem classes. Future directions include refining discrete-time Carleman linearization, improving dissipation criteria, and exploring connections to diffusion models and improved truncated-HHL schemes.
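
The Carleman step can be illustrated on a one-dimensional toy problem. The sketch below assumes the dissipative scalar ODE $dx/dt = -ax - bx^2$ (a hypothetical stand-in for a gradient flow, not a model from the paper), lifts it to a linear system in the monomials $y_k = x^k$, truncates at a chosen order, and integrates the result classically with RK4 in place of an HHL-type quantum solver. The helpers `carleman_matrix`, `solve_linear_ode`, and `exact` are illustrative names, and `R = |b||x0|/a` is used here as a simple measure of nonlinear strength relative to linear damping.

```python
import numpy as np


def carleman_matrix(a: float, b: float, order: int) -> np.ndarray:
    """Truncated Carleman matrix for dx/dt = -a*x - b*x**2.

    With y_k = x**k one gets dy_k/dt = -a*k*y_k - b*k*y_{k+1};
    truncation at `order` drops the coupling to y_{order+1}.
    """
    A = np.zeros((order, order))
    for k in range(1, order + 1):
        A[k - 1, k - 1] = -a * k
        if k < order:
            A[k - 1, k] = -b * k
    return A


def solve_linear_ode(A: np.ndarray, y0: np.ndarray, t_final: float, steps: int) -> np.ndarray:
    """Integrate dy/dt = A y with classical RK4 (a stand-in for a quantum linear solver)."""
    h = t_final / steps
    y = y0.copy()
    for _ in range(steps):
        k1 = A @ y
        k2 = A @ (y + 0.5 * h * k1)
        k3 = A @ (y + 0.5 * h * k2)
        k4 = A @ (y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y


def exact(a: float, b: float, x0: float, t: float) -> float:
    """Closed-form solution of the Bernoulli equation dx/dt = -a*x - b*x**2."""
    decay = np.exp(-a * t)
    return a * x0 * decay / (a + b * x0 * (1.0 - decay))


a, b, x0, t_final = 1.0, 0.5, 0.4, 3.0
R = abs(b) * abs(x0) / a  # nonlinear strength relative to linear damping
errors = {}
for order in (1, 2, 4, 8):
    y0 = np.array([x0 ** k for k in range(1, order + 1)])
    y_T = solve_linear_ode(carleman_matrix(a, b, order), y0, t_final, steps=3000)
    errors[order] = abs(y_T[0] - exact(a, b, x0, t_final))  # y_1 approximates x
    print(f"order={order}  R={R:.2f}  |error|={errors[order]:.2e}")
```

With `R = 0.2` the error should fall as the truncation order grows. A quantum algorithm would instead encode the truncated linear system and solve it with an HHL-type routine; the RK4 loop here only checks the linearization itself.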

Abstract

Large machine-learning models are revolutionary technologies of artificial intelligence whose bottlenecks include the huge computational expense, power, and time consumed in both pre-training and fine-tuning. In this work, we show that fault-tolerant quantum computing could provide provably efficient solutions for generic (stochastic) gradient descent algorithms, scaling as $O\left(T^2 \times \text{polylog}(n)\right)$, where $n$ is the size of the model and $T$ is the number of training iterations, as long as the models are both sufficiently dissipative and sparse and the learning rates are small. Building on earlier efficient quantum algorithms for dissipative differential equations, we find and prove that similar algorithms work for (stochastic) gradient descent, the primary algorithm of machine learning. In practice, we benchmark instances of large machine-learning models with 7 million to 103 million parameters. We find that, in the context of sparse training, a quantum enhancement is possible at the early stage of learning after model pruning, motivating a sparse parameter download and re-upload scheme. Our work shows that fault-tolerant quantum algorithms could potentially contribute to most state-of-the-art, large-scale machine-learning problems.
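
The sparse download and re-upload idea can be sketched classically as a toy model. The snippet below rests on several assumptions: magnitude pruning (a common rule, not necessarily the one used in the paper) keeps 1% of the weights; the surviving vector is treated as an amplitude-encoded state on $\log_2 n$ qubits; and the "download" is simulated by sampling computational-basis outcomes to estimate the magnitudes $|\theta_i|$, with the vector norm assumed known. The helper names `magnitude_prune` and `simulate_download` are hypothetical. Sign recovery and QRAM-based re-upload are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)


def magnitude_prune(weights: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Indices of the largest-|w| entries (one common pruning rule, assumed here)."""
    k = max(1, int(round(keep_fraction * weights.size)))
    return np.argsort(np.abs(weights))[-k:]


def simulate_download(vec: np.ndarray, shots: int) -> np.ndarray:
    """Estimate |v_i| by sampling computational-basis outcomes of the amplitude-encoded
    state sum_i v_i|i>/||v||. Assumes ||v|| is known; signs would need extra measurements."""
    probs = vec ** 2 / np.sum(vec ** 2)
    outcomes = rng.choice(vec.size, size=shots, p=probs)
    counts = np.bincount(outcomes, minlength=vec.size)
    return np.sqrt(counts / shots) * np.linalg.norm(vec)


n = 2 ** 12                        # 4096 parameters -> 12 qubits under amplitude encoding
dense = rng.normal(size=n)         # stand-in for classically pre-trained weights
kept = magnitude_prune(dense, keep_fraction=0.01)
sparse = np.zeros(n)
sparse[kept] = dense[kept]         # pruned model: only ~1% of weights survive

estimate = simulate_download(sparse, shots=20_000)
support_found = set(np.flatnonzero(estimate).tolist()) == set(kept.tolist())
rel_err = np.linalg.norm(estimate[kept] - np.abs(sparse[kept])) / np.linalg.norm(sparse[kept])
print(f"kept={kept.size}  qubits={int(np.log2(n))}  "
      f"support recovered={support_found}  relative error={rel_err:.3f}")
```

Because only the surviving entries carry probability, the number of shots needed in this toy setting is governed roughly by the number of kept weights rather than by $n$.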

Paper Structure (6 sections, 2 theorems, 8 equations, 3 figures)

Theorems
Linearizing classical neural networks
Numerical analysis
Conclusion and outlook
Data availability statement
Inclusion and ethics statement

Key Result

Theorem 1

For a sparse machine-learning model of size $n$ trained for $T$ iterations, where the algorithm is fully dissipative and uses small learning rates (formally defined in the supplemental material), there is a quantum algorithm that runs in time $O\left(T \times \text{polylog}\left(n,1/\epsilon\right)\right)$ with precision $\epsilon>0$. The sparsity condition also ensures efficient uploading and downloading between quantum states and classical neural networks.
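
One way to see how $T$ training iterations become a single linear-algebra problem is a "history state" construction of the kind used in quantum linear-ODE solvers. The sketch assumes a quadratic loss with a sparse (tridiagonal) positive-definite Hessian $H$, for which gradient descent is exactly linear: $\theta_{t+1} = (I - \eta H)\theta_t + \eta g$. It stacks $\theta_0,\dots,\theta_T$ into one block lower-bidiagonal system, solves it classically with `np.linalg.solve` in place of an HHL-type solver, and checks the result against plain iteration. The helper `history_system` is hypothetical, and the test `max |1 - eta*lambda| < 1` is a simplified dissipation proxy chosen for illustration, not the paper's formal definition.

```python
import numpy as np


def history_system(H, g, theta0, eta, T):
    """Assemble M Y = b whose solution Y stacks theta_0, ..., theta_T for
    theta_{t+1} = (I - eta*H) theta_t + eta*g (gradient descent on a quadratic loss)."""
    n = H.shape[0]
    step = np.eye(n) - eta * H
    M = np.eye((T + 1) * n)
    b = np.zeros((T + 1) * n)
    b[:n] = theta0                                   # first block row fixes the initial point
    for t in range(T):
        rows = slice((t + 1) * n, (t + 2) * n)
        M[rows, t * n:(t + 1) * n] = -step           # theta_{t+1} - step @ theta_t = eta*g
        b[rows] = eta * g
    return M, b


n, T, eta = 8, 30, 0.1
# Sparse (tridiagonal) positive-definite Hessian of the loss 0.5*theta^T H theta - g^T theta
H = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
g = np.ones(n)
theta0 = np.zeros(n)

eigs = np.linalg.eigvalsh(H)
contraction = np.max(np.abs(1.0 - eta * eigs))      # < 1: each GD step shrinks deviations
M, b = history_system(H, g, theta0, eta, T)
Y = np.linalg.solve(M, b).reshape(T + 1, n)         # classical stand-in for an HHL-type solver

theta = theta0.copy()
for _ in range(T):                                  # ordinary iterative GD for comparison
    theta = theta - eta * (H @ theta - g)

print(f"min eigenvalue={eigs.min():.3f}  contraction={contraction:.3f}  "
      f"max|Y_T - theta_T|={np.max(np.abs(Y[-1] - theta)):.1e}")
```

The stacked system has $(T+1)n$ unknowns, so amplitude encoding would need about $\log_2((T+1)n)$ qubits. This classical sketch does not reproduce the runtime bounds of the theorems; it only shows the shape of the linear system a quantum solver would target.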

Figures (3)

Figure 1: A possible learning process for large-scale models using sparse training, whose early stage might admit a quantum enhancement. A dense neural network is pre-trained classically. The network weights are then pruned so that only a small fraction is preserved. A quantum ordinary differential equation system corresponding to the sparse training dynamics is constructed from the sparse network and the training data. To allow quantum enhancement, the system must be sparse and dissipative. Measurements on the solution state yield the final trained parameters, which are used to construct a trained classical sparse neural network.
Figure 2: (a)-(c) Numerical results for ResNet as a function of training step. Each step corresponds to one step of stochastic gradient descent using loss derivatives computed from 2048 randomly selected training samples. (a) ResNet Hessian spectra during training. (b) Estimated error proxy during training. (c) Training accuracy evolution for ResNet.
Figure 3: Hessian of the pruned 103-million-parameter model immediately after pruning, without any additional training.
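
Figures 2 and 3 track Hessian spectra, whose signs give a rough indication of how dissipative the training dynamics are. The following is a minimal sketch on a hypothetical non-convex toy loss $\tfrac12\theta^\top A\theta - \tfrac{\mu}{2}\|\theta\|^2 + \tfrac14\sum_i\theta_i^4$ with a sparse tridiagonal $A$. It builds the dense Hessian from finite differences of the gradient, computes its spectrum, and restricts it to the weights that survive magnitude pruning. Treating negative eigenvalues as non-dissipative directions is a simplified reading for illustration. At the $10^8$-parameter scale of the paper, a dense Hessian is infeasible, and one would instead use Hessian-vector products with an iterative eigensolver such as Lanczos.

```python
import numpy as np


def grad(theta, A, mu):
    """Gradient of the toy loss 0.5*t^T A t - 0.5*mu*|t|^2 + 0.25*sum(t_i^4)."""
    return A @ theta - mu * theta + theta ** 3


def fd_hessian(grad_fn, theta, eps=1e-4):
    """Dense Hessian from central differences of the gradient, one column per parameter."""
    n = theta.size
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (grad_fn(theta + e) - grad_fn(theta - e)) / (2 * eps)
    return 0.5 * (H + H.T)                           # symmetrize away finite-difference noise


rng = np.random.default_rng(1)
n, mu = 12, 1.0
A = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # sparse (tridiagonal) coupling
theta = rng.normal(scale=0.5, size=n)


def loss_grad(t):
    return grad(t, A, mu)


H = fd_hessian(loss_grad, theta)
eig_full = np.linalg.eigvalsh(H)
kept = np.argsort(np.abs(theta))[-(n // 3):]            # magnitude pruning: keep 1/3 of weights
eig_pruned = np.linalg.eigvalsh(H[np.ix_(kept, kept)])  # Hessian restricted to survivors
print("fraction of negative eigenvalues (full):  ", np.mean(eig_full < 0))
print("fraction of negative eigenvalues (pruned):", np.mean(eig_pruned < 0))
```

By Cauchy interlacing, the spectrum of the pruned (principal-submatrix) Hessian always lies within the range of the full spectrum, so pruning cannot create an eigenvalue more negative than the full Hessian's most negative one.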

Theorems & Definitions (2)

Theorem 1: Informal
Theorem 2: Informal