Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li, Junyu Liu, Hanrui Wang, Tianlong Chen
TL;DR
This paper tackles the heavy computational cost of second-order optimization in neural networks by introducing Q-Newton, a hybrid quantum-classical scheduler that dynamically routes Hessian inversions between quantum and classical solvers based on real-time conditioning and sparsity. It combines Hessian estimation, pruning, and adaptive regularization to make Hessian inversions more quantum-friendly, enabling selective offloading to quantum linear solvers (QLSAs) where advantageous. Empirical results across DNNs, BERT, and GPT demonstrate up to $90\%$ reductions in training time compared to classical Newton's GD without sacrificing accuracy, with Hessians progressively becoming more favorable for quantum acceleration during training. This work highlights a practical quantum-classical co-design pathway for accelerating ML training and lays groundwork for applying hybrid strategies to other matrix-intensive ML operations, illustrating how quantum advantages can be realized in the near term through selective offloading rather than wholesale replacement of classical methods.
Abstract
Optimization in deep learning is dominated by first-order gradient methods such as SGD. However, neural network training can benefit greatly from the rapid convergence of second-order optimization. Newton's GD stands out in this category: it rescales the gradient using the inverse Hessian. Its major bottleneck is matrix inversion, which takes $O(N^3)$ time and scales poorly. Matrix inversion can be translated into solving a series of linear equations. Because quantum linear solver algorithms (QLSAs), which leverage quantum superposition and entanglement, can run in $\text{polylog}(N)$ time, they offer a promising route to exponential acceleration. Specifically, one of the most recent QLSAs has a complexity of $O(d\cdot\kappa\log(N\cdot\kappa/\epsilon))$, which depends on the matrix's size $N$, condition number $\kappa$, error tolerance $\epsilon$, and quantum oracle sparsity $d$. This also implies that the potential exponential advantage may be hindered by certain matrix properties (i.e., $\kappa$ and $d$). We propose Q-Newton, a hybrid quantum-classical scheduler for accelerating neural network training with Newton's GD. Q-Newton uses a streamlined scheduling module that coordinates between quantum and classical linear solvers by estimating and reducing $\kappa$ and constructing $d$ for the quantum solver. Our evaluation shows the potential for Q-Newton to significantly reduce the total training time compared to commonly used optimizers like SGD. We hypothesize a future scenario in which the gate time of quantum machines is reduced, possibly through attosecond physics. Our evaluation establishes an ambitious and promising target for the evolution of quantum computing.
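To make the trade-off in the abstract concrete, the following minimal sketch compares the two quoted scalings, $O(N^3)$ for classical inversion and $O(d\cdot\kappa\log(N\cdot\kappa/\epsilon))$ for a QLSA, and picks the cheaper solver. It is an illustration only, not the Q-Newton scheduler itself: the function names, the dropped constant factors, and the per-operation times (`seconds_per_flop`, `seconds_per_gate`) are all hypothetical assumptions introduced here.

```python
import math


def classical_cost(n, seconds_per_flop=1e-12):
    """Rough time estimate for dense inversion, using the O(N^3) scaling.

    seconds_per_flop is a hypothetical constant; real constant factors are ignored.
    """
    return (n ** 3) * seconds_per_flop


def quantum_cost(n, kappa, d, eps, seconds_per_gate=1e-9):
    """Rough time estimate for a QLSA, using O(d * kappa * log(N * kappa / eps)).

    seconds_per_gate is a hypothetical gate time; constant factors are ignored.
    """
    return d * kappa * math.log(n * kappa / eps) * seconds_per_gate


def choose_solver(n, kappa, d, eps=1e-3):
    """Return the solver with the lower estimated cost under the toy model."""
    if quantum_cost(n, kappa, d, eps) < classical_cost(n):
        return "quantum"
    return "classical"


# A large, well-conditioned, sparse matrix favors the quantum estimate.
print(choose_solver(n=10**6, kappa=100, d=10))
# A tiny, badly conditioned matrix favors the classical estimate.
print(choose_solver(n=10, kappa=1e9, d=10))
```

Under these assumed constants the first call selects the quantum solver and the second selects the classical one, which mirrors the abstract's point that large $\kappa$ or $d$ can erase the quantum advantage.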
