Accelerating Inference for Multilayer Neural Networks with Quantum Computers
Arthur G. Rattew, Po-Wei Huang, Naixu Guo, Lirandë Pira, Patrick Rebentrost
TL;DR
This work presents the first fully coherent quantum implementation of a multilayer neural network with non-linear activations, modeled on ResNet-like architectures, and develops new quantum primitives for coherent matrix-vector arithmetic. Central contributions include a modular vector-encoding (VE) framework, new VE operations, a matrix-vector squared product without Frobenius-norm dependence, a QRAM-free block-encoding for 2D multi-filter convolutions, and two architectural blocks that ensure norm preservation across residual layers. The authors prove end-to-end inference complexities under three data-access regimes, each compared against exact classical inference: polylogarithmic cost with efficient quantum access to both the inputs and the weights, a quartic speedup with efficient quantum access to the weights only, and a quadratic speedup for shallow bilinear-style networks with no access assumptions. The results show how data access, coherence, and norm stability shape quantum-accelerated inference, and they leave QRAM feasibility and dequantization boundaries in quantum machine learning as open directions.
Abstract
Fault-tolerant Quantum Processing Units (QPUs) promise to deliver exponential speed-ups in select computational tasks, yet their integration into modern deep learning pipelines remains unclear. In this work, we take a step towards bridging this gap by presenting the first fully coherent quantum implementation of a multilayer neural network with non-linear activation functions. Our constructions mirror widely used deep learning architectures based on ResNet, and consist of residual blocks with multi-filter 2D convolutions, sigmoid activations, skip-connections, and layer normalizations. We analyse the complexity of inference for networks under three quantum data access regimes. Without any assumptions, we establish a quadratic speedup over classical methods for shallow bilinear-style networks. With efficient quantum access to the weights, we obtain a quartic speedup over classical methods. With efficient quantum access to both the inputs and the network weights, we prove that a network with an $N$-dimensional vectorized input, $k$ residual block layers, and a final residual-linear-pooling layer can be implemented with an error of $\epsilon$ with $O(\mathrm{polylog}(N/\epsilon)^k)$ inference cost.
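
For orientation, the classical computation performed by one residual block of the kind described above (multi-filter 2D convolution, sigmoid activation, skip-connection, normalization) can be sketched in plain Python. This is only a classical reference sketch, not the quantum construction from the paper. The function names `conv2d_same` and `residual_block`, the zero padding, the order of operations, and the use of unit Euclidean-norm rescaling as a stand-in for layer normalization are all assumptions made for illustration.

```python
import math

def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D cross-correlation of an n x n image with an odd-sized square kernel."""
    n, m = len(image), len(kernel)
    pad = m // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for a in range(m):
                for b in range(m):
                    r, c = i + a - pad, j + b - pad
                    if 0 <= r < n and 0 <= c < n:  # entries outside the image count as zero
                        acc += kernel[a][b] * image[r][c]
            out[i][j] = acc
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def residual_block(x, filters):
    """x: list of C channels, each n x n. filters[o][c]: kernel mapping input channel c to output channel o.

    Assumes as many output channels as input channels so the skip-connection is well defined.
    """
    n = len(x[0])
    out = []
    for o in range(len(filters)):
        # Multi-filter convolution: sum the filtered input channels for output channel o.
        acc = [[0.0] * n for _ in range(n)]
        for c, channel in enumerate(x):
            y = conv2d_same(channel, filters[o][c])
            for i in range(n):
                for j in range(n):
                    acc[i][j] += y[i][j]
        # Sigmoid activation, then skip-connection (add the block input back).
        out.append([[x[o][i][j] + sigmoid(acc[i][j]) for j in range(n)] for i in range(n)])
    # Simplified normalization (assumption): rescale the whole tensor to unit Euclidean norm.
    norm = math.sqrt(sum(v * v for ch in out for row in ch for v in row))
    return [[[v / norm for v in row] for row in ch] for ch in out]
```

Evaluating this block exactly touches every entry of the $N$-dimensional input at least once. That per-layer classical cost is the baseline against which the speedups stated in the abstract are measured.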
