
Accelerating Inference for Multilayer Neural Networks with Quantum Computers

Arthur G. Rattew, Po-Wei Huang, Naixu Guo, Lirandë Pira, Patrick Rebentrost

TL;DR

This work presents the first fully coherent quantum implementation of a multilayer neural network with non-linear activations, modeled on ResNet-like architectures, and develops new quantum primitives for coherent matrix-vector arithmetic. Central contributions include a modular vector-encoding (VE) framework, new VE operations, a product of a matrix with an element-wise squared vector whose cost has no dependence on the Frobenius norm, a QRAM-free block-encoding of 2D multi-filter convolutions, and two architectural blocks that preserve the norm across residual layers. The authors prove end-to-end inference complexities under three data-access regimes: polylogarithmic inference cost when both the inputs and the weights are available through efficient quantum access (QRAM), a quartic speedup over exact classical inference when only the weights are, and a quadratic speedup for shallow bilinear-style networks with no data-access assumptions. Together, these results outline a principled path toward quantum-accelerated deep learning inference that accounts for data access, coherence, and norm stability, and they raise further questions about the feasibility of QRAM and the limits of dequantization in quantum machine learning.

Abstract

Fault-tolerant Quantum Processing Units (QPUs) promise to deliver exponential speed-ups in select computational tasks, yet their integration into modern deep learning pipelines remains unclear. In this work, we take a step towards bridging this gap by presenting the first fully coherent quantum implementation of a multilayer neural network with non-linear activation functions. Our constructions mirror widely used deep learning architectures based on ResNet, and consist of residual blocks with multi-filter 2D convolutions, sigmoid activations, skip-connections, and layer normalizations. We analyse the complexity of inference for networks under three quantum data access regimes. Without any assumptions, we establish a quadratic speedup over classical methods for shallow bilinear-style networks. With efficient quantum access to the weights, we obtain a quartic speedup over classical methods. With efficient quantum access to both the inputs and the network weights, we prove that a network with an $N$-dimensional vectorized input, $k$ residual block layers, and a final residual-linear-pooling layer can be implemented to within error $\varepsilon$ at an inference cost of $O(\text{polylog}(N/\varepsilon)^k)$.
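A multi-filter 2D convolution is a linear map on the vectorized input, which is why it can in principle be block-encoded at all. The NumPy sketch below is a classical reference only: it builds that map as an explicit dense matrix and checks it against a direct convolution. It assumes periodic (circular) boundaries and the cross-correlation convention used by most deep learning libraries; the paper's boundary handling is not stated here, and the helper names `conv2d_circular` and `conv_matrix` are hypothetical. It does not reproduce the QRAM-free block-encoding of Lemma 5.

```python
import numpy as np

def conv2d_circular(x, K):
    """Direct multi-filter 2D cross-correlation with periodic boundaries.
    x: (C_in, H, W) input tensor, K: (C_out, C_in, kh, kw) filters -> (C_out, H, W)."""
    C_out, C_in, kh, kw = K.shape
    _, H, W = x.shape
    y = np.zeros((C_out, H, W))
    for di in range(kh):
        for dj in range(kw):
            # shifted[c, i, j] = x[c, (i + di) % H, (j + dj) % W]
            shifted = np.roll(x, shift=(-di, -dj), axis=(1, 2))
            y += np.einsum('oc,cij->oij', K[:, :, di, dj], shifted)
    return y

def conv_matrix(K, H, W):
    """Dense matrix M with M @ x.ravel() == conv2d_circular(x, K).ravel()."""
    C_out, C_in, _, _ = K.shape
    M = np.zeros((C_out * H * W, C_in * H * W))
    for c in range(C_in):
        for i in range(H):
            for j in range(W):
                e = np.zeros((C_in, H, W))
                e[c, i, j] = 1.0  # basis tensor; its image is one column of M
                M[:, (c * H + i) * W + j] = conv2d_circular(e, K).ravel()
    return M

rng = np.random.default_rng(0)
C_in, C_out, H, W = 2, 3, 4, 4
x = rng.normal(size=(C_in, H, W))
K = rng.normal(size=(C_out, C_in, 3, 3))
M = conv_matrix(K, H, W)
y = conv2d_circular(x, K)
print(M.shape)                                # (48, 32)
print(np.allclose(M @ x.ravel(), y.ravel()))  # True
```

Building $M$ column by column costs $O(C_\text{in} H W)$ convolutions, so this is only useful for checking small cases; the point of a QRAM-free block-encoding is to avoid ever writing $M$ down.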

Paper Structure

This paper contains 34 sections, 33 theorems, 63 equations, 4 figures, 1 table.

Key Result

Lemma 1

Let $0 \le \tau \le 1$. We are given unitary circuits $U_{\psi}$ and $U_{\phi}$ which are $(\alpha, a, \epsilon_0)$ and $(\beta, b, \epsilon_1)$ VEs for $|\psi\rangle_n$ and $|\phi\rangle_n$, respectively. Define $c := \max(a, b)$, $|\Gamma\rangle_n := \frac{\tau}{\alpha} |\psi\rangle_n + \frac{(1-\tau)}{\beta} |\phi\rangle_n$ …
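Lemma 1 combines two VEs into an encoding of the weighted sum $|\Gamma\rangle$. A standard way to realise such a sum is a linear combination of unitaries: rotate a control qubit to $\sqrt{\tau}|0\rangle + \sqrt{1-\tau}|1\rangle$, apply $U_\psi$ or $U_\phi$ conditioned on it, and undo the rotation. The NumPy sketch below simulates that circuit with small dense matrices and checks that the block where every ancilla is $|0\rangle$ equals $\frac{\tau}{\alpha}\psi + \frac{1-\tau}{\beta}\phi$, which is how we read the definition of $|\Gamma\rangle$. Assumptions: real vectors, exact ($\epsilon_0 = \epsilon_1 = 0$) toy VEs built by the hypothetical helpers `householder_unitary` and `toy_ve`, and $a = b = 1$ so that no identity padding is needed. We have not checked that this matches the circuit of Figure 3 gate for gate.

```python
import numpy as np

def householder_unitary(v):
    """Hypothetical helper: real orthogonal matrix whose first column is the unit vector v."""
    e0 = np.zeros_like(v)
    e0[0] = 1.0
    w = e0 - v
    norm_sq = w @ w
    if norm_sq < 1e-15:  # v is already e0
        return np.eye(len(v))
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / norm_sq

def toy_ve(psi, alpha, a=1):
    """Hypothetical exact (alpha, a, 0)-VE of a real unit vector psi (alpha >= 1):
    the ancilla-|0> block of U|0> equals psi / alpha."""
    N = len(psi)
    col = np.zeros(2**a * N)
    col[:N] = psi / alpha
    col[N] = np.sqrt(1.0 - 1.0 / alpha**2)  # leftover amplitude, ancilla != 0
    return householder_unitary(col)

def vector_sum(U_psi, U_phi, tau):
    """LCU-style circuit: rotate a control qubit, select U_psi / U_phi, un-rotate.
    Register order: (control, ancillas, system)."""
    dim = U_psi.shape[0]  # both act on n + c qubits
    R = np.array([[np.sqrt(tau), -np.sqrt(1 - tau)],
                  [np.sqrt(1 - tau), np.sqrt(tau)]])
    Z = np.zeros((dim, dim))
    select = np.block([[U_psi, Z], [Z, U_phi]])  # control |0> -> U_psi, |1> -> U_phi
    return np.kron(R.T, np.eye(dim)) @ select @ np.kron(R, np.eye(dim))

rng = np.random.default_rng(0)
N = 4  # n = 2 system qubits
psi = rng.normal(size=N); psi /= np.linalg.norm(psi)
phi = rng.normal(size=N); phi /= np.linalg.norm(phi)
alpha, beta, tau = 1.5, 2.0, 0.3
U = vector_sum(toy_ve(psi, alpha), toy_ve(phi, beta), tau)
out = U[:N, 0]  # amplitudes with the control and all ancillas in |0>
gamma = tau / alpha * psi + (1 - tau) / beta * phi
print(np.allclose(out, gamma))  # True
```

The extra control qubit is why the combined encoding uses one more ancilla than $c$; in this exact toy setting the all-zero block reproduces the unnormalized vector $\Gamma$ exactly.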

Figures (4)

  • Figure 1: Architecture for Convolutional Neural Networks. This figure shows the architectures we consider with provable quantum complexity guarantees for inference under three regimes of quantum data access assumptions. (a) Depicts the architecture where both the inputs and network weights are provided in an efficient quantum data structure. (b) Only the network weights are provided in an efficient quantum data structure. (c) No input assumptions are made. In all architectures, the input is assumed to be a rank-3 tensor (e.g., images with 4 channels).
  • Figure 2: Generic Residual Architectural Block. This diagram illustrates the structure of a typical residual block used in deep neural networks. The input vector $\boldsymbol{x}$ is transformed through a sequence of operations: a learnable linear transformation $W$, a non-linear activation function $f$, and a residual (skip) connection that adds the original input to the transformed signal. The output is then passed through a normalization layer (norm).
  • Figure 3: Circuit for addition of VE-encoded vectors. Given two unitary matrices, $U_{\psi}$, which is an $(\alpha, a, \epsilon_0)$-VE for the $n$-qubit state $|\psi\rangle$, and $U_{\phi}$, which is a $(\beta, b, \epsilon_1)$-VE for the $n$-qubit state $|\phi\rangle$, define $c := \max(a, b)$. We define $\tilde{U}_{\psi}$ by tensoring $U_{\psi}$ with the identity $I_{c-a}$ and $\tilde{U}_{\phi}$ by tensoring $U_{\phi}$ with $I_{c-b}$, so that $\tilde{U}_{\psi}$ and $\tilde{U}_{\phi}$ both act on $n + c$ qubits. The circuit shown then yields a VE of the weighted sum of the encoded vectors, as stated in Lemma 1 (Vector Sum).
  • Figure 4: Full-rank linear-pooling output block.
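The generic residual block of Figure 2 computes $\mathrm{norm}(\boldsymbol{x} + f(W\boldsymbol{x}))$. The NumPy sketch below is a classical reference for that block, stacked $k$ times. It uses the sigmoid activation named in the abstract and, as an assumption, rescaling to unit $\ell_2$ norm as the "norm" layer; the paper's norm-preserving blocks (for example Lemma 6, General Skip Norm Block) may define normalization differently. Unit $\ell_2$ norm is the natural stand-in here because a quantum state is a unit vector, so each block's output could be re-encoded for the next layer. The weights are random placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_normalize(v):
    # Assumed stand-in for the "norm" layer: rescale to unit l2 norm
    return v / np.linalg.norm(v)

def residual_block(x, W, f=sigmoid, norm=l2_normalize):
    """Classical reference for the generic block of Figure 2: norm(x + f(W x))."""
    return norm(x + f(W @ x))

rng = np.random.default_rng(0)
N, k = 8, 3
x = l2_normalize(rng.normal(size=N))
Ws = [rng.normal(size=(N, N)) / np.sqrt(N) for _ in range(k)]  # placeholder weights
y = x
for W in Ws:  # k stacked residual blocks
    y = residual_block(y, W)
print(np.linalg.norm(y))  # 1.0 up to floating-point rounding
```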

Theorems & Definitions (75)

  • Definition 1: The Approximate Sampling-Based Classification Problem
  • Definition 2: Block encoding [gilyen2019quantum]
  • Definition 3: Vector-Encoding (VE) [rattew2023non]
  • Lemma 1: Vector Sum (proof deferred)
  • Lemma 2: Matrix-Vector Product (proof deferred)
  • Lemma 3: Tensor Product of Vector Encodings (proof deferred)
  • Lemma 4: Concatenation of Vector Encodings (proof deferred)
  • Theorem 1: Product of Arbitrary Matrix with a Vector Element-wise Squared, Informal
  • Lemma 5: QRAM-Free Block-Encoding of 2D Convolution With Filters
  • Lemma 6: General Skip Norm Block
  • ...and 65 more
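Definitions 2 and 3 and Lemma 2 concern how a block-encoding of a matrix composes with a VE of a vector. Below we read an $(\alpha, a, \epsilon)$-block-encoding of $A$ in the standard way, as a unitary $U$ with $\|A - \alpha(\langle 0|^{\otimes a} \otimes I)\,U\,(|0\rangle^{\otimes a} \otimes I)\| \le \epsilon$, and (our reading of Definition 3) a VE as the vector analogue, $\||\psi\rangle - \alpha(\langle 0|^{\otimes a} \otimes I)\,U|0\rangle\| \le \epsilon$. The NumPy sketch builds exact toy encodings with the unitary (Halmos) dilation and checks that applying $U_A$ after $U_\psi$ leaves $A\psi/(\alpha\beta)$ in the block where all $a + b$ ancillas are $|0\rangle$, i.e. a VE of $A\psi$ with normalization $\alpha\beta$. The helper names are hypothetical, vectors are real, and the error terms of Lemma 2 are not modelled.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a real symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def block_encode(A, alpha):
    """Hypothetical exact (alpha, 1, 0)-block-encoding of a real square matrix A
    with ||A||_2 <= alpha, via the unitary (Halmos) dilation."""
    B = A / alpha
    I = np.eye(A.shape[0])
    return np.block([[B, psd_sqrt(I - B @ B.T)],
                     [psd_sqrt(I - B.T @ B), -B.T]])

def vector_encode(psi, beta):
    """Hypothetical exact (beta, 1, 0)-VE of a unit vector psi: block-encode psi e_0^T."""
    M = np.zeros((len(psi), len(psi)))
    M[:, 0] = psi
    return block_encode(M, beta)

def apply_matvec(U_A, U_psi, N):
    """State (U_A on a, sys)(U_psi on b, sys)|0>, register order (a, b, sys)."""
    a_dim, b_dim = U_A.shape[0] // N, U_psi.shape[0] // N
    state = np.zeros((a_dim, b_dim, N))
    state[0] = U_psi[:, 0].reshape(b_dim, N)      # |0>_a tensor U_psi|0>
    UA = U_A.reshape(a_dim, N, a_dim, N)          # [a_out, sys_out, a_in, sys_in]
    return np.einsum('ixkz,kbz->ibx', UA, state)  # b is a spectator register

rng = np.random.default_rng(1)
N = 4
A = rng.normal(size=(N, N))
psi = rng.normal(size=N); psi /= np.linalg.norm(psi)
alpha = 1.1 * np.linalg.norm(A, 2)  # any alpha >= ||A||_2 works
beta = 1.5
U_A, U_psi = block_encode(A, alpha), vector_encode(psi, beta)
out = apply_matvec(U_A, U_psi, N)
print(np.allclose(alpha * beta * out[0, 0], A @ psi))  # True
```

In this toy form the normalizations multiply and the ancilla counts add; a tensor product of VEs (Lemma 3) follows the same bookkeeping with `np.kron` on the encoded states.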