Arbitrary Polynomial Separations in Trainable Quantum Machine Learning

Eric R. Anschuetz, Xun Gao

TL;DR

The paper tackles the challenge of achieving large quantum advantages in machine learning without sacrificing trainability by proposing $k$-hypergraph recurrent neural networks ($k$-HRNNs) that are efficiently trainable and remain expressive. It constructs both continuous-variable (qumode) and qubit versions, embedding the network dynamics into low-dimensional Lie subgroups to avoid quantum trainability barriers, while leveraging hypergraph-stabilizer structures to realize powerful quantum contextuality-based expressivity. The authors formalize a sequence-modeling task, $(\ell,n,k)$-HSMT, and prove that classical networks require memory scaling as $\binom{n}{k}-1$ to approximate the task, whereas $k$-HRNNs can perform it with zero error, yielding an arbitrary polynomial memory separation in $n$ and $k$ (e.g., for $k=n/2$ the separation is exponential). They discuss the implications for inference-time advantages, potential hardware platforms, and future directions for exploiting semantic ambiguity and contextuality in real-world data, suggesting a natural setting where quantum learning can outperform classical counterparts during deployment.
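
To give a feel for the scaling quoted above, the following minimal sketch evaluates the expression $\binom{n}{k}-1$ for a few values of $n$ and $k$. The function name `classical_memory_lower_bound` is a hypothetical label chosen for this illustration and does not come from the paper; the code only evaluates the combinatorial expression and says nothing about constants or units in the formal theorem.

```python
from math import comb


def classical_memory_lower_bound(n: int, k: int) -> int:
    """Evaluate binom(n, k) - 1, the scaling quoted in the TL;DR.

    Hypothetical helper for illustration only; the precise statement
    (and what "memory" measures) is given in the paper's theorems.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return comb(n, k) - 1


if __name__ == "__main__":
    # Fixed k: the expression grows like a degree-k polynomial in n.
    for k in (2, 3):
        values = [classical_memory_lower_bound(n, k) for n in (8, 16, 32)]
        print(f"k={k}: {values}")

    # k = n/2: the central binomial coefficient grows exponentially in n.
    values = [classical_memory_lower_bound(n, n // 2) for n in (8, 16, 32)]
    print(f"k=n/2: {values}")
```

For constant $k$, doubling $n$ multiplies the value by roughly $2^k$, which is the polynomial regime; with $k=n/2$ the values grow far faster, matching the exponential case mentioned in the summary.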

Abstract

Recent theoretical results in quantum machine learning have demonstrated a general trade-off between the expressive power of quantum neural networks (QNNs) and their trainability; as a corollary of these results, practical exponential separations in expressive power over classical machine learning models are believed to be infeasible as such QNNs take a time to train that is exponential in the model size. We here circumvent these negative results by constructing a hierarchy of efficiently trainable QNNs that exhibit unconditionally provable, polynomial memory separations of arbitrary constant degree over classical neural networks -- including state-of-the-art models, such as Transformers -- in performing a classical sequence modeling task. This construction is also computationally efficient, as each unit cell of the introduced class of QNNs only has constant gate complexity. We show that contextuality -- informally, a quantitative notion of semantic ambiguity -- is the source of the expressivity separation, suggesting that other learning tasks with this property may be a natural setting for the use of quantum learning algorithms.

Paper Structure

This paper contains 25 sections, 7 theorems, 87 equations, 3 figures, 1 table.

Key Result

Proposition 4.1

Let $p\left(\bm{y}\mid\bm{x}\right)$ be the distribution corresponding to the $\left(\ell,n,k\right)$-HSMT task for some $\ell\geq\binom{n}{k}+n$. There exists a $\left(\binom{n}{k}-1\right)$-dimensional subspace $\mathcal{X}$ of inputs where the following holds: consider any triplet of distinct $\binom{n}{k}$-t... Furthermore, this property is due to contextuality.

Figures (3)

  • Figure 1: (a) An autoregressive neural sequence model. The model autoregressively takes input tokens $\bm{x_i}$, and outputs decoded tokens $\bm{y_i}$, with map $\mathcal{F}_i$. The model also has an unobserved internal memory with state $\bm{\lambda_i}\in L$ after decoding the token $\bm{x_{i-1}}$ that $\mathcal{F}_i$ can depend on. (b) A general encoder-decoder model. $\mathcal{E}$ encodes the input $\bm{x}$ into some latent representation $\bm{\lambda}\in L$. A decoder $\mathcal{D}$ then outputs the decoded sequence $\bm{y}$.
  • Figure 2: A $k$-hypergraph recurrent neural network ($k$-HRNN) unit cell as described in Sec. \ref{sec:hrnn}, with Greek letters denoting trained classical networks and $G_\cdot$ as in Eq. \ref{eq:g_def}. $\bm{\alpha}$ and $\bm{\beta}$ output $k$-tuples of qumodes on which the associated operators have support. $\gamma$ and $\bm{\kappa}$ output the associated rotation angles. The ancillary $\ket{a}^{\otimes m}$ and $\text{QFT}$ perform phase estimation of the operators $G_\cdot$ as described in Sec. \ref{sec:hrnn}. Measurement is either in the position basis (in the qumode setting) or in the computational basis (in the qubit setting).
  • Figure 3: A representation of the noninjectivity of a classical network performing $\left(\ell,n,k\right)$-HSMT as described in Proposition \ref{prop:fiber_bundle_struct}. On some finite-volume subspace of inputs, any classical network noninjectively maps all points on a red line (a "fiber") to the same point in latent space, isomorphic to the gray parallelogram (the "base space"). $\bm{x},\bm{x'},\bm{x''}\in\mathcal{X}$ label input sequences as described in Proposition \ref{prop:cont_meas_seqs}. Due to the noninjectivity of the network, $\bm{x}$, $\bm{x'}$, and $\bm{x''}$ are indistinguishable to the classical neural network.
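
The autoregressive setup in the caption of Figure 1(a) can be sketched as a loop that threads a hidden memory state through successive steps. This is a minimal illustration, not the paper's model: `run_autoregressive` and `parity_step` are hypothetical names, a single step function stands in for the per-step maps $\mathcal{F}_i$, and the one-bit running-parity memory is a toy choice made only to show how the memory $\bm{\lambda_i}$ is carried forward.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

X = TypeVar("X")  # input token type
Y = TypeVar("Y")  # output token type
M = TypeVar("M")  # internal memory type (the latent state lambda)


def run_autoregressive(
    step: Callable[[X, M], Tuple[Y, M]],
    inputs: Iterable[X],
    initial_memory: M,
) -> List[Y]:
    """Apply `step` token by token, carrying the hidden memory forward.

    Simplifying assumption: one shared `step` replaces the maps F_i.
    """
    memory = initial_memory
    outputs: List[Y] = []
    for x in inputs:
        y, memory = step(x, memory)  # y_i and the updated memory
        outputs.append(y)
    return outputs


def parity_step(x: int, memory: int) -> Tuple[int, int]:
    """Toy step: the memory is one bit holding the running parity."""
    new_memory = memory ^ (x & 1)
    return new_memory, new_memory


if __name__ == "__main__":
    print(run_autoregressive(parity_step, [1, 0, 1, 1], 0))
```

In this toy, one bit of memory is enough to produce every output; the memory lower bounds listed on this page concern how large such a state must be for a classical model to perform the HSMT task.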

Theorems & Definitions (12)

  • Proposition 4.1
  • Proposition 4.2
  • Theorem 4.3: Memory lower bound for $\left(\ell,n,k\right)$-HSMT, informal
  • proof: Proof sketch
  • Lemma B.1: Qubit $k$-uniform hypergraph states are antidistinguishable
  • proof
  • Lemma B.2: Qumode $k$-uniform hypergraph states are antidistinguishable
  • proof
  • Theorem B.3: Autoregressive hypergraph stabilizer measurement translation memory lower bound
  • proof
  • ...and 2 more