Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks

Louis Schatzki, Martin Larocca, Quynh T. Nguyen, Frederic Sauvage, M. Cerezo

TL;DR

This work provides theoretical guarantees for equivariant QNNs, thus indicating the power and potential of GQML.

Abstract

Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has emerged as a potential solution to some of those issues. The key insight of GQML is that one should design architectures, such as equivariant QNNs, encoding the symmetries of the problem at hand. Here, we focus on problems with permutation symmetry (i.e., the group of symmetry $S_n$), and show how to build $S_n$-equivariant QNNs. We provide an analytical study of their performance, proving that they do not suffer from barren plateaus, quickly reach overparametrization, and generalize well from small amounts of data. To verify our results, we perform numerical simulations for a graph state classification task. Our work provides the first theoretical guarantees for equivariant QNNs, thus indicating the extreme power and potential of GQML.

Paper Structure

This paper contains 26 sections, 9 theorems, 36 equations, 5 figures, 1 table.

Key Result

Lemma 1

A loss function of the form in Eq. \ref{eq_loss} is $G$-invariant if it is composed of a $G$-equivariant QNN and measurement.
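
Lemma 1 can be checked numerically on a toy case. The sketch below is illustrative only and rests on assumptions not stated in this summary: the loss is taken to be $\ell(\rho)={\rm Tr}[U\rho U^\dagger O]$, the system has $n=2$ qubits so that $S_2$ is represented by the SWAP gate, and the generators (a collective $X$ rotation and a $ZZ$ interaction) are unnormalized stand-ins for the generator set referred to in Figure 2. All function names are hypothetical.

```python
import numpy as np

# Pauli matrices and the two-qubit SWAP, which represents the only
# non-trivial permutation in S_2.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def expm_hermitian(H, theta):
    """Return exp(-i * theta * H) for Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T

# Permutation-symmetric generators (assumed, unnormalized).
H_x = np.kron(X, I2) + np.kron(I2, X)
H_zz = np.kron(Z, Z)

def equivariant_unitary(thetas):
    """Layered circuit; each layer applies exp(-i t_x H_x) then exp(-i t_zz H_zz)."""
    U = np.eye(4, dtype=complex)
    for t_x, t_zz in thetas:
        U = expm_hermitian(H_zz, t_zz) @ expm_hermitian(H_x, t_x) @ U
    return U

# Permutation-symmetric measurement operator (assumed).
O = (np.kron(Z, I2) + np.kron(I2, Z)) / 2

def loss(rho, U, O):
    """Assumed loss form: Tr[U rho U^dagger O]."""
    return float(np.real(np.trace(U @ rho @ U.conj().T @ O)))

def random_pure_state(rng, dim=4):
    """Density matrix of a random normalized pure state."""
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    psi /= np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

rng = np.random.default_rng(0)
U = equivariant_unitary([(0.3, 0.7), (1.1, -0.4)])
rho = random_pure_state(rng)
rho_swapped = SWAP @ rho @ SWAP.conj().T

loss_original = loss(rho, U, O)
loss_swapped = loss(rho_swapped, U, O)
print(abs(loss_original - loss_swapped))  # expected to be at machine precision
```

The invariance follows because both $U$ and $O$ commute with SWAP, so the permutation can be moved through the trace and cancels against its inverse.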

Figures (5)

  • Figure 1: GQML embeds geometric priors into a QML model. Incorporating prior knowledge through $S_n$-equivariance heavily restricts the search space of the model. We show that such inductive biases lead to models that do not exhibit barren plateaus, can be efficiently overparametrized, and require small amounts of data to generalize well.
  • Figure 2: Quantum circuit for an $S_n$-equivariant QNN. Each layer of the QNN is obtained by exponentiation of a generator from the set $\mathcal{G}$ in Eq. \ref{eq:generators-main}. Here we show a circuit with $L=3$ layers acting on $n=4$ qubits. Single-qubit blocks indicate a rotation about the $x$ or $y$ axis, while two-qubit blocks denote entangling gates generated by a $ZZ$ interaction. All colored gates between dashed horizontal lines share the same trainable parameter $\theta_l$.
  • Figure 3: Representation theory and $S_n$-equivariance. Using tools from representation theory we find that the $S_n$-equivariant QNN $U(\boldsymbol{\uptheta})$ and the representation of the group elements $R(\pi)$ (for any $\pi\in S_n$) admit an irrep block decomposition as in Eq. \ref{eq:commutator} and Eq. \ref{eq:Isotypic}, respectively. The irreps can be labeled with a single parameter $\lambda=(n-m,m)$ where $m=0,1,\ldots,\lfloor\frac{n}{2}\rfloor$. For a system of $n=5$ qubits, we show in a) the block diagonal decomposition for $U(\boldsymbol{\uptheta})$ and in b) the decomposition of $R(\pi)$ as a representation of $S_5$. The dashed boxes denote the isotypic components labeled by $\lambda$. c) As $n$ increases, $U(\boldsymbol{\uptheta})$ has a block diagonal decomposition which contains polynomially large blocks repeated a (potentially) exponential number of times. In contrast, the block decomposition of $R(\pi)$ (for any $\pi\in S_n$) contains blocks that can be exponentially large but that are only repeated a polynomial number of times.
  • Figure 4: Tetrahedral numbers. a) The Tetrahedral numbers ${\rm Te}_n$ are obtained by counting how many spheres can be stacked in the configuration of a tetrahedron (triangular base pyramid) of height $n$. b) One can also compute ${\rm Te}_n$ as the sum of consecutive triangular numbers, which count how many objects (e.g., spheres) can be arranged in an equilateral triangle.
  • Figure 5: Task of distinguishing connected from disconnected graphs with an $S_n$-equivariant QNN. a) Variance of the loss function partial derivatives versus the number of qubits $n$ (in log-linear scale). The blue line with square markers depicts the variance for inputs of the QNN drawn from a dataset composed of connected and disconnected graph states. To visualize how the data with different labels contributes to this variance, we also plot with green crosses (orange circles) the variances when the QNN is only fed connected (disconnected) graph states. b) In the left panel, we show representative results for the rank of the QFIM (defined in the main text) versus the number of layers $L$ for different numbers of qubits $n$. The critical number of layers at which this rank saturates, denoted $L_{\text{ovp}}$ (vertical dashed lines), corresponds to the onset of overparametrization. In the middle panel, we report the scaling of $L_{\text{ovp}}$ versus the number of qubits (log-linear scale). For each problem size, we present results for $10$ random input graph states and, as a comparison, also report the Tetrahedral numbers ${\rm Te}_{n+1}$ (solid line). In the right panel, we report the relative loss error of optimized QNNs at a given number of layers $L$ (in log-linear scale). These are obtained for different system sizes, with the dashed vertical lines indicating the corresponding values of $L_{\text{ovp}}$. c) Normalized generalization error versus number of qubits $n$ (in log-linear scale) for different training dataset sizes $M$. Here, we consider an overparametrized QNN with $L={\rm Te}_{n+1}$.
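
The counting behind Figures 3 and 4 can be sketched in a few lines. The code below computes ${\rm Te}_n$ as a sum of triangular numbers, as described in Figure 4, and compares it with the closed form $\binom{n+2}{3}$. It then sums the squared block sizes of an $S_n$-equivariant operator and checks that the total equals ${\rm Te}_{n+1}$, the reference curve of Figure 5. The block size $n-2m+1$ and multiplicity $\binom{n}{m}-\binom{n}{m-1}$ for the irrep $\lambda=(n-m,m)$ are the standard Schur-Weyl values and are an assumption here, since this summary does not state them. All function names are hypothetical.

```python
from math import comb

def triangular(k):
    """Number of objects arranged in an equilateral triangle with side k."""
    return k * (k + 1) // 2

def tetrahedral(n):
    """Te_n as the sum of the first n triangular numbers."""
    return sum(triangular(k) for k in range(1, n + 1))

def tetrahedral_closed_form(n):
    """Closed form Te_n = n (n + 1) (n + 2) / 6 = C(n + 2, 3)."""
    return comb(n + 2, 3)

def block_sizes(n):
    """Assumed block sizes n - 2m + 1 of an equivariant operator,
    one per irrep label lambda = (n - m, m)."""
    return [n - 2 * m + 1 for m in range(n // 2 + 1)]

def block_multiplicities(n):
    """Assumed number of repetitions of each block:
    C(n, m) - C(n, m - 1), with the m = 0 block appearing once."""
    return [comb(n, m) - (comb(n, m - 1) if m > 0 else 0)
            for m in range(n // 2 + 1)]

def equivariant_operator_dimension(n):
    """Number of free parameters: sum of squared block sizes."""
    return sum(d * d for d in block_sizes(n))

for n in range(1, 8):
    sizes = block_sizes(n)
    mults = block_multiplicities(n)
    # Blocks times multiplicities must fill the full 2^n-dimensional space.
    assert sum(d * k for d, k in zip(sizes, mults)) == 2 ** n
    print(n, sizes, mults, equivariant_operator_dimension(n), tetrahedral(n + 1))
```

For $n=5$ this gives block sizes $6, 4, 2$ repeated $1, 4, 5$ times, consistent with small blocks repeated many times as in Figure 3.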

Theorems & Definitions (16)

  • Definition 1: Label symmetries and $G$-invariance
  • Definition 2: Equivariance
  • Definition 3: Equivariant QNN
  • Lemma 1: Invariance from equivariance
  • Lemma 2: Dimension of $S_n$-equivariant unitaries
  • Theorem 1: Variance of partial derivatives
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Theorem 4
  • ...and 6 more