Toward Trainability of Quantum Neural Networks

Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao

TL;DR

The paper tackles the problem of training quantum neural networks (QNNs) on near-term devices by addressing barren plateaus. It introduces two structured QNN architectures, tree tensor QNNs (TT-QNNs) and step controlled QNNs (SC-QNNs), and proves that their expected gradient norms vanish at most polynomially in the qubit count, rather than exponentially, which guarantees improved trainability. A variational input-model framework shows how to prepare amplitude-encoded inputs with a depth-controlled encoder, so that the gradient bounds hold across input states. Simulations on MNIST-derived binary classification tasks show TT-QNNs and SC-QNNs training faster and achieving higher accuracy than random-structure QNNs, which supports their practical relevance for near-term quantum computing. Overall, the work establishes trainability guarantees for QNNs without relying on unitary 2-design assumptions and highlights architecture-aware strategies for scalable quantum learning.

Abstract

Quantum Neural Networks (QNNs) have recently been proposed as generalizations of classical neural networks to achieve quantum speed-up. Despite their potential to outperform classical models, serious bottlenecks exist for training QNNs; namely, QNNs with random structures have poor trainability because the gradient vanishes at a rate exponential in the number of input qubits. The vanishing gradient could seriously limit the applications of large-size QNNs. In this work, we provide a viable solution with theoretical guarantees. Specifically, we prove that QNNs with tree tensor and step controlled architectures have gradients that vanish at most polynomially with the qubit number. We numerically demonstrate QNNs with tree tensor and step controlled structures for the application of binary classification. Simulations show faster convergence rates and better accuracy compared to QNNs with random structures.

Paper Structure

This paper contains 23 sections, 17 theorems, 114 equations, 11 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1.1

(Informal) Consider the $n$-qubit TT-QNN and the $n$-qubit SC-QNN defined in Figures 1 and 2, and the corresponding objective functions $f_{\text{TT}}$ and $f_{\text{SC}}$ defined in (tqnn_ttn_loss-tqnn_sc_loss). Then we have: ... where $n_c$ is the number of CNOT operations that directly link to the first qubit channel in the SC-QNN, and the expectation is taken over all parameters in ...
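
The quantity at stake in this result, the expected squared gradient norm under randomly drawn parameters, can be estimated numerically for small circuits. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a tree-tensor layout built from RY rotations and CNOTs (loosely following the Figure 1 caption), an all-zero input state, parameters drawn uniformly from $[0, 2\pi)$, and an objective equal to the probability of measuring 0 on the first qubit. All function names are hypothetical, and $n$ must be a power of two.

```python
import numpy as np


def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored as a (2,)*n tensor."""
    out = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(out, 0, q)


def apply_cnot(state, control, target):
    """Flip the target qubit on the slice where the control qubit is 1."""
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    # Inside the slice, axes after the control axis shift down by one.
    ax = target if target < control else target - 1
    out = state.copy()
    out[tuple(idx)] = np.flip(state[tuple(idx)], axis=ax)
    return out


def num_params(n):
    """n + n/2 + ... + 1 rotation angles for this assumed tree layout."""
    return 2 * n - 1


def tree_objective(theta, n):
    """Assumed objective: probability of measuring 0 on the first qubit."""
    state = np.zeros((2,) * n)
    state[(0,) * n] = 1.0  # assumed input state |0...0>
    active, k = list(range(n)), 0
    while True:
        for q in active:  # one RY per surviving qubit
            state = apply_1q(state, ry(theta[k]), q)
            k += 1
        if len(active) == 1:
            break
        for a, b in zip(active[0::2], active[1::2]):  # merge neighbours
            state = apply_cnot(state, control=b, target=a)
        active = active[0::2]
    return float(np.sum(np.abs(state[0]) ** 2))


def grad_parameter_shift(theta, n):
    """Exact gradient via the parameter-shift rule for RY gates."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        shift = np.zeros_like(theta)
        shift[k] = np.pi / 2
        grad[k] = 0.5 * (tree_objective(theta + shift, n)
                         - tree_objective(theta - shift, n))
    return grad


def mean_sq_grad_norm(n, samples=100, seed=0):
    """Monte Carlo estimate of the expected squared gradient norm."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        theta = rng.uniform(0.0, 2.0 * np.pi, size=num_params(n))
        total += float(np.sum(grad_parameter_shift(theta, n) ** 2))
    return total / samples


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, mean_sq_grad_norm(n))
```

Running the script prints one estimate per qubit count, so the decay with $n$ can be inspected directly. The estimates depend on the assumed input state and sample count, and they are not a substitute for the bounds stated in the paper.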

Figures (11)

  • Figure 1: Quantum Neural Network with the Tree Tensor structure ($n=4$).
  • Figure 2: Quantum Neural Network with the Step Controlled structure ($n=4$, $n_c = 2$).
  • Figure 3: The parameterized alternating layered circuit $W(\bm{\beta})$ ($n=8$, $L=1$) for training the corresponding encoding circuit of the input $\bm{x}_{\text{in}}$.
  • Figure 4: The encoding circuit $U(\bm{\beta}^{*})$, where $W(\bm{\beta}^{*})$ is the trained parameterized circuit in Figure 3.
  • Figure 5: Simulations on the MNIST binary classification between $(0,2)$. The training loss and the test error over the training iterations are shown for the $n=8$, $n=10$, and $n=12$ cases. The gradient norm of the objective functions and the term $\alpha(\rho_{\text{in}})$ during training are shown for the $n=8$ case.
  • ...and 6 more figures

Theorems & Definitions (32)

  • Theorem 1.1
  • Theorem 3.1
  • Theorem 3.2
  • Lemma C.1
  • proof
  • Lemma C.2
  • proof
  • Lemma C.3
  • proof
  • Lemma C.4
  • ...and 22 more