Toward Trainability of Quantum Neural Networks
Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao
TL;DR
The paper tackles a central obstacle to training quantum neural networks (QNNs) on near-term devices: barren plateaus, in which gradients vanish exponentially with the number of qubits. It introduces two structured QNN architectures, tree tensor QNNs (TT-QNNs) and step controlled QNNs (SC-QNNs). It proves that their expected gradient norms vanish at most polynomially in the qubit count rather than exponentially, which guarantees improved trainability. A variational input-model framework shows how to prepare amplitude-encoded inputs with a depth-controlled encoder, so that the gradient bounds remain robust to the input state. Empirical results on binary classification tasks derived from MNIST show TT-QNNs and SC-QNNs training faster and reaching higher accuracy than random-structure QNNs, underscoring their practical relevance for near-term quantum computing. Overall, the work advances trainability guarantees for QNNs without relying on unitary 2-design assumptions and highlights architecture-aware strategies for scalable quantum learning.
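The tree tensor idea can be illustrated with a small Python sketch. It only computes which qubit pairs each two-qubit block acts on and does not simulate any quantum state.

The layout rules below are assumptions for illustration and may differ from the exact circuit in the paper:

- the qubit count is a power of two;
- blocks act on adjacent pairs of still-active qubits;
- the first qubit of each pair is carried upward.

The helper name `tree_tensor_layout` is hypothetical. What the sketch shows is that n qubits are merged in log2(n) layers using n - 1 blocks. Depth therefore grows only logarithmically with the qubit count in this layout.

```python
def tree_tensor_layout(n_qubits: int):
    """Return (layers, readout_qubit) for a hypothetical binary-tree QNN layout.

    Each layer is a list of (kept, merged) qubit pairs. Assumption: a two-qubit
    block acts on each adjacent pair of active qubits, the first qubit of the
    pair stays active for the next layer, and the second is no longer used.
    """
    # Power-of-two check: n & (n - 1) is zero only for powers of two.
    if n_qubits < 2 or n_qubits & (n_qubits - 1):
        raise ValueError("this sketch assumes n_qubits is a power of two >= 2")
    active = list(range(n_qubits))
    layers = []
    while len(active) > 1:
        # Pair up neighbouring active qubits for this layer's two-qubit blocks.
        pairs = [(active[i], active[i + 1]) for i in range(0, len(active), 2)]
        layers.append(pairs)
        # Only the first qubit of each pair is carried to the next layer.
        active = [kept for kept, _ in pairs]
    # The single remaining qubit is the one that would be measured.
    return layers, active[0]


if __name__ == "__main__":
    layers, readout = tree_tensor_layout(8)
    for depth, pairs in enumerate(layers, start=1):
        print(depth, pairs)
    print("readout qubit:", readout)
```

For 8 qubits this prints three layers, [(0, 1), (2, 3), (4, 5), (6, 7)], [(0, 2), (4, 6)] and [(0, 4)], leaving qubit 0 for readout.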
Abstract
Quantum Neural Networks (QNNs) have recently been proposed as generalizations of classical neural networks to achieve quantum speed-ups. Despite their potential to outperform classical models, training QNNs faces serious bottlenecks: QNNs with random structures have poor trainability because their gradients vanish at a rate exponential in the number of input qubits. Such vanishing gradients could severely limit the applications of large QNNs. In this work, we provide a viable solution with theoretical guarantees. Specifically, we prove that QNNs with tree tensor and step controlled architectures have gradients that vanish at most polynomially with the qubit number. We numerically demonstrate QNNs with tree tensor and step controlled structures on binary classification. Simulations show faster convergence and better accuracy compared with QNNs with random structures.
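To see why an exponential rate matters in practice, here is a back-of-the-envelope Python sketch. It assumes each gradient component is estimated as the mean of N independent measurement outcomes in {-1, +1}. That mean has a standard error of at most 1/sqrt(N), so resolving a gradient of size g needs roughly 1/g^2 shots. The decay functions 2^-n and 1/n^2 are hypothetical stand-ins chosen for illustration, not the bounds proven in the paper. The helper names are likewise illustrative.

```python
import math


def shots_to_resolve(grad_magnitude: float) -> int:
    """Shots N so that the standard-error bound 1/sqrt(N) of a mean of +/-1
    outcomes shrinks to the gradient's own size (signal-to-noise ratio ~ 1)."""
    return math.ceil(1.0 / grad_magnitude ** 2)


# Illustrative scalings only: assumptions, not the paper's proven bounds.
def exponential_grad(n: int) -> float:
    return 2.0 ** -n  # barren-plateau-like exponential decay


def polynomial_grad(n: int) -> float:
    return 1.0 / n ** 2  # polynomial decay with an assumed exponent of 2


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, shots_to_resolve(exponential_grad(n)), shots_to_resolve(polynomial_grad(n)))
```

Under these assumed scalings, 32 qubits would need about 1.8e19 shots with exponential decay, versus about 1.0e6 with polynomial decay.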
