The Spectral Amplitude Principle for Dynamics of Quantum Neural Networks

Yi-hang Xu, Dan-Bo Zhang, Junchi Yan

TL;DR

This work identifies a new training-dynamics mechanism for Quantum Neural Networks (QNNs), termed Spectral Amplitude Priority. Under this mechanism, optimization prioritizes spectral components by their amplitude rather than their frequency index, in contrast to the spectral bias of classical networks. The authors analyze gradients in the frequency domain and recast the training dynamics with the Quantum Neural Tangent Kernel (QNTK). On that basis they prove that the residuals of high-amplitude spectral components decay rapidly under small learning rates, which lets QNNs learn high-frequency content efficiently. Experiments on synthetic high-frequency functions, classification benchmarks, and quantum-advantage tasks show that QNNs outperform classical DNNs on high-frequency tasks and remain robust to Fourier feature hyperparameters. The findings offer a rigorous explanation of QNN expressivity in complex spectral landscapes and point to practical ways of exploiting amplitude-rich spectra in quantum-enhanced learning. The authors note that their experiments rely on idealized simulations and leave hardware noise to future work.
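
The residual decay mentioned above rests on the linearized (tangent-kernel) picture of gradient descent. The sketch below illustrates that generic picture only; it is not the paper's QNTK derivation. It assumes the following:

- an MSE loss;
- a fixed kernel $K = JJ^\top$ built from a Jacobian $J = \partial f/\partial\theta$;
- a small learning rate $\eta$.

Under these assumptions the residual $r_t = f(\theta_t) - y$ obeys $r_{t+1} = (I - \eta K)\, r_t$. Each eigen-component of $K$ therefore shrinks by a factor $(1 - \eta\lambda_i)$ per step. The random Jacobian is a hypothetical stand-in for a QNN's Jacobian.

```python
import numpy as np

def residual_trajectory(K, r0, eta, steps):
    """Iterate the linearized gradient-descent residual update
    r_{t+1} = (I - eta * K) r_t, assuming a fixed kernel K and MSE loss."""
    traj = [r0]
    r = r0
    for _ in range(steps):
        r = r - eta * K @ r          # one kernel gradient-descent step on the residual
        traj.append(r)
    return np.array(traj)

def closed_form(K, r0, eta, t):
    """Residual at step t via the eigendecomposition K = V diag(lam) V^T:
    the i-th eigen-component is scaled by (1 - eta * lam_i)^t."""
    lam, V = np.linalg.eigh(K)
    coeffs = V.T @ r0                # project the initial residual onto eigenvectors
    return V @ ((1.0 - eta * lam) ** t * coeffs)

rng = np.random.default_rng(0)
J = rng.standard_normal((16, 8))     # hypothetical stand-in for df/dtheta (16 samples, 8 params)
K = J @ J.T                          # tangent kernel: symmetric positive semidefinite
r0 = rng.standard_normal(16)         # hypothetical initial residual f(theta_0) - y
eta = 0.01                           # small learning rate (eta * lam_max < 2 keeps it stable)

traj = residual_trajectory(K, r0, eta, 100)
print(np.allclose(traj[100], closed_form(K, r0, eta, 100)))  # True
```

The iterative and closed-form results agree. Components along large-eigenvalue directions vanish fastest, while components in the kernel's null space never decay. The paper's analysis concerns how these eigen-components relate to Fourier amplitude for QNNs, and this sketch does not reproduce that analysis.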

Abstract

The mechanism governing the training dynamics of Quantum Neural Networks (QNNs) remains under-explored. In classical Deep Neural Networks (DNNs), training is dominated by "Spectral Bias," i.e. the tendency to learn low-frequency components first while struggling with high-frequency details. In this work, we theoretically and empirically identify a distinct mechanism in QNNs, which we term Spectral Amplitude Priority. By analyzing the frequency-domain gradients and the residual dynamics via the Quantum Neural Tangent Kernel (QNTK), we prove that QNN training is governed primarily by the magnitude of spectral components rather than their frequency indices. Consequently, QNNs can efficiently capture high-frequency functions, provided those components have significant amplitude, thereby overcoming an inherent limitation of their classical counterparts. We validate this principle on both synthetic high-frequency functions and quantum-advantage tasks. The results show that QNNs significantly outperform DNNs on high-frequency tasks, offering an explanation for the superior expressivity of QNNs in complex spectral landscapes.
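
Spectral Amplitude Priority is stated in terms of the amplitude of each Fourier component of the target function. The sketch below shows one way to compute an amplitude-frequency spectrum like those in Figure 2, using NumPy's real FFT on one sampled period. It also shows how ranking components by amplitude differs from ranking them by frequency index. The target function and grid size are hypothetical illustrations, not the paper's $f_L$, $f_M$, or $f_H$.

```python
import numpy as np

def amplitude_spectrum(f, n_samples=64):
    """Sample f on a uniform grid over one period [0, 2*pi) and return the
    one-sided amplitude of each integer frequency k = 0..n_samples // 2."""
    x = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    coeffs = np.fft.rfft(f(x))
    amps = np.abs(coeffs) / n_samples
    amps[1:] *= 2.0                  # fold each negative frequency into its positive twin
    if n_samples % 2 == 0:
        amps[-1] /= 2.0              # the Nyquist bin has no separate negative twin
    return amps

def target(x):
    # Hypothetical target: a weak low-frequency term plus a strong high-frequency term.
    return 0.2 * np.sin(x) + 1.0 * np.sin(8 * x)

amps = amplitude_spectrum(target)
order_by_amplitude = np.argsort(amps)[::-1]   # largest amplitude first
print(order_by_amplitude[:2])                 # [8 1]
```

Ordered by frequency index, the $k=1$ component comes first. Ordered by amplitude, $k=8$ (amplitude 1.0) precedes $k=1$ (amplitude 0.2). According to the abstract, the second ordering is the one that governs QNN training.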

Paper Structure

This paper contains 17 sections, 13 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Scheme of the QNNs studied in the paper. (a) $S(x)$ is the data-encoding circuit block and $W_{\theta_{(p)}}$ is the trainable circuit block. (b) A single layer of $S(x)$ and of $W_{\theta_{(p)}}$, showing the specific gates and their dependence on $x$ and $\theta$. The encoding circuit $S(x)$ embeds classical data into the rotation angles of RX gates, thereby encoding classical information into the quantum state.
  • Figure 2: Amplitude-frequency plots of $f_L(x)$, $f_M(x)$, and $f_H(x)$ defined in Eq. \ref{eq:curves}. The x-axis is the index of each frequency component after applying the Fourier transform to the function, and the y-axis is its amplitude. The black, red, and blue curves show the frequency-amplitude distributions of $f_L(x)$, $f_M(x)$, and $f_H(x)$, respectively.
  • Figure 3: Evolution of the loss-function gradients derived from Eq. \ref{eq:gradient_k} during QNN training on one-variable curves dominated by low-frequency (a), middle-frequency (b), and high-frequency (c) components.
  • Figure 4: Comparison between the actual residual dynamics and those predicted by the QNTK when learning low-frequency (a), middle-frequency (b), and high-frequency (c) dominated functions. For each of the three functions, we plot the actual dynamics of $\varepsilon(t)$ at three peak frequencies in the frequency domain, corresponding to low, medium, and high frequencies, together with the QNTK's theoretical predictions of $\varepsilon(t)$. The number of qubits is $8$ and the learning rate is $\eta=0.01$. Ansatz parameters are initialized randomly in $[0, 2\pi]$.
  • Figure 5: Evolution of the relative errors over training iterations for low-frequency (a), middle-frequency (b), and high-frequency (c) dominated functions.
  • ...and 6 more figures
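
Figure 1 encodes $x$ through the rotation angles of RX gates. A standard result for Pauli-rotation encodings is that the model output $f(x)$ is then a truncated Fourier series with integer frequencies. The highest frequency equals the number of gates that encode $x$. The sketch below enumerates that frequency set. The helper name is hypothetical, and the sketch assumes each RX gate receives $x$ directly, with no rescaling. How many encoding gates the paper's $S(x)$ actually contains is not stated in the captions.

```python
import itertools

def rx_frequency_spectrum(num_encoding_gates):
    """Hypothetical helper: integer frequencies that can appear in the Fourier
    series of f(x) when x enters only through `num_encoding_gates` gates
    RX(x) = exp(-i * x * X / 2).

    Each gate contributes an eigenvalue +1/2 or -1/2 of X/2, and every accessible
    frequency is a difference between two sums of such eigenvalues."""
    eigenvalues = (0.5, -0.5)
    sums = {sum(choice)
            for choice in itertools.product(eigenvalues, repeat=num_encoding_gates)}
    return sorted({int(a - b) for a in sums for b in sums})

print(rx_frequency_spectrum(2))  # [-2, -1, 0, 1, 2]

# Assumption: one RX(x) per qubit in a single encoding layer on 8 qubits
# (the qubit count used in Figure 4) gives frequencies -8..8.
spectrum_8 = rx_frequency_spectrum(8)
```

Under this picture, the circuit structure sets which frequencies are available at all. According to the abstract, the amplitudes of the target's components then determine how quickly training fits them.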