
Universal scaling laws in quantum-probabilistic machine learning by tensor network towards interpreting representation and generalization powers

Sheng-Chen Bai, Shi-Ju Ran

TL;DR

This study reveals that, while gaining information through training, the linear-scaling law is suppressed by a negative quadratic correction, leading to $L \simeq \beta M - \alpha M^2 + \text{const}$.

Abstract

Interpreting the representation and generalization powers has been a long-standing issue in the field of machine learning (ML) and artificial intelligence. This work contributes to uncovering the emergence of universal scaling laws in quantum-probabilistic ML. We take the generative tensor network (GTN) in the form of a matrix product state as an example and show that with an untrained GTN (such as a random TN state), the negative logarithmic likelihood (NLL) $L$ generally increases linearly with the number of features $M$, i.e., $L \simeq kM + \text{const}$. This is a consequence of the so-called "catastrophe of orthogonality," which states that quantum many-body states tend to become exponentially orthogonal to each other as $M$ increases. We reveal that as the GTN gains information through training, the linear scaling law is suppressed by a negative quadratic correction, leading to $L \simeq \beta M - \alpha M^2 + \text{const}$. The scaling coefficients exhibit logarithmic relationships with the number of training samples and the number of quantum channels $\chi$. The emergence of the quadratic correction term in the NLL for the testing (training) set can be regarded as evidence of the generalization (representation) power of the GTN. Over-parameterization can be identified by the deviation in the values of $\alpha$ between the training and testing sets as $\chi$ increases. We further investigate how orthogonality in the quantum feature map relates to the satisfaction of the quantum probabilistic interpretation, as well as to the representation and generalization powers of the GTN. The unveiling of universal scaling laws in quantum-probabilistic ML would be a valuable step toward establishing a white-box ML scheme interpreted within the quantum probabilistic framework.
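
The linear growth of the NLL for an untrained model can be illustrated with a toy calculation. The sketch below is not the paper's GTN setup: it assumes, purely for illustration, that both the "model" and the "sample" are random product states of $M$ qubits, so that the fidelity factorizes site by site and $-\ln$ of it is a sum of $M$ independent terms. The function names and the choice of 200 random pairs are hypothetical.

```python
import numpy as np

def random_qubit_states(rng, m):
    """Return m random normalized single-qubit states, shape (m, 2)."""
    v = rng.normal(size=(m, 2)) + 1j * rng.normal(size=(m, 2))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def product_state_nll(psi, phi):
    """-ln |<psi|phi>|^2 for two product states given site by site."""
    # The overlap of product states is the product of per-site overlaps,
    # so the log-fidelity is a sum over the M sites.
    site_fidelity = np.abs(np.sum(np.conj(psi) * phi, axis=1)) ** 2
    return -np.sum(np.log(site_fidelity))

def mean_nll(m, n_pairs=200, seed=0):
    """Average -ln fidelity over n_pairs random pairs of M-site product states."""
    rng = np.random.default_rng(seed)
    values = [
        product_state_nll(random_qubit_states(rng, m), random_qubit_states(rng, m))
        for _ in range(n_pairs)
    ]
    return float(np.mean(values))

# Toy stand-in for the number of features M (assumed values).
ms = np.array([8, 16, 32, 64, 128])
nll = np.array([mean_nll(m) for m in ms])

# Fit nll ~ k * M + const; the fidelity itself then decays like exp(-k * M).
slope, intercept = np.polyfit(ms, nll, 1)
print(f"fitted slope k = {slope:.3f}, intercept = {intercept:.3f}")
```

For random single-qubit states the per-site fidelity is uniformly distributed on [0, 1], so the fitted slope should come out close to 1 in this toy model. The value of $k$ for an actual GTN and dataset depends on the feature map and the data and is not predicted by this sketch.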

Paper Structure

This paper contains 3 sections, 15 equations, 9 figures.

Figures (9)

  • Figure 1: (Color online) The illustration of how the gain of information alters the scaling law of NLL in quantum-probabilistic ML. By training the GTN, the linear scaling law with respect to the feature number $M$ [Eq. (\ref{eq-linear})], which is the result of the "orthogonal catastrophe" of quantum many-body states, is suppressed by the addition of a negative quadratic correction term [Eq. (\ref{eq-corrected})].
  • Figure 2: (Color online) (a) The linear scaling of the "inter-class" NLL [Eq. (\ref{eq-linear})], where the samples and GTNs correspond to different categories when computing the NLL. (b) The emergence of the negative quadratic correction in the scaling of the "intra-class" NLL [Eq. (\ref{eq-corrected})], where the samples and GTNs correspond to the same category. Here, we choose the Fashion-MNIST dataset. The data for $M<784$ are obtained by cropping the middle section of the images.
  • Figure 3: (Color online) (a)-(c) The coefficients in the corrected scaling laws [$\alpha$, $\beta$, and $\gamma$ in Eq. (\ref{eq-corrected})] of the intra-class NLL and their logarithmic scaling against the virtual dimension $\chi$. (d) The intra-class NLL $L$ and its logarithmic scaling against $\chi$. Deviations between the curves of the training and testing sets are observed, which indicate over-parameterization. We take the Fashion-MNIST dataset and fix $M=784$. The fits (red solid lines) are based on the training data.
  • Figure 4: (Color online) Comparisons of (a) $p_L$ and (b) $q_L$ between the values obtained by the fitting with Eq. (\ref{eq-Llog}) and those computed using Eq. (\ref{eq-pq}).
  • Figure 5: (Color online) (a) The NLL $L$ and (b) average probability per sample $P$ for the training and testing sets of binarized Fashion-MNIST versus $\theta$ in the QFM [Eq. (\ref{eq-QFM})]. We fix $M=64$. (c) and (d) show $L$ and $P$ versus $M$ for the original (gray-scale) dataset with $\theta=1$. In the insets of (b) and (d), we show the classification accuracy of the GTN classifier against $\theta$ and $M$, respectively. The error bars indicate the standard deviations over 5 independent simulations. We take $N = \chi = 128$.
  • ...and 4 more figures
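
The corrected scaling law named in the captions of Figures 1 to 3 has the form $L \simeq \beta M - \alpha M^2 + \text{const}$, which is linear in its coefficients and can therefore be fitted by ordinary least squares. The sketch below shows one way to do this with `numpy.polyfit`. The helper `fit_corrected_scaling` and all numerical values are hypothetical: the data are synthetic, generated from assumed coefficients plus noise, and are not taken from the paper.

```python
import numpy as np

def fit_corrected_scaling(ms, nll):
    """Least-squares fit of nll ~ beta*M - alpha*M**2 + const.

    Returns (alpha, beta, const). Note the sign convention: alpha is the
    negative of the fitted quadratic coefficient.
    """
    c2, c1, c0 = np.polyfit(ms, nll, 2)  # highest power first
    return -c2, c1, c0

# Synthetic NLL-versus-M data (assumed coefficients, for illustration only).
rng = np.random.default_rng(0)
ms = np.arange(100, 800, 50, dtype=float)
true_alpha, true_beta, true_const = 2e-4, 0.6, 5.0
nll = (true_beta * ms - true_alpha * ms**2 + true_const
       + rng.normal(scale=0.2, size=ms.size))

alpha, beta, const = fit_corrected_scaling(ms, nll)
print(f"alpha = {alpha:.2e}, beta = {beta:.3f}, const = {const:.2f}")

# A positive alpha means the linear law is bent downward by the
# quadratic correction; alpha near zero recovers the purely linear law.
```

Running the same fit separately on training-set and testing-set NLL curves would give two values of $\alpha$ whose difference could be tracked as $\chi$ grows, which is the comparison the abstract associates with over-parameterization.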