Unlearnable phases of matter

Tarun Advaith Kumar; Yijian Zou; Amir-Reza Negari; Roger G. Melko; Timothy H. Hsieh

Abstract

We identify fundamental limitations in machine learning by demonstrating that nontrivial mixed-state phases of matter are computationally hard to learn. Focusing on unsupervised learning of distributions, we show that autoregressive neural networks fail to learn global properties of distributions characterized by locally indistinguishable (LI) states. We demonstrate that conditional mutual information (CMI) is a useful diagnostic for LI: for classical distributions, long-range CMI of a state implies the existence of a spatially LI partner. By introducing a restricted statistical query model, we prove that nontrivial phases with long-range CMI, such as strong-to-weak spontaneous symmetry breaking (SWSSB) phases, are hard to learn. We validate our claims by using recurrent, convolutional, and Transformer neural networks to learn the syndrome and physical distributions of the toric/surface code under bit-flip noise. Our findings suggest hardness of learning as a diagnostic tool for detecting mixed-state phases, transitions, and error-correction thresholds, and they suggest CMI and, more generally, "non-local Gibbsness" as metrics for how hard a distribution is to learn.

Paper Structure (33 sections, 8 theorems, 105 equations, 9 figures)

Details of Numerical Experiments
Network Architectures
Training Data Generation
Optimization Details
Evaluation Metrics
Implementation Details
Additional Numerical Results
Syndrome distribution of repetition code
Ferromagnetic ground states
Unlearnability of locally indistinguishable states
Information-theoretic hardness
Hardness of noisy gradient descent training
Stability of local indistinguishability
Local SQ learning of finite Markov length states
Distributions on 1D lattice
...and 18 more sections

Key Result

Theorem 1

Let $p$ be a distribution of $n$ spins on a 1D lattice with finite Markov length $\xi$, i.e., $I(A:C|B)\leq \mathrm{poly}(|A|,|C|) e^{-|B|/\xi}$ for any contiguous intervals $A,B,C$ (see the partition (a) in Fig. fig:partitions). Let $x>2$ be a constant. Then there exists a $O(\log n)$-spatially-loca
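The conditional mutual information in the theorem statement can be evaluated by brute force for small classical distributions via $I(A:C|B) = S(AB) + S(BC) - S(B) - S(ABC)$. The following is a minimal sketch, not code from the paper: the function names, the 6-spin system size, and the two toy distributions (a product distribution and a uniform distribution over even-parity bitstrings) are assumptions chosen only for illustration.

```python
import itertools
import math
from collections import defaultdict


def marginal(p, sites):
    """Marginalize a joint distribution {bit tuple: probability} onto `sites`."""
    out = defaultdict(float)
    for x, px in p.items():
        out[tuple(x[i] for i in sites)] += px
    return out


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


def cmi(p, A, B, C):
    """Conditional mutual information I(A:C|B) = S(AB) + S(BC) - S(B) - S(ABC)."""
    return (entropy(marginal(p, A + B)) + entropy(marginal(p, B + C))
            - entropy(marginal(p, B)) - entropy(marginal(p, A + B + C)))


def product_distribution(n, q=0.3):
    """Illustrative i.i.d. distribution: each spin is 1 with probability q."""
    return {x: math.prod(q if b else 1 - q for b in x)
            for x in itertools.product((0, 1), repeat=n)}


def even_parity_distribution(n):
    """Illustrative uniform distribution over n-bit strings with even parity."""
    strings = [x for x in itertools.product((0, 1), repeat=n) if sum(x) % 2 == 0]
    return {x: 1.0 / len(strings) for x in strings}


# Contiguous partition of a 6-spin chain: A and C are the end spins, B the middle.
n = 6
A, B, C = [0], [1, 2, 3, 4], [5]
cmi_product = cmi(product_distribution(n), A, B, C)
cmi_parity = cmi(even_parity_distribution(n), A, B, C)
```

In this toy setting the product distribution has vanishing CMI, while the even-parity distribution has $I(A:C|B) = \log 2$ however large $B$ is, because the end spins are perfectly correlated once the middle is known. Its odd-parity counterpart has identical marginals on any proper subset of spins, which gives a simple instance of a locally indistinguishable partner.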

Figures (9)

Figure 1: Unsupervised learning task and local indistinguishability obstruction. Given samples coming from a black box distribution $p(x)$, our learning problem is to reconstruct the distribution using an autoregressive neural network and unsupervised learning. Training with samples from locally indistinguishable distributions results in exponentially close training trajectories.
Figure 2: (left) For the toric code with bit-flip error on each edge qubit, the relevant syndrome checks and logical operators are shown. Each bit flip flips the two syndromes on the nearest vertices. (right) For the surface code (open boundary conditions) with bit-flip error, the relevant syndrome checks and logical operators are shown.
Figure 3: Learning the syndrome distribution of the noisy toric code. (top) KL divergence between the true distribution and a trained 2D CNN with residual connections, for the toric code vertex syndromes at bit-flip error rate $p_{err}$ on a 7 by 7 lattice (of syndrome bits). For $p_{err} > 0.109$, the syndrome distribution is in an SWSSB phase and is hard to learn. Specifically, (bottom) in the SWSSB phase, the neural network fails to learn the global parity of the true distribution.
Figure 4: Learning a noisy loop ensemble. (top) Error in CNN learning of the classical loop ensemble subject to bit-flip error rate $p_{err}$. We expect that with larger system size and CNN depth, the location of the peak drifts to the critical point $p_c$ at which CMI has power-law decay. The results are from a residually connected CNN architecture with 6, 8, and 10 layers for $L = 4, 6, 8$ respectively. (bottom) By subtracting off the KL divergence of learning the loop ensemble with no logical information from that of learning the ensemble with fixed logical, we observe that the additional difficulty of learning a fixed logical sector approaches $\log 2$ in the nontrivial (error-correctable) phase. This subfigure uses a vanilla 3-layer CNN.
Figure 5: Performance metrics for the 1D syndrome distribution. KL divergence between trained RNN and exact probabilities of 64-site 1D repetition code syndromes with MPDOs.
...and 4 more figures
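The syndrome rule stated in the Figure 2 caption (each bit flip flips the two syndromes on the nearest vertices) can be sketched as a sampler for the toric code vertex syndromes. This is an illustrative sketch rather than the authors' data-generation code: the function names, the edge-labelling convention, and the use of Python's `random` module are assumptions.

```python
import random


def sample_toric_syndrome(L, p_err, rng):
    """Sample vertex syndromes on an L x L torus under i.i.d. bit-flip noise.

    Assumed convention: each vertex (r, c) owns one horizontal edge to
    (r, c+1) and one vertical edge to (r+1, c), with periodic boundaries.
    """
    syndrome = [[0] * L for _ in range(L)]
    for r in range(L):
        for c in range(L):
            # Horizontal edge: a flip toggles the syndromes at both endpoints.
            if rng.random() < p_err:
                syndrome[r][c] ^= 1
                syndrome[r][(c + 1) % L] ^= 1
            # Vertical edge: a flip toggles the syndromes at both endpoints.
            if rng.random() < p_err:
                syndrome[r][c] ^= 1
                syndrome[(r + 1) % L][c] ^= 1
    return syndrome


def global_parity(syndrome):
    """Parity (0 or 1) of the total number of violated vertex checks."""
    return sum(map(sum, syndrome)) % 2


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_toric_syndrome(7, 0.2, rng) for _ in range(200)]
parities = {global_parity(s) for s in samples}
```

Because every flipped edge toggles exactly two vertex syndromes, each sample on the torus has even global parity at any error rate. This is a global constraint of the kind the bottom panel of Figure 3 tests, and a model that matches local statistics can still violate it.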

Theorems & Definitions (18)

Theorem 1
proof
Lemma 2
proof
Lemma 3
proof
Corollary 4
proof
Theorem 5
proof
...and 8 more
