What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement

Yotam Alexander; Nimrod De La Vega; Noam Razin; Nadav Cohen

TL;DR

This work tackles the fundamental question of what makes data distributions suitable for locally connected neural networks (LC-NNs) by introducing a physics-inspired framework that treats data as tensors and analyzes learnability through quantum entanglement ($QE$) under canonical feature partitions. It proves a necessary and sufficient condition: an LC-NN can achieve low population loss if and only if the data tensor exhibits low entanglement under all canonical partitions, with the entanglement bound tied to the network width $R$ via $QE\le\ln(R)$ up to small terms. The authors translate theory into practice by proposing a data-enhancement protocol that rearranges features to reduce entanglement. The protocol uses a surrogate entanglement measure ($SE$) based on multivariate Pearson correlations, and searches for arrangements via minimum balanced cuts, which are solvable by graph-partitioning algorithms. This approach yields substantial improvements across CNNs, S4, and local-attention models on audio, tabular, and image data. Overall, the work offers a principled, physics-grounded perspective on data conditioning and architecture-data co-design, with practical implications for improving LC-NN performance on natural data modalities.
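To make the quantity in the bound concrete, here is a minimal sketch of entanglement for a tensor under a partition of its axes. It assumes the standard definition borrowed from quantum physics: the entropy of the distribution given by the normalized squared singular values of the tensor's matricization. The function name `entanglement` and the 0-based axis convention are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def entanglement(tensor, subset):
    """Entropy of the normalized squared singular values of the matricization
    whose rows are indexed by the axes in `subset` (0-based, an assumption)."""
    subset = sorted(subset)
    rest = [ax for ax in range(tensor.ndim) if ax not in subset]
    rows = int(np.prod([tensor.shape[ax] for ax in subset]))
    # Arrange the tensor as a matrix: axes in `subset` -> rows, the rest -> columns.
    mat = np.transpose(tensor, subset + rest).reshape(rows, -1)
    sing = np.linalg.svd(mat, compute_uv=False)
    probs = sing**2 / np.sum(sing**2)
    probs = probs[probs > 1e-12]  # drop zeros: 0 * ln(0) is taken as 0
    return float(-np.sum(probs * np.log(probs)))

# A product (rank-one) tensor has no entanglement.
u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
qe_product = entanglement(np.einsum("i,j->ij", u, v), [0])

# The 2x2 identity has two equal singular values: entanglement ln(2).
qe_identity = entanglement(np.eye(2), [0])
```

Since a matricization of rank at most $R$ has at most $R$ non-zero singular values, its entanglement cannot exceed $\ln(R)$, which is where the width enters the bound.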

Abstract

The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics. Our main theoretical result states that a certain locally connected neural network is capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement under certain canonical partitions of features. As a practical application of this result, we derive a preprocessing method for enhancing the suitability of a data distribution to locally connected neural networks. Experiments with widespread models over various datasets demonstrate our findings. We hope that our use of quantum entanglement will encourage further adoption of tools from physics for formally reasoning about the relation between deep learning and real-world data.

Paper Structure (59 sections, 23 theorems, 125 equations, 8 figures, 12 tables, 3 algorithms)

Introduction
Related Work
Preliminaries
Tensors and Tensor Networks
Quantum Entanglement
Low Entanglement Under Canonical Partitions Is Necessary and Sufficient for Fitting Tensor
Tensor Network Equivalent to a Locally Connected Neural Network
Necessary and Sufficient Condition for Fitting Tensor
Low Entanglement Under Canonical Partitions Is Necessary and Sufficient for Accurate Prediction
Accurate Prediction Is Equivalent to Fitting Data Tensor
Necessary and Sufficient Condition for Accurate Prediction
Empirical Demonstration
Enhancing Suitability of Data to Locally Connected Neural Networks
Search for Feature Arrangement With Low Entanglement Under Canonical Partitions
Practical Algorithm via Surrogate for Entanglement
...and 44 more sections

Key Result

Theorem 1

Let ${\mathcal{W}}_{\mathrm{TN}} \in {\mathbb R}^{D_1 \times \cdots \times D_N}$ be a tensor generated by the locally connected tensor network defined in the section Tensor Network Equivalent to a Locally Connected Neural Network, and let ${\mathcal{A}} \in {\mathbb R}^{D_1 \times \cdots \times D_N}$. For any $\epsilon \in [0, \| {\mathcal{A}} \| / \ldots$ … where $D_{{\mathcal{K}}} := \min \{ \prod_{n \in {\mathcal{K}}} D_n , \prod_{n \in \ldots$ …

Figures (8)

Figure 1: Tensor networks form a graphical language for fitting (i.e. representing) tensors through tensor contractions. Tensor network definition: Every node in a tensor network is associated with a tensor, whose order is equal to the number of edges emanating from the node. An edge connecting two nodes specifies contraction between the tensors associated with the nodes (see the section Tensors and Tensor Networks), where the weight of the edge signifies the respective axes lengths. Tensor networks may also contain open edges, i.e. edges that are connected to a node on one side and are open on the other. The number of such open edges is equal to the order of the tensor produced by contracting the tensor network. Illustrations: Presented are exemplar tensor network diagrams of: (a) an order $N$ tensor ${\mathcal{A}} \in {\mathbb R}^{D_1 \times \cdots \times D_N}$; (b) a matrix-vector multiplication between ${\mathbf M} \in {\mathbb R}^{D_1 \times D_2}$ and ${\mathbf v} \in {\mathbb R}^{D_2}$, which results in the vector ${\mathbf M} {\mathbf v} \in {\mathbb R}^{D_1}$; and (c) a more elaborate tensor network generating ${\mathcal{W}} \in {\mathbb R}^{D_1 \times D_2 \times D_3}$.
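
The contraction semantics described in the Figure 1 caption can be sketched with `numpy.einsum`. The axis lengths and the three-node network below are hypothetical examples chosen for illustration; they are not the network drawn in panel (c).

```python
import numpy as np

rng = np.random.default_rng(0)

# (b) Matrix-vector product: contract the shared edge of length D2.
D1, D2 = 3, 4
M = rng.standard_normal((D1, D2))
v = rng.standard_normal(D2)
Mv = np.einsum("ij,j->i", M, v)  # one open edge remains, so the result is a vector

# A hypothetical three-node network: two inner edges of length R are contracted,
# three open edges remain, so the result is an order-3 tensor.
R, D3 = 2, 5
A = rng.standard_normal((D1, R))
B = rng.standard_normal((R, D2, R))
C = rng.standard_normal((R, D3))
W = np.einsum("ia,ajb,bk->ijk", A, B, C)
```

Each repeated index letter in the `einsum` string plays the role of an edge between two nodes, and each letter that survives to the output is an open edge.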
Figure 2: The analyzed tensor network equivalent to a locally connected neural network. (a) We consider a tensor network adhering to a perfect binary tree connectivity with $N = 2^L$ leaf nodes, for $L \in {\mathbb N}$, generating ${\mathcal{W}}_{\mathrm{TN}} \in {\mathbb R}^{D_1 \times \cdots \times D_N}$. Axes corresponding to open edges are indexed such that open edges descendant to any node of the tree have contiguous indices. The lengths of axes corresponding to inner (non-open) edges are equal to $R \in {\mathbb N}$, referred to as the width of the tensor network. (b) Contracting ${\mathcal{W}}_{\mathrm{TN}}$ with vectors ${\mathbf x}^{(1)} \in {\mathbb R}^{D_1}, \ldots, {\mathbf x}^{(N)} \in {\mathbb R}^{D_N}$ produces $\langle \otimes_{n = 1}^N {\mathbf x}^{(n)} , {\mathcal{W}}_{\mathrm{TN}} \rangle$. Performing these contractions from leaves to root can be viewed as a forward pass of a data instance $( {\mathbf x}^{(1)}, \ldots, {\mathbf x}^{(N)} )$ through a certain locally connected neural network (with polynomial non-linearity; see, e.g., [cohen2016expressive, cohen2017inductive, levine2018deep, razin2022implicit]). Accordingly, we call the tensor network generating ${\mathcal{W}}_{\mathrm{TN}}$ a locally connected tensor network.
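
As a rough sketch of the Figure 2 caption for $N = 4$ leaves, the code below builds a small binary-tree tensor network and checks that contracting its tensor with input vectors matches a leaves-to-root pass. The node shapes (matrices at the leaves, order-3 tensors at inner nodes, a matrix at the root) are an assumption consistent with the caption, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 4, 2  # length of open-edge axes, width of inner edges

# Assumed node shapes for N = 4 leaves (L = 2).
leaves = [rng.standard_normal((D, R)) for _ in range(4)]
left = rng.standard_normal((R, R, R))   # joins leaves 1 and 2
right = rng.standard_normal((R, R, R))  # joins leaves 3 and 4
root = rng.standard_normal((R, R))      # joins the two inner nodes

# Contract all inner edges to obtain the order-4 tensor W_TN.
W_tn = np.einsum("ia,jb,kc,ld,abe,cdf,ef->ijkl", *leaves, left, right, root)

# Contract W_TN with one vector per open edge ...
xs = [rng.standard_normal(D) for _ in range(4)]
direct = np.einsum("ijkl,i,j,k,l->", W_tn, *xs)

# ... or, equivalently, run a leaves-to-root pass without forming W_TN.
h = [leaf.T @ x for leaf, x in zip(leaves, xs)]
g_left = np.einsum("a,b,abe->e", h[0], h[1], left)
g_right = np.einsum("c,d,cdf->f", h[2], h[3], right)
forward = g_left @ root @ g_right

# Splitting features {1, 2} from {3, 4} cuts inner edges of length R only,
# so the corresponding matricization has rank at most R.
rank_half = np.linalg.matrix_rank(W_tn.reshape(D * D, D * D))
```

The rank check illustrates why width limits entanglement: the partition separating the first two features from the last two is crossed only by edges of length $R$.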
Figure 3: The canonical partitions of $[N]$, for $N = 2^L$ with $L \in {\mathbb N}$. Every $l \in \{0, \ldots, L\}$ contributes $2^l$ canonical partitions, the $n$'th one induced by ${\mathcal{K}} = \{2^{L - l} \cdot (n - 1) + 1, \ldots, 2^{L - l} \cdot n\}$.
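
The formula in the Figure 3 caption can be enumerated directly. The sketch below lists the subsets ${\mathcal{K}}$ using 1-based feature indices; the function name `canonical_partitions` is a hypothetical choice for illustration.

```python
def canonical_partitions(L):
    """Return the subsets K (1-based indices) inducing the canonical partitions
    of [N] for N = 2**L, following K = {2^(L-l) (n-1) + 1, ..., 2^(L-l) n}."""
    subsets = []
    for l in range(L + 1):
        block = 2 ** (L - l)  # number of features in each subset at level l
        for n in range(1, 2 ** l + 1):
            subsets.append(list(range(block * (n - 1) + 1, block * n + 1)))
    return subsets

parts = canonical_partitions(2)
# For L = 2 (N = 4): [[1, 2, 3, 4], [1, 2], [3, 4], [1], [2], [3], [4]]
```

Summing $2^l$ over $l = 0, \ldots, L$ gives $2^{L+1} - 1$ subsets in total, i.e. one per node of the perfect binary tree.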
Figure 4: The prediction accuracies of common locally connected neural networks are inversely correlated with the entanglements of the data under canonical partitions of features, in compliance with our theory (see the sections Accurate Prediction Is Equivalent to Fitting Data Tensor and Necessary and Sufficient Condition for Accurate Prediction). Left: Average entanglement under canonical partitions of the empirical data tensor, for binary classification variants of the Speech Commands audio dataset [warden2018speech] obtained by performing random position swaps between features. Right: Test accuracies achieved by a convolutional neural network (CNN) [dai2017very], S4 (a popular class of recurrent neural networks; see [gu2022efficiently]), and a local self-attention model [rae-razavi-2020-transformers], against the number of random feature swaps performed to create the dataset. All: Reported are the means and standard deviations of the quantities specified above, taken over ten different random seeds. See the appendix for experiments over (two-dimensional) image data and for further implementation details.
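
The dataset variants in Figure 4 are created by random position swaps between features. A minimal sketch of such a procedure follows, assuming each swap exchanges two distinct, uniformly chosen feature positions and that the same rearrangement is applied to every instance; the exact sampling scheme used in the experiments is not specified here, and the function name is hypothetical.

```python
import numpy as np

def random_swap_permutation(num_features, num_swaps, seed=0):
    """Compose `num_swaps` random transpositions into one feature permutation."""
    rng = np.random.default_rng(seed)
    perm = np.arange(num_features)
    for _ in range(num_swaps):
        i, j = rng.choice(num_features, size=2, replace=False)
        perm[[i, j]] = perm[[j, i]]  # exchange the two positions
    return perm

X = np.arange(12).reshape(3, 4)  # toy data: 3 instances, 4 features
perm = random_swap_permutation(4, num_swaps=2, seed=0)
X_swapped = X[:, perm]  # the same rearrangement is applied to all instances
```

More swaps move the arrangement further from the original ordering, which is the axis along which the figure reports entanglement and test accuracy.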
Figure 5: Surrogate entanglement is strongly correlated with the entanglement of the empirical data tensor. Presented are average entanglement and average surrogate entanglement under canonical partitions, admitted by the Speech Commands audio datasets [warden2018speech] considered in Figure 4. Remarkably, the Pearson correlation between the quantities is $0.974$. For further details see the caption of Figure 4 as well as the appendix.
...and 3 more figures

Theorems & Definitions (52)

Definition 1
Definition 2
Theorem 1
Proof sketch (full proof in the appendix)
Theorem 2
Proof sketch (full proof in the appendix)
Definition 3
Corollary 1
Proof sketch (full proof in the appendix)
Proposition 1
...and 42 more
