What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement
Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen
TL;DR
This work tackles the fundamental question of what makes data distributions suitable for locally connected neural networks (LC-NNs) by introducing a physics-inspired framework that treats data as tensors and analyzes learnability through quantum entanglement (QE) under canonical feature partitions. It proves a necessary and sufficient condition: an LC-NN can achieve low population loss if and only if the data tensor exhibits low entanglement across all canonical partitions, with the admissible entanglement tied to the network width $R$ via $QE\le\ln(R)$ up to small terms. The authors translate theory into practice by proposing a data-enhancement protocol that rearranges features to reduce entanglement. The protocol uses a surrogate measure $SE$ based on multivariate Pearson correlations and reduces the search for a good arrangement to minimum balanced cuts, which are solvable by graph-partitioning algorithms. This approach yields substantial improvements across CNNs, S4, and local-attention models on audio, tabular, and image data. Overall, the work offers a principled, physics-grounded perspective on data conditioning and architecture-data co-design, with practical implications for improving LC-NN performance on natural data modalities.
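To make the entanglement criterion concrete, here is a minimal sketch (not the authors' code) of how the entanglement entropy of a tensor across a feature partition is commonly computed: arrange the tensor as a matrix whose rows are indexed by one side of the partition, take its singular values, normalize their squares into a probability distribution, and compute that distribution's entropy. The function name `entanglement_entropy`, the toy tensors, and the contiguous half split used as a stand-in for a canonical partition are all illustrative assumptions.

```python
import numpy as np

def entanglement_entropy(tensor, subset):
    """Entropy (natural log) of the normalized squared singular values of
    `tensor` arranged as a matrix across the partition (subset, complement).
    `subset` is a non-empty list of axis indices (hypothetical interface)."""
    rest = [a for a in range(tensor.ndim) if a not in subset]
    rows = int(np.prod([tensor.shape[a] for a in subset]))
    # Rows are indexed by the axes in `subset`, columns by the remaining axes.
    mat = np.transpose(tensor, list(subset) + rest).reshape(rows, -1)
    s = np.linalg.svd(mat, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 1e-12]  # drop numerically zero entries before taking logs
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)

# Toy tensor 1: an outer product of four vectors, which has rank 1
# across every partition and therefore no entanglement.
a, b, c, d = (rng.normal(size=2) for _ in range(4))
product = np.einsum("i,j,k,l->ijkl", a, b, c, d)

# Toy tensor 2: two equally weighted orthogonal product terms, which has
# rank 2 across the split and reaches the maximum entropy ln(2) for that rank.
ghz = np.zeros((2, 2, 2, 2))
ghz[0, 0, 0, 0] = 1.0
ghz[1, 1, 1, 1] = 1.0

half = [0, 1]  # first two features versus last two features
print(entanglement_entropy(product, half))  # close to 0
print(entanglement_entropy(ghz, half))      # close to ln(2)
```

A matrix of rank at most $R$ has at most $R$ nonzero singular values, so this entropy cannot exceed $\ln(R)$, which is the form the width-dependent bound takes.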
Abstract
The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics. Our main theoretical result states that a certain locally connected neural network is capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement under certain canonical partitions of features. As a practical application of this result, we derive a preprocessing method for enhancing the suitability of a data distribution to locally connected neural networks. Experiments with widespread models over various datasets demonstrate our findings. We hope that our use of quantum entanglement will encourage further adoption of tools from physics for formally reasoning about the relation between deep learning and real-world data.
