On Emergences of Non-Classical Statistical Characteristics in Classical Neural Networks

Hanyu Zhao; Yang Wu; Yuexian Hou

TL;DR

The results suggest that non-classical statistics can provide a novel perspective for understanding internal interactions and training dynamics of deep networks.

Abstract

Inspired by measurement incompatibility and Bell-family inequalities in quantum mechanics, we propose the Non-Classical Network (NCnet), a simple classical neural architecture that stably exhibits non-classical statistical behaviors under typical and interpretable experimental setups. We find that non-classicality, measured by the $S$ statistic of the CHSH inequality, arises from gradient competition among hidden-layer neurons shared across multiple tasks. Remarkably, even without physical links supporting explicit communication, one task head can implicitly sense the training tasks of other task heads via local loss oscillations, leading to non-local correlations in their training outcomes. Specifically, in the low-resource regime, the value of $S$ increases gradually with increasing resources and approaches its classical upper-bound of 2, which implies that underfitting is alleviated as resources increase. As the model nears the critical scale required for adequate performance, $S$ may temporarily exceed 2. As resources continue to grow, $S$ then decays asymptotically toward 2 and fluctuates around it. Empirically, when model capacity is insufficient, $S$ is positively correlated with generalization performance, and the regime where $S$ first approaches $2$ often corresponds to good generalization. Overall, our results suggest that non-classical statistics can provide a novel perspective for understanding internal interactions and training dynamics of deep networks.

Paper Structure (17 sections, 1 theorem, 4 equations, 5 figures, 4 tables)

This paper contains 17 sections, 1 theorem, 4 equations, 5 figures, 4 tables.

Introduction
Preliminaries
Local Realism and Local Hidden Variable Models
The Classical Upper-Bound of the CHSH Statistic
Related Work
Methodology
Framework of NCnet and Task Definition
CHSH Inequality Violation Based on NCnet
Causes of Non-Classical Features
Real-World Experiments
Experiments Setup
Experiments Results
Experimental Summary and Discussion
Dynamic Analysis of the Training Process
Generalization Ability and Non-classical Correlation Analysis under Different LoRA Ranks
...and 2 more sections

Key Result

Theorem 1

For any local hidden-variable theory, the CHSH value $S$ satisfies $S = |C_{A_0B_0} + C_{A_0B_1} + C_{A_1B_0} - C_{A_1B_1}| \le 2$, where $C_{A_iB_j} = \mathbb{E}[A_i B_j]$ is an expectation statistic that indicates the association between the measurement outcomes of Alice and Bob. If $A_i$ and $B_j$ are centered and normalized, $C_{A_iB_j}$ is simply Pearson's correlation coefficient.
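The classical bound in Theorem 1 can be checked by brute force. The sketch below is illustrative only and is not the authors' code: it assumes the standard CHSH form $S = |C_{00} + C_{01} + C_{10} - C_{11}|$ with the minus sign on the last term, outcomes in $\{-1, +1\}$, and the hypothetical helper names `pearson`, `chsh`, and `max_deterministic_chsh`. It enumerates every deterministic local assignment of outcomes, which is sufficient because any local hidden-variable model is a probabilistic mixture of such assignments.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def chsh(c00, c01, c10, c11):
    """CHSH statistic; placing the minus sign on c11 is an assumed convention."""
    return abs(c00 + c01 + c10 - c11)


def max_deterministic_chsh():
    """Largest S over all 16 deterministic local outcome assignments."""
    best = 0
    for a0, a1, b0, b1 in product((-1, 1), repeat=4):
        # For fixed outcomes, the correlation E[A_i B_j] is just the product.
        best = max(best, chsh(a0 * b0, a0 * b1, a1 * b0, a1 * b1))
    return best


if __name__ == "__main__":
    print(max_deterministic_chsh())
```

Since $S$ is the absolute value of a linear function of the four correlations, and therefore convex, mixing deterministic assignments cannot push it above the deterministic maximum. For empirical data, `pearson` can supply each $C_{A_iB_j}$ from paired outcome samples before calling `chsh`.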

Figures (5)

Figure 1: The framework of the proposed NCnet. (a) XORnet illustrates the basic network structure for modeling the XOR function with ReLU activations. (b) NCnet is constructed by integrating two XORnets. The red node denotes a shared neuron where gradient competition is prone to occur.
Figure 2: Overall results of NCnet with different hidden-layer sizes. (a) Scatter distribution of $S$ obtained from 50 independent runs of NCnet for hidden-layer sizes $n = 2, 3, 4$. The red dashed line indicates the classical upper-bound of the CHSH statistic, and the blue dot-dashed line marks the Tsirelson bound. (b) Mean correlation values $C(A_i, B_j)$ and the corresponding average statistic $S$ for each value of $n$.
Figure 3: The trend of the CHSH statistic $S$ as a function of the rank $r$. The plot includes two sets of experimental results: the blue curve (Multilingual Training) and the orange curve (Mixed Reasoning Tasks). The red dashed line represents the classical upper-bound $S = 2$.
Figure 4: The CHSH statistic $S$ convergence across different ranks under Mixed Reasoning Tasks. (a) Higher ranks $r$ lead to faster convergence of $S$. (b) The bar chart reports $\mu_{\nabla S}$, the arithmetic mean of the instantaneous slopes of $S$ over epochs 0–80; larger bars indicate a faster convergence rate.
Figure 5: Generalization performance and the CHSH statistic $S$ across ranks $r$ in Multilingual Training. Bars show the task-pair mean accuracy $\overline{\mathrm{Acc}}(A_i,B_j)=\frac{\mathrm{Acc}(A_i)+\mathrm{Acc}(B_j)}{2}$. The purple curve shows the combination average $\mathrm{Acc}_{\mathrm{comb\_avg}}$ (the mean of the four bars) per rank. The red curve shows the value of $S$, reflecting the non-classical coupling strength of the learned representations.
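The summary quantities named in the captions of Figures 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the "instantaneous slope" of $S$ is approximated here by a finite difference between consecutive epochs with unit spacing, and the function names `mean_slope`, `pair_mean_accuracy`, and `combination_average` are hypothetical.

```python
def mean_slope(s_values):
    """Mean of per-epoch finite differences of S (unit epoch spacing assumed)."""
    diffs = [b - a for a, b in zip(s_values, s_values[1:])]
    return sum(diffs) / len(diffs)


def pair_mean_accuracy(acc_a, acc_b):
    """Task-pair mean accuracy: (Acc(A_i) + Acc(B_j)) / 2."""
    return (acc_a + acc_b) / 2


def combination_average(accs_a, accs_b):
    """Mean of the task-pair accuracies over all (A_i, B_j) combinations."""
    pairs = [pair_mean_accuracy(a, b) for a in accs_a for b in accs_b]
    return sum(pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up values, used only to exercise the functions.
    print(mean_slope([0.0, 1.0, 1.5, 2.0]))
    print(combination_average([0.8, 0.6], [0.7, 0.5]))
```

Under the finite-difference assumption the mean slope telescopes to the last value minus the first, divided by the number of steps, so it measures the net change in $S$ over the window rather than its oscillations.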

Theorems & Definitions (1)

Theorem 1: Classical Upper-Bound of the CHSH Statistic
