Spontaneous emergence of linguistic statistical laws in images via artificial neural networks
Ping-Rui Tsai, Chi-hsiang Wang, Yu-Cheng Liao, Hong-Yue Huang, Tzay-Ming Hong
TL;DR
This work investigates whether images processed by vision-focused neural networks spontaneously develop language-like statistical structures. By treating convolutional kernels as visual words and counting highly active pixels, the authors show that Zipf's law, Heaps' law, and Benford's law emerge in image-derived representations across multiple datasets and architectures, without explicit symbolic labeling. These laws prove robust to various perturbations. Susceptibility varies with perturbation type and network design, and Benford's law is notably resilient. The findings suggest that quasi-symbolic structures can arise from perceptual processing itself, offering fresh insight into symbol grounding, interpretability, and the perceptual roots of language-like organization in artificial systems.
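The kernel-as-word idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' code. It assumes that activation maps are already available as one 2-D array per kernel (here plain nested lists keyed by made-up kernel ids). It also assumes that a pixel counts as "highly active" when it exceeds a fixed threshold. Each such pixel emits its kernel id as a token, so a kernel's word frequency is simply its number of active pixels.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input type: kernel id -> 2-D activation map for one image.
ActivationMaps = Dict[str, List[List[float]]]


def visual_tokens(maps: ActivationMaps, threshold: float) -> List[str]:
    """Emit the kernel id once for every pixel whose activation exceeds `threshold`.

    Assumption: a fixed global threshold defines a 'highly active' pixel;
    the paper's actual selection rule may differ.
    """
    tokens: List[str] = []
    for kernel_id in sorted(maps):  # fixed kernel order keeps the output deterministic
        for row in maps[kernel_id]:
            for value in row:
                if value > threshold:
                    tokens.append(kernel_id)
    return tokens


def word_frequencies(tokens: List[str]) -> Counter:
    """Treat each kernel as a word type; its frequency is its active-pixel count."""
    return Counter(tokens)


if __name__ == "__main__":
    maps = {
        "k0": [[0.9, 0.1], [0.8, 0.7]],
        "k1": [[0.2, 0.95], [0.0, 0.1]],
        "k2": [[0.0, 0.0], [0.0, 0.0]],
    }
    tokens = visual_tokens(maps, threshold=0.5)
    print(tokens)                    # ['k0', 'k0', 'k0', 'k1']
    print(word_frequencies(tokens))  # Counter({'k0': 3, 'k1': 1})
```

Kernels with no active pixels (like `k2`) never appear as tokens. Concatenating the token lists of many images yields a corpus-like sequence on which word-frequency statistics can be measured.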
Abstract
As a core element of culture, images transform perception into structured representations and evolve in ways similar to natural languages. Given that visual input accounts for 60% of human sensory experience, it is natural to ask whether images follow statistical regularities similar to those of linguistic systems. Guided by symbol-grounding theory, which posits that meaningful symbols originate in perception, we treat images as vision-centric artifacts and use pre-trained neural networks to model visual processing. By detecting kernel activations and extracting the corresponding pixels, we obtain text-like units. These image-derived representations obey statistical laws such as Zipf's, Heaps', and Benford's laws, just as linguistic data do. Notably, these regularities emerge spontaneously, without explicit symbols or hybrid architectures. Our results indicate that connectionist networks can automatically develop structured, quasi-symbolic units through perceptual processing alone. This suggests that text- and symbol-like properties can emerge naturally in neural networks and offers a new perspective on their interpretability.
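The three laws can be made concrete with standard diagnostics. The Python sketch below computes three of them. For Zipf's law, it fits a log-log rank-frequency slope, expected near -1. For Heaps' law, it traces the vocabulary-growth curve V(N), expected to follow V ~ K·N^β. For Benford's law, it compares the leading-digit distribution against P(d) = log10(1 + 1/d). The inputs (a token sequence and a list of positive integer counts, such as per-kernel pixel counts) and the fitting method (ordinary least squares) are assumptions for illustration, not the authors' analysis pipeline.

```python
import math
from typing import Dict, List, Sequence


def zipf_slope(freqs: Sequence[int]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's law predicts a slope near -1. OLS on the log-log plot is a simple
    choice here, not necessarily the fitting method used in the paper.
    """
    ranked = sorted((f for f in freqs if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two nonzero frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def heaps_curve(tokens: Sequence[str]) -> List[int]:
    """Vocabulary size V(N) after each of the first N tokens (Heaps: V ~ K * N**beta)."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def benford_observed(counts: Sequence[int]) -> Dict[int, float]:
    """Observed share of leading digits 1-9 among the positive integer counts."""
    digits = [first_digit(c) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}


def benford_expected() -> Dict[int, float]:
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    print(round(zipf_slope([60, 30, 20, 15, 12]), 6))  # -1.0 (exact 1/rank frequencies)
    print(heaps_curve(["k0", "k1", "k0", "k2"]))       # [1, 2, 2, 3]
    print(benford_observed([1, 12, 150, 2, 30])[1])    # 0.6
```

On real data, the Heaps exponent β would typically be estimated by a similar log-log fit of `heaps_curve` against N. The observed Benford distribution could then be compared with `benford_expected()` through a goodness-of-fit statistic. Both steps are left out here to keep the sketch short.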
