Identifying Quantum Mechanical Statistics in Italian Corpora
Diederik Aerts, Jonito Aerts Arguëlles, Lester Beltran, Massimiliano Sassoli de Bianchi, Sandro Sozzo
TL;DR
The paper investigates whether word frequencies in human language exhibit quantum statistical patterns, extending prior findings from English to Italian texts. It develops a theoretical framework that maps words to energy levels and compares Bose--Einstein and Maxwell--Boltzmann statistics on large Italian corpora, finding that Bose--Einstein statistics accurately models word distributions and reveals meaning-driven, entanglement-like correlations. The authors further show that word randomization acts like a temperature increase, reducing coherence and making classical statistics more applicable, which supports a decoherence-inspired interpretation of meaning in language. The results support a language-general, meaning-driven mechanism for quantum statistics in cognition, motivate a conceptuality interpretation of quantum mechanics, and point toward a quantum-thermodynamic treatment of information and language with potential cross-domain insights for physics.
Abstract
We present a theoretical and empirical investigation of the statistical behaviour of words in texts produced by human language. To this end, we analyse the word distribution of several Italian-language texts selected from a specific literary corpus. We first generalise a theoretical framework that we previously developed to identify 'quantum mechanical statistics' in large texts. Then, we show that, in all analysed texts, words distribute according to 'Bose--Einstein statistics' and deviate significantly from 'Maxwell--Boltzmann statistics'. Next, we introduce 'word randomization', under which the difference between the two statistical models is less pronounced than in the original texts. These results confirm the empirical patterns obtained in English-language texts and strongly indicate that identical words tend to 'clump together' as a consequence of their meaning, which can be explained as an effect of 'quantum entanglement' produced through a phenomenon of 'contextual updating'. Moreover, word randomization can be seen as the linguistic-conceptual equivalent of an increase in temperature, which destroys 'coherence' and makes classical statistics prevail over quantum statistics. Finally, we provide some insights into the origin of quantum statistics in physics.
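
To make the comparison between the two statistics concrete, the following is a minimal sketch rather than the authors' implementation. It assumes, beyond what the abstract states, that distinct words are ranked by frequency and that the word of rank i is assigned energy level E_i = i; that the Bose--Einstein model has the textbook form 1/(A e^{E/B} - 1) and the Maxwell--Boltzmann model the form 1/(C e^{E/D}); and that the parameters are fixed by requiring each model to reproduce the total number of words N and the total energy E. The tokenizer and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
import math
import re

def energy_levels(text):
    """Rank distinct words by frequency; rank i is used as energy level E_i = i (assumption)."""
    words = re.findall(r"\w+", text.lower())  # crude tokenizer, for illustration only
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return [(level, word, count) for level, (word, count) in enumerate(ranked)]

def totals(levels):
    """Total number of words N and total energy E = sum_i N_i * E_i."""
    return sum(n for _, _, n in levels), sum(i * n for i, _, n in levels)

def bose_einstein(E, A, B):
    """1 / (A e^{E/B} - 1), written with e^{-E/B} to avoid overflow; needs A > 1."""
    x = math.exp(-E / B)
    return x / (A - x)

def maxwell_boltzmann(E, C, D):
    """1 / (C e^{E/D})."""
    return math.exp(-E / D) / C

def fit_maxwell_boltzmann(n_levels, N, E):
    """Find (C, D) such that the model sums to N words and E total energy."""
    def mean_energy(D):
        w = [math.exp(-i / D) for i in range(n_levels)]
        return sum(i * wi for i, wi in enumerate(w)) / sum(w)
    lo, hi = 1e-3, 1e6
    for _ in range(100):  # bisection on a log scale; mean energy grows with D
        mid = math.sqrt(lo * hi)
        if mean_energy(mid) < E / N:
            lo = mid
        else:
            hi = mid
    D = math.sqrt(lo * hi)
    C = sum(math.exp(-i / D) for i in range(n_levels)) / N
    return C, D

def _solve_A(n_levels, N, B):
    """For fixed B, find A > 1 such that the Bose--Einstein model sums to N."""
    x = [math.exp(-i / B) for i in range(n_levels)]
    lo, hi = 1.0 + 1e-15, 1.0 + n_levels / N  # model total is <= N at the upper bound
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(xi / (mid - xi) for xi in x) > N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_bose_einstein(n_levels, N, E):
    """Find (A, B) such that the model sums to N words and E total energy."""
    def energy(B):
        A = _solve_A(n_levels, N, B)
        return sum(i * bose_einstein(i, A, B) for i in range(n_levels))
    lo, hi = 1e-3, 1e6
    for _ in range(100):  # outer bisection on B, inner solve for A
        mid = math.sqrt(lo * hi)
        if energy(mid) < E:
            lo = mid
        else:
            hi = mid
    B = math.sqrt(lo * hi)
    return _solve_A(n_levels, N, B), B

if __name__ == "__main__":
    toy = "la casa la strada la casa il mare la casa la"  # toy input, not corpus data
    levels = energy_levels(toy)
    N, E = totals(levels)
    A, B = fit_bose_einstein(len(levels), N, E)
    C, D = fit_maxwell_boltzmann(len(levels), N, E)
    for i, word, count in levels:
        print(word, count, bose_einstein(i, A, B), maxwell_boltzmann(i, C, D))
```

Both fits satisfy the same two constraints by construction, so any difference between them shows up only in how the words are spread across the energy levels. The nested bisection is chosen for transparency, not speed; on a full-size text a vectorised root finder would be a more practical choice.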
