Criticality in Formal Languages and Statistical Physics
Henry W. Lin, Max Tegmark
TL;DR
The paper shows that how the mutual information between two symbols decays with their separation depends on the grammar that generates the language: in any probabilistic regular grammar it decays exponentially, while a probabilistic context-free grammar (PCFG) with hierarchical depth can produce power-law decay, the signature of critical, long-range correlations. It introduces the rational mutual information, a practical upper bound on the ordinary mutual information, and derives that probabilistic regular grammars cannot exhibit criticality whereas PCFGs can (Theorem 3). By linking Bayesian networks, grammars in Chomsky normal form (CNF), and deep hierarchical models, the work connects these results to physics (the absence of phase transitions in one dimension) and to neural networks. In a deep hierarchy, two distant symbols are joined by a path whose length grows only logarithmically with their separation, so correlations that decay exponentially along that path decay only as a power law in the separation. The paper also proposes a practical diagnostic, measuring mutual information as a function of symbol separation, to evaluate and improve machine learning models, particularly recurrent architectures such as LSTMs.
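The separation diagnostic can be tried on toy data. The Python below is a minimal sketch, not the paper's code, and everything in it is an illustrative assumption: a two-symbol alphabet in which each generated symbol copies its parent and flips with a hypothetical probability `EPS = 0.1`; one sequence produced by a Markov chain (the behavior of a probabilistic regular grammar); one produced by recursively rewriting every symbol into two children (a simple hierarchical, context-free-style process); and the plug-in estimator for the mutual information between symbols `d` apart. The names `child`, `markov_sequence`, `tree_sequence`, and `mutual_information` are hypothetical.

```python
import math
import random
from collections import Counter

EPS = 0.1  # hypothetical flip probability of the toy 2x2 transition matrix


def child(symbol, rng):
    """Copy the parent symbol (0 or 1), flipping it with probability EPS."""
    return 1 - symbol if rng.random() < EPS else symbol


def markov_sequence(n, rng):
    """Regular-grammar-like source: each symbol depends only on its predecessor."""
    seq = [rng.randrange(2)]
    while len(seq) < n:
        seq.append(child(seq[-1], rng))
    return seq


def tree_sequence(depth, rng):
    """Hierarchical source: each symbol is rewritten into two children, depth times."""
    level = [rng.randrange(2)]
    for _ in range(depth):
        # each parent draws two independent children, kept adjacent in order
        level = [child(s, rng) for s in level for _ in range(2)]
    return level  # 2**depth leaves, read left to right


def mutual_information(seq, d):
    """Plug-in estimate, in bits, of I(X_t; X_{t+d}) pooled over all positions t."""
    pairs = Counter(zip(seq, seq[d:]))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (left[a] * right[b])
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in pairs.items())


rng = random.Random(0)
tree = tree_sequence(16, rng)            # 65,536 symbols
chain = markov_sequence(len(tree), rng)  # same length, for a fair comparison
for d in (1, 2, 4, 8, 16, 32, 64):
    print(f"d={d:3d}  markov={mutual_information(chain, d):.2e}  "
          f"tree={mutual_information(tree, d):.2e}")
```

In the Markov column the estimate collapses toward the estimator's noise floor within a few dozen symbols, while in the tree column it shrinks by a roughly constant factor each time `d` doubles, which is how a power law looks on this grid. When the symbols are independent, the plug-in estimator is biased upward by about (K−1)²/(2N ln 2) bits for K symbols and N pairs, so values near that level should be read as zero.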
Abstract
We show that the mutual information between two symbols, as a function of the number of symbols between the two, decays exponentially in any probabilistic regular grammar, but can decay like a power law for a context-free grammar. This result about formal languages is closely related to a well-known result in classical statistical mechanics that there are no phase transitions in dimensions fewer than two. It is also related to the emergence of power-law correlations in turbulence and cosmological inflation through recursive generative processes. We elucidate these physics connections and comment on potential applications of our results to machine learning tasks like training artificial recurrent neural networks. Along the way, we introduce a useful quantity which we dub the rational mutual information and discuss generalizations of our claims involving more complicated Bayesian networks.
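To make the rational mutual information concrete, the sketch below assumes it is defined as I_R(X,Y) = Σ_{a,b} P(a,b)²/(P(a)P(b)) − 1, which is the χ² divergence of the joint distribution from the product of its marginals. Writing r = P(a,b)/(P(a)P(b)), the ordinary mutual information is the average of ln r under P(a,b), while 1 + I_R is the average of r itself, so Jensen's inequality gives I ≤ ln(1 + I_R) ≤ I_R in nats. The Python below checks this chain of inequalities (converted to bits) on the exact joint distribution of two symbols d steps apart in a toy two-state Markov chain; the matrix, the flip probability `EPS`, and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def joint_at_separation(M, pi, d):
    """P(X_t=a, X_{t+d}=b) = pi[a] * (M^d)[a][b] for a stationary Markov chain."""
    n = len(M)
    Md = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(d):
        Md = [[sum(Md[i][k] * M[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    return [[pi[a] * Md[a][b] for b in range(n)] for a in range(n)]


def _terms(P):
    """Yield (p(a,b), p(a) * p(b)) for every pair with nonzero joint probability."""
    pa = [sum(row) for row in P]
    pb = [sum(col) for col in zip(*P)]
    for a, row in enumerate(P):
        for b, p in enumerate(row):
            if p > 0:
                yield p, pa[a] * pb[b]


def mutual_info_bits(P):
    """Ordinary mutual information, in bits."""
    return sum(p * math.log2(p / q) for p, q in _terms(P))


def rational_mutual_info(P):
    """Assumed definition: I_R = sum_ab P(a,b)^2 / (P(a) P(b)) - 1."""
    return sum(p * p / q for p, q in _terms(P)) - 1


EPS = 0.1                             # hypothetical flip probability
M = [[1 - EPS, EPS], [EPS, 1 - EPS]]  # row a holds P(next symbol | current = a)
pi = [0.5, 0.5]                       # stationary distribution of this symmetric M
for d in (1, 2, 4, 8, 16):
    P = joint_at_separation(M, pi, d)
    i_bits, i_r = mutual_info_bits(P), rational_mutual_info(P)
    # Jensen's inequality: I <= log2(1 + I_R) <= I_R / ln 2 (all in bits)
    assert i_bits <= math.log2(1 + i_r) + 1e-12
    assert math.log2(1 + i_r) <= i_r / math.log(2) + 1e-12
    print(f"d={d:2d}  I={i_bits:.3e}  log2(1+I_R)={math.log2(1 + i_r):.3e}  I_R={i_r:.3e}")
```

For this symmetric chain the sum collapses to a closed form, I_R(d) = (1 − 2ε)^{2d}, which is 0.64^d for ε = 0.1, so here the rational mutual information decays exactly exponentially with separation, consistent with the claim for regular grammars.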
