Competition between Two Kinds of Correlations in Literary Texts
S. S. Melnyk, O. V. Usatenko, V. A. Yampol'skii, V. A. Golick
TL;DR
The paper addresses how to quantify and model long-range correlations in coarse-grained literary texts using additive Markov chains with memory functions. It develops a framework linking memory functions to observed variance and correlation, and demonstrates that texts exhibit antipersistent short-range and power-law persistent long-range correlations, which together shape text statistics. Through analysis of the Bible and other works, it shows a robust, two-regime memory structure and reveals self-similarity under decimation, highlighting grammatical versus semantic contributions. The approach provides a compact, transferable descriptor (the memory function) for symbolic sequences and suggests broader applications to other complex correlated systems.
Abstract
The theory of additive Markov chains with long-range memory is used to describe the correlation properties of coarse-grained literary texts. The correlations in texts are shown to have a complex structure: they are antipersistent at small distances, L < 300, and persistent at L > 300. For several literary texts, the memory functions are obtained and shown to follow a power law at long distances. This property is shown to be a cause of the self-similarity of texts with respect to the decimation procedure.
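
As a rough illustration of the quantities named in the abstract, the following Python sketch coarse-grains a text into a binary sequence, measures the variance of the number of ones in windows of length L, and applies a simple decimation. It makes several assumptions that the abstract does not state: the letter-to-bit mapping (letters up to "m" become 0, later letters become 1), the use of sliding windows, and a regular decimation that keeps every `step`-th symbol. The function names are hypothetical and are not taken from the paper.

```python
def coarse_grain(text, split="m"):
    """Map letters a..split to 0 and later letters to 1; drop non-letters.

    The split point is an illustrative assumption, not the paper's rule.
    """
    return [0 if ch <= split else 1 for ch in text.lower() if "a" <= ch <= "z"]


def window_variance(seq, L):
    """Variance D(L) of the number of ones in sliding windows of length L."""
    n = len(seq) - L + 1
    if L <= 0 or n <= 0:
        raise ValueError("L must satisfy 1 <= L <= len(seq)")
    k = sum(seq[:L])          # ones in the first window
    counts = [k]
    for i in range(L, len(seq)):
        k += seq[i] - seq[i - L]   # slide the window by one symbol
        counts.append(k)
    mean = sum(counts) / n
    return sum((c - mean) ** 2 for c in counts) / n


def decimate(seq, step):
    """Regular decimation (assumed form): keep every step-th symbol."""
    return seq[::step]
```

For an uncorrelated binary sequence with a fraction p of ones, D(L) equals L p (1 - p), so comparing `window_variance(seq, L)` with that baseline gives one way to see whether correlations at scale L are persistent (larger variance) or antipersistent (smaller variance). Repeating the comparison on `decimate(seq, step)` is a simple way to probe the self-similarity mentioned above.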
