Quantifying patterns of punctuation in modern Chinese prose
Michał Dolina, Jakub Dec, Stanisław Drożdż, Jarosław Kwapień, Jin Liu, Tomasz Stanisz
TL;DR
The paper investigates punctuation and word usage patterns in modern Chinese prose using three tools: Zipf's law for rank-frequency distributions, a discrete Weibull model for inter-punctuation distances, and Multifractal Detrended Fluctuation Analysis (MFDFA) to quantify sentence-length variability. By analyzing three contemporary Chinese novels and their English translations, it shows Zipf-like rank-frequency behavior with exponent $\gamma \approx 1$ when counting $n$-grams and demonstrates that treating punctuation marks as tokens improves the Zipf fits. Inter-punctuation distances follow a discrete Weibull law, although long distances are rarer in the Chinese originals than in the English translations; sentence lengths follow the same law less closely. Multifractal analysis reveals strong multifractality in sentence lengths for "Soul Mountain" (and "The Drunkard") and more monofractal behavior for "The Sun Shines over the Sanggan River", with translations showing broadly similar fractal traits in some cases. Overall, the findings point to universal punctuation and word-distribution patterns across languages and highlight how narrative form influences fractal structure, while calling for broader corpora to validate cross-language generalizations.
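As a rough illustration of the rank-frequency analysis mentioned above, the following sketch estimates a Zipf exponent $\gamma$ from a list of tokens by ordinary least squares on the log-log rank-frequency plot. It assumes the text has already been split into tokens (for Chinese this requires a word-segmentation step that is not shown), and that punctuation marks are simply kept in the token list when they are to be counted. The function name `zipf_exponent` is hypothetical, and a log-log regression is a simple stand-in, not necessarily the fitting procedure used in the paper.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate gamma in f(r) ~ r**(-gamma) by least squares in log-log space.

    `tokens` is any list of hashable items (words, and optionally
    punctuation marks treated as words). Needs at least two distinct tokens.
    """
    # Frequencies sorted in descending order define ranks 1, 2, 3, ...
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying distribution; gamma is its magnitude.
    return -sxy / sxx


# Synthetic check: 50 token types whose frequencies fall off roughly as 1/rank.
synthetic = [f"w{r}" for r in range(1, 51) for _ in range(round(1000 / r))]
gamma = zipf_exponent(synthetic)
```

Running the same function once on a token list without punctuation and once with punctuation included gives a crude way to compare the two rank-frequency curves, keeping in mind that an unweighted log-log fit is dominated by the many low-frequency ranks.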
Abstract
Recent research shows that punctuation patterns in texts exhibit universal features across languages. Analysis of Western classical literature reveals that the distribution of distances between consecutive punctuation marks follows a discrete Weibull distribution, which is typically used in survival analysis. By extending this analysis to Chinese literature, represented here by three notable contemporary works, it is shown that Zipf's law applies to Chinese texts much as it does to Western texts, and that including punctuation likewise improves adherence to the law. Additionally, the distribution of distances between punctuation marks in Chinese texts follows the Weibull model, though large distances are less frequent than in the English translations. Distances between sentence-ending punctuation marks, which represent sentence lengths, diverge more from this pattern, reflecting greater flexibility in sentence length. This variability supports the formation of complex, multifractal sentence structures, particularly evident in Gao Xingjian's "Soul Mountain". These findings demonstrate that Chinese and Western texts share universal punctuation and word distribution patterns, underscoring the broad applicability of these patterns across languages.
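To make the Weibull claim concrete, here is a minimal sketch of how inter-punctuation distances could be extracted and fitted. It assumes the common discrete Weibull parametrization $P(k) = (1-p)^{(k-1)^\beta} - (1-p)^{k^\beta}$ for $k = 1, 2, \dots$, distances measured as the number of words between consecutive punctuation marks, and a simple fit based on the identity $\log(-\log S(k)) = \beta \log k + \log(-\log(1-p))$ for the survival function $S(k) = (1-p)^{k^\beta}$. The punctuation set `PUNCT` and all function names are illustrative choices, and the estimation method in the paper may differ.

```python
import math
from collections import Counter

# Illustrative punctuation set (Chinese and Western marks); adjust as needed.
PUNCT = set("，。！？；：、,.!?;:")


def punctuation_distances(tokens):
    """Word counts between consecutive punctuation marks (zero-length runs skipped)."""
    distances, run = [], 0
    for tok in tokens:
        if tok in PUNCT:
            if run > 0:
                distances.append(run)
            run = 0
        else:
            run += 1
    return distances


def dw_pmf(k, p, beta):
    """Discrete Weibull probability of distance k >= 1 (assumed parametrization)."""
    return (1 - p) ** ((k - 1) ** beta) - (1 - p) ** (k ** beta)


def fit_discrete_weibull(distances):
    """Estimate (p, beta) by a straight-line fit to log(-log S(k)) versus log k."""
    n = len(distances)
    counts = Counter(distances)
    xs, ys, cum = [], [], 0
    for k in sorted(counts):
        cum += counts[k]
        if cum >= n:  # empirical survival is zero at the largest distance
            break
        xs.append(math.log(k))
        ys.append(math.log(-math.log(1 - cum / n)))

    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    p = 1 - math.exp(-math.exp(intercept))
    return p, beta


# Usage on a tiny pre-segmented sentence: two runs of three words each.
example = ["我", "喜欢", "读书", "，", "他", "也", "是", "。"]
example_distances = punctuation_distances(example)  # → [3, 3]
```

In this parametrization $\beta = 1$ reduces to the geometric distribution, where the chance of a punctuation mark after each word is constant, and $\beta > 1$ means that chance grows with the number of words since the last mark. Restricting `PUNCT` to sentence-ending marks turns the same code into a sentence-length analysis.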
