Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger
TL;DR
This study investigates diachronic syntactic change in English and German by analyzing parliamentary debates spanning roughly 160 years with five parsers and 15 metrics related to dependency distance minimization (DDM) and tree-graph properties. It finds that parser choice materially affects the observed trends and that English and German largely converge across most metrics, with German occasionally showing more variation. The work introduces a multi-parser framework and three evaluation domains (UD treebanks, target treebanks, adversarial treebanks) to assess parser reliability, and uses a majority vote across parsers to stabilize trend detection. The findings indicate that significant syntactic changes appear more frequently at the tails of the sentence-length distribution and that many non-distance metrics align across the two languages, offering a comprehensive, modern NLP perspective on historical syntax with practical implications for diachronic NLP tasks. Together, these results caution against relying on a single parser for studies of syntactic change and highlight subtle cross-language patterns in long-span syntactic evolution.
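The majority-vote idea and the cross-language comparison can be illustrated with a small sketch. This is not the authors' procedure; it rests on several assumptions. Each parser is assumed to yield one metric value per time period, a trend is labeled by the sign of an ordinary least-squares slope (the study may rely on significance tests instead), and the parser names, the metric values and the helper `opposite_share`, which measures how often English and German trend labels point in opposite directions, are all hypothetical.

```python
from collections import Counter


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def trend_label(xs, ys, eps=1e-6):
    """Label a series 'up', 'down' or 'flat' by its slope sign (assumed rule)."""
    s = ols_slope(xs, ys)
    if s > eps:
        return "up"
    if s < -eps:
        return "down"
    return "flat"


def majority_trend(series_by_parser, years):
    """Return the label chosen by a strict majority of parsers, else 'no majority'."""
    votes = Counter(trend_label(years, ys) for ys in series_by_parser.values())
    label, count = votes.most_common(1)[0]
    return label if count > len(series_by_parser) / 2 else "no majority"


def opposite_share(labels_en, labels_de):
    """Fraction of shared cases where one language goes 'up' and the other 'down'."""
    shared = labels_en.keys() & labels_de.keys()
    opposite = sum(
        1 for case in shared
        if {labels_en[case], labels_de[case]} == {"up", "down"}
    )
    return opposite / len(shared)


# Hypothetical metric values per period from five hypothetical parsers.
years = [1900, 1950, 2000]
series = {
    "parser_a": [2.5, 2.4, 2.3],
    "parser_b": [2.6, 2.5, 2.4],
    "parser_c": [2.4, 2.4, 2.3],
    "parser_d": [2.2, 2.3, 2.4],
    "parser_e": [2.5, 2.5, 2.5],
}
print(majority_trend(series, years))  # down
```

Requiring a strict majority, rather than simply taking the most frequent label, means that a split vote is reported as inconclusive instead of being silently resolved in favor of one parser group.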
Abstract
Many studies have shown that human languages tend to optimize for lower complexity and greater communicative efficiency. Syntactic dependency distance, the linear distance between a head word and its dependent, is often considered a key indicator of language processing difficulty and working memory load. This paper examines diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers: the widely used Stanford CoreNLP and four newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as tree height and degree variance. Although we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historical data, we find that results on syntactic language change are sensitive to the parsers involved, which cautions against using a single parser to evaluate syntactic language change, as done in previous work. We also show that syntactic language change over the period investigated is largely similar between English and German across the metrics explored: only 4% of the cases we examine yield opposite conclusions about upward and downward trends of syntactic metrics in German and English. Finally, changes in syntactic measures seem to be more frequent at the tails of the sentence length distribution. To the best of our knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology on recent corpora of English and German.
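To make three of the named measures concrete, here is a minimal sketch assuming commonly used definitions; the paper's exact formulas and normalizations may differ. A sentence is assumed to be encoded as a CoNLL-U-style head list (`heads[i]` is the 1-based head of token i+1, and 0 marks the root). Mean dependency distance averages |dependent position - head position| over non-root arcs, tree height is the longest root-to-token path in arcs, and degree variance is the population variance of the number of dependents per token. The function names and the example parse are illustrative, not taken from the study, and the code assumes a well-formed tree without cycles.

```python
from statistics import pvariance


def mean_dependency_distance(heads):
    """Average |dependent position - head position| over all non-root arcs.

    heads[i] is the 1-based head index of token i+1; 0 marks the root.
    """
    dists = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(dists) / len(dists) if dists else 0.0


def tree_height(heads):
    """Longest root-to-token path, counted in arcs (the root has depth 0)."""
    def depth(token):
        d = 0
        while heads[token - 1] != 0:  # climb toward the root one arc at a time
            token = heads[token - 1]
            d += 1
        return d
    return max(depth(t) for t in range(1, len(heads) + 1))


def degree_variance(heads):
    """Population variance of the number of dependents (out-degree) per token."""
    degrees = [0] * len(heads)
    for h in heads:
        if h != 0:
            degrees[h - 1] += 1
    return pvariance(degrees)


# "The old man saw a dog": The->man, old->man, man->saw, saw=root, a->dog, dog->saw
heads = [3, 3, 4, 0, 6, 4]
print(mean_dependency_distance(heads))  # 1.4
print(tree_height(heads))  # 2
print(degree_variance(heads))
```

In a diachronic study, such per-sentence values would then be aggregated per time period, and possibly per sentence-length range, before any trend analysis.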
