
Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon

Amanda Doucette, Ryan Cotterell, Morgan Sonderegger, Timothy J. O'Donnell

TL;DR

This study tests whether a universal compensatory relationship holds between morphological irregularity (MI) and phonotactic complexity (PC). It measures both with information-theoretic models on 25 UniMorph languages while controlling for word length (WL) and frequency (FR). Within languages, MI and PC are positively related on average, but no consistent cross-language pattern emerges, which challenges the universality of compensation. The results also reproduce the Zipfian expectation for WL–FR, show a generally negative FR–PC relationship within languages, and show a predominantly positive MI–FR relationship at the lemma level; cross-language patterns are substantially nonlinear. The work highlights the need for causal modeling to separate direct effects from correlations driven by shared factors, and it calls for future empirical tests of explicit causal structures in the organization of the lexicon.

Abstract

It has been claimed that within a language, morphologically irregular words are more likely to be phonotactically simple and morphologically regular words are more likely to be phonotactically complex. This inverse correlation has been demonstrated in English for a small sample of words, but has yet to be shown for a larger sample of languages. Furthermore, frequency and word length are known to influence both phonotactic complexity and morphological irregularity, and they may be confounding factors in this relationship. Therefore, we examine the relationships between all pairs of these four variables both to assess the robustness of previous findings using improved methodology and as a step towards understanding the underlying causal relationship. Using information-theoretic measures of phonotactic complexity and morphological irregularity (Pimentel et al., 2020; Wu et al., 2019) on 25 languages from UniMorph, we find that there is evidence of a positive relationship between morphological irregularity and phonotactic complexity within languages on average, although the direction varies within individual languages. We also find weak evidence of a negative relationship between word length and morphological irregularity that had not been previously identified, and that some existing findings about the relationships between these four variables are not as robust as previously thought.
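
The confounding concern raised in the abstract can be illustrated with a small sketch. It is not the authors' analysis: the data are synthetic, the effect sizes are invented, and the helper `ols_coefficients` is a hypothetical name introduced here. The sketch assumes a single shared cause (word length) that influences both variables, with no direct link between them, and compares a regression with and without that control.

```python
import numpy as np


def ols_coefficients(X, y):
    """Least-squares coefficients for y ~ X, with an intercept prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 5000

# Synthetic, made-up data: word length drives both variables,
# and there is NO direct effect of pc on mi by construction.
word_length = rng.normal(size=n)
pc = -0.8 * word_length + rng.normal(scale=0.5, size=n)
mi = -0.6 * word_length + rng.normal(scale=0.5, size=n)

# Slope of mi on pc alone: picks up the shared dependence on word length.
marginal = ols_coefficients(pc[:, None], mi)[1]

# Slope of mi on pc with word length as a covariate: close to zero.
controlled = ols_coefficients(np.column_stack([pc, word_length]), mi)[1]

print(f"pc slope without control: {marginal:.3f}")
print(f"pc slope with word length controlled: {controlled:.3f}")
```

In this toy setup the uncontrolled slope is clearly non-zero while the controlled slope is near zero, which is why a pairwise correlation alone cannot establish a direct relationship between two lexical variables.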

Paper Structure

This paper contains 29 sections, 3 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Phonotactic complexity and morphological irregularity.
  • Figure 2: Phonotactic complexity and word length.
  • Figure 3: Morphological irregularity and frequency coefficients by language, grouped by lemma and by word, with 95% CIs.
  • Figure 4: Phonotactic complexity and frequency regression coefficients by language, with 95% CIs.
  • Figure 5: Morphological irregularity and word length.
  • ...and 2 more figures