Evaluating the cognitive reality of Spanish irregular morphomic patterns: Humans vs. Transformers

Akhilesh Kakolu Ramarao, Kevin Tang, Dinah Baer-Henney

TL;DR

This work tests whether transformer models generalize the Spanish L-shaped morphome in a cognitively plausible way, mirroring the nonce-word task of Nevins et al. (2015) under three frequency distributions. Transformers reach higher stem accuracy than humans, but their response preferences diverge: models consistently favor irregular L-shaped forms and are sensitive to the training distribution, whereas humans maintain a bias toward the "natural" inflection. Phonological similarity effects appear in models under certain distributions, suggesting some surface-level analogical generalization, but humans do not exhibit this pattern. Overall, the study finds limited cognitive realism in current transformers for this morphomic generalization, highlights the strong influence of input distribution and phonology on model behavior, and points future work toward more ecologically grounded cognitive modeling.

Abstract

Do transformer models generalize morphological patterns as humans do? We investigate this by directly comparing transformers to human behavioral data on Spanish irregular morphomic patterns from Nevins et al. (2015). We adopt the same analytical framework as the original human study. Under controlled input conditions, we evaluate whether transformer models can replicate human-like sensitivity to the morphome, a complex linguistic phenomenon. Our experiments focus on three frequency conditions: natural, low-frequency, and high-frequency distributions of verbs exhibiting irregular morphomic patterns. Transformer models achieve higher stem accuracy than human participants. However, response preferences diverge: humans consistently favor the "natural" inflection across all items, whereas models prefer the irregular forms, and their choices are modulated by the proportion of irregular verbs present during training. Moreover, models trained on the natural and low-frequency distributions, but not the high-frequency distribution, exhibit sensitivity to phonological similarity between test items and Spanish L-shaped verbs, mirroring a limited aspect of human phonological generalization.

Paper Structure

This paper contains 21 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: An example of a nonce verb and its corresponding forms.
  • Figure 2: Response preference by participants and models.
  • Figure 3: Response preference by item for models and participants. The line indicates the neutral preference.
  • Figure 4: Models' sequence accuracy for items tested in Nevins et al. (2015).
  • Figure 5: Participants' and models' stem accuracy for items tested in Nevins et al. (2015).
  • ...and 4 more figures