From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition

Qiyuan Yang, Pengda Wang, Luke D. Plonsky, Frederick L. Oswald, Hanjie Chen

Abstract

We examine the language capabilities of language models (LMs) from the critical perspective of human language acquisition. Building on classical language development theories, we propose a three-stage framework to assess the abilities of LMs, ranging from preliminary word understanding to complex grammar and complex logical reasoning. Using this framework, we evaluate the generative capacities of LMs using methods from linguistic research. Results indicate that although recent LMs outperform earlier models in overall performance, their developmental trajectory does not strictly follow the path of human language acquisition. Notably, in generation tasks, LMs are more similar to human performance in areas where information is easier to extract from the corpus, such as average word length, clauses, and auxiliary verbs. Newer LMs did not exhibit significant progress in terms of specific dimensions, such as clauses and auxiliary verbs, where the variation across corpora is relatively limited. Register theory offers a plausible explanation for these observations, suggesting that the linguistic features of the training data have a substantial impact on the models' abilities.

Paper Structure

This paper contains 37 sections, 2 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Three-Stage Anatomy of Language Acquisition.
  • Figure 2: Performance of LMs across three stages. The upper-right legend corresponds to models tested on all tasks except ReClor; the lower-right legend corresponds to models tested on ReClor. For each task, models are ordered by release time, with ties broken by parameter size. CoLA results use a different metric; see the CoLA figure (Figure 4) in the appendix.
  • Figure 3: Generation Abilities of six models along five selected dimensions.
  • Figure 4: CoLA performance in Stage II, measured by the Matthews Correlation Coefficient (MCC). Results are obtained by training each model for at most 20 epochs.
  • Figure 5: Grammar-diag performance in Stage II. Models are ordered by release time and are tested after fine-tuning on the training sets of bc-if-why and grammar-comp.
  • ...and 5 more figures
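
Figure 4 reports CoLA results as a Matthews Correlation Coefficient rather than accuracy. As a reminder of what that metric measures, the following is a minimal sketch of the standard binary MCC computed from confusion-matrix counts. It is not the authors' evaluation code; the function name `mcc` and the convention of returning 0.0 when the denominator vanishes are assumptions made for illustration.

```python
import math


def mcc(y_true, y_pred):
    """Binary Matthews Correlation Coefficient for labels in {0, 1}.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # acceptable sentence predicted acceptable
        elif t == 0 and p == 0:
            tn += 1  # unacceptable sentence predicted unacceptable
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: a degenerate confusion matrix scores 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    gold = [1, 1, 0, 0]
    pred = [1, 0, 0, 0]
    print(round(mcc(gold, pred), 4))
```

MCC ranges from -1 to 1 and stays at 0 for a model that always predicts the same class, which is why it is often preferred over accuracy on class-imbalanced acceptability data such as CoLA.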