A systematic investigation of learnability from single child linguistic input

Yulu Qin; Wentao Wang; Brenden M. Lake

TL;DR

The paper investigates what linguistic structure a model can learn when it is trained only on the limited linguistic input a single child encounters. It extends prior work by evaluating six architectures across five datasets (three single-child corpora plus two baselines) and by using several evaluation methods: the Zorro grammaticality suite, embedding visualizations, and cloze tests. Across configurations, syntactic and semantic structure emerges robustly, and models are sensitive to some linguistic phenomena but not others (subject-verb agreement, for example, remains difficult), consistent with prior single-child studies. The results suggest that a small amount of child-directed input can support meaningful linguistic representations across architectures, with implications for cognitive modeling and the realism of language-learning simulations. The paper also notes limitations and points to future multi-modal research.

Abstract

Language models (LMs) have demonstrated remarkable proficiency in generating linguistically coherent text, sparking discussions about their relevance to understanding human language learnability. However, a significant gap exists between the training data for these models and the linguistic input a child receives. LMs are typically trained on data that is orders of magnitude larger and fundamentally different from child-directed speech (Warstadt and Bowman, 2022; Warstadt et al., 2023; Frank, 2023a). Addressing this discrepancy, our research focuses on training LMs on subsets of a single child's linguistic input. Previously, Wang, Vong, Kim, and Lake (2023) found that LMs trained in this setting can form syntactic and semantic word clusters and develop sensitivity to certain linguistic phenomena, but they only considered LSTMs and simpler neural networks trained from just one single-child dataset. Here, to examine the robustness of learnability from single-child input, we systematically train six different model architectures on five datasets (3 single-child and 2 baselines). We find that the models trained on single-child datasets showed consistent results that matched with previous work, underscoring the robustness of forming meaningful syntactic and semantic representations from a subset of a child's linguistic input.

Paper Structure (12 sections, 3 figures, 5 tables)

Introduction
Methods
Datasets
Data Preprocessing
Model Architectures and Training
Tokenizer
Results
Linguistic Acceptability Tests
Visualizations for Syntactic and Semantic Categories
Cloze Tests
General Discussion
Acknowledgments

Figures (3)

Figure 1: Zorro test accuracies across different settings. We tested 6 model architectures on 23 linguistic tests in Zorro. Each model architecture, trained with 3 seeds, yielded 18 accuracy data points per dataset. Our scatter plots show results for 8 selected tests, with the test name and an example sentence pair (unacceptable/acceptable) highlighted above each. For example, models evaluate which is more acceptable in the "case--subjective pronoun" test: "the baby gave she my book." or "she gave the baby my book." We found models trained on single-child datasets excel in specific tests but struggle in others, like subject-verb agreement. Four high-performing tests are shown in the first row, and four lower-performing tests, particularly for subject-verb agreement, are in the second row. Chance is the dotted line. Runs with 3 seeds show variability, similar to previous findings (Sellam et al., 2022; Yedetore et al., 2023).
Figure 2: Clustering different models' word embeddings for syntactic categories. We ran t-SNE to visualize embeddings of all words in the vocabulary that are categorized into one of the four syntactic categories: noun, verb, adjective, and adverb. t-SNE uses $1 - \cos(u, v)$ as the distance metric. We show seven visualizations here from various training datasets and model architectures labeled below the plots. Nouns and verbs form two large salient clusters, while adjectives and adverbs are mostly clustered together.
Figure 3: Clustering word embeddings for semantic categories. Here we visualize word embeddings of three architectures trained on the Sarah dataset: (a, b) GPT-2 (2-layer), (c) BabyBERTa (2-layer), (d) LSTM (1-layer). Again, the t-SNE and dendrogram plots use the same cosine distance as in Figure 2. We present the 6 most frequent words from 8 different categories. There are distinct clusters corresponding to semantic categories, including body parts, clothing, animals, and places.
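
The two quantities the captions rely on can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `cosine_distance` implements the $1 - \cos(u, v)$ metric named in the Figure 2 caption, and `minimal_pair_accuracy` implements the forced choice described in Figure 1, where a model is counted correct when it prefers the acceptable sentence of a pair. The function names and the `score` callable are hypothetical, and how each architecture actually assigns a score to a sentence is an assumption not specified here.

```python
import math
from typing import Callable, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v) for two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def minimal_pair_accuracy(
    pairs: Sequence[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (unacceptable, acceptable) pairs ranked correctly.

    `score` is a hypothetical stand-in for a model's sentence score
    (higher means more acceptable); ties count as incorrect.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    correct = sum(1 for bad, good in pairs if score(good) > score(bad))
    return correct / len(pairs)


if __name__ == "__main__":
    # Toy scorer (assumption for illustration only): prefers sentences
    # that begin with the subject pronoun "she".
    def toy_score(sentence: str) -> float:
        return 1.0 if sentence.startswith("she ") else 0.0

    example_pairs = [("the baby gave she my book.", "she gave the baby my book.")]
    print(minimal_pair_accuracy(example_pairs, toy_score))
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

With two candidates per pair, a scorer that guesses at random is expected to reach an accuracy of 0.5, which corresponds to the dotted chance line in Figure 1.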
