Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

Suchir Salhan; Richard Diehl Martinez; Zébulon Goriely; Paula Buttery

TL;DR

The paper investigates whether curriculum learning grounded in acquisition theory can improve cross-lingual Small-Scale Language Models (SSLMs) trained on age-ordered Child-Directed Speech (CDS), introducing the Growing, Inwards, and MMM curricula and the Mao-CHILDES corpus. It shows that fine-grained language-specific curricula, especially MMM with semantic tagging, can yield statistically significant gains on minimal-pair syntactic evaluations across English, Chinese, Japanese, and others, while universal maturational curricula have mixed or limited effects. The approach demonstrates data efficiency by achieving competitive performance with far fewer parameters and far less training data than large LLMs, highlighting the potential of cognitively motivated pre-training for multilingual, data-constrained settings. Practical implications include the value of language-specific curriculum design and of richer CDS resources for improving cross-lingual syntactic generalization in SSLMs.

Abstract

Curriculum Learning has been a popular strategy to improve the cognitive plausibility of Small-Scale Language Models (SSLMs) in the BabyLM Challenge. However, it has not led to considerable improvements over non-curriculum models. We assess whether linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies. To do so, we create age-ordered corpora of Child-Directed Speech for four typologically distant language families, which lets us implement SSLMs and acquisition-inspired curricula cross-lingually. We compare three objective curricula (Growing, Inwards and MMM) that precisely replicate the predictions of acquisition theories on a standard SSLM architecture. We find that fine-grained acquisition-inspired curricula can outperform non-curriculum baselines, and that the performance benefits of curriculum strategies in SSLMs can be derived by specifying fine-grained language-specific curricula that precisely replicate language acquisition theories.
