Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

Suchir Salhan, Richard Diehl Martinez, Zébulon Goriely, Paula Buttery

TL;DR

The paper investigates whether curriculum learning grounded in acquisition theory can improve cross-lingual Small-Scale Language Models (SSLMs) trained on age-ordered Child-Directed Speech (CDS). It introduces the Growing, Inwards, and MMM curricula and the Mao-CHILDES corpus. Fine-grained, language-specific curricula, especially MMM with semantic tagging, can yield statistically significant gains on minimal-pair syntactic evaluations across English, Chinese, Japanese, and others, while universal maturational curricula have mixed or limited effects. The approach is data-efficient: it achieves competitive performance with far fewer parameters and far less training data than large LLMs, which highlights the potential of cognitively motivated pre-training in multilingual, data-constrained settings. Practical implications include the value of language-specific curriculum design and of richer CDS resources for improving cross-lingual syntactic generalization in SSLMs.

Abstract

Curriculum Learning has been a popular strategy to improve the cognitive plausibility of Small-Scale Language Models (SSLMs) in the BabyLM Challenge. However, it has not led to considerable improvements over non-curriculum models. We assess whether linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies, creating age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually. We compare the success of three objective curricula (Growing, Inwards and MMM) that precisely replicate the predictions of acquisition theories on a standard SSLM architecture. We find that fine-grained acquisition-inspired curricula can outperform non-curriculum baselines, and that the performance benefits of curriculum strategies in SSLMs can be derived by specifying fine-grained language-specific curricula that precisely replicate language acquisition theories.

Paper Structure

This paper contains 20 sections, 1 equation, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Acquisition-inspired Objective Curricula: We specify the Objective Curricula Growing, Inwards, MMM (UPOS), and MMM (Semantic) for three theories of acquisition (Section \ref{acquisition}). The Progression of Curriculum Units replicates the predicted developmental sequences by assigning curriculum units (defined in Table \ref{tab:units}) to different pre-training stages, expressed as a percentage of training steps.
  • Figure 2: A sample of Child-Directed Speech (CDS) from French Mao-CHILDES that learners receive from caregivers at different stages of acquisition. Stages of acquisition are standardly defined in terms of the mean length of utterances produced by learners.
  • Figure 3: Comparison of BLiMP Performance of English SSLMs with CLIMB curricula and Growing, Inwards, MMM (UPOS), MMM (SEM) (Section \ref{cognitive}). We report the baselines introduced by warstadt2023papers for T5-base and OPT-125m models. We include the improved BabyBERTa baseline implemented in martinez-etal-2023-climb, which beat the baseline used in the first BabyLM Shared Task. We report BLiMP performance of different CLIMB small-raw models (also used in the standard architecture of Mao-BabyBERTa used with the three objective curricula) for the best-performing dynamic curriculum learning strategies implemented in martinez-etal-2023-climb. These are CLIMB's Data Curriculum (Log Pacing with Source Difficulty), Vocabulary Curriculum (Log Pacing with Token ID Difficulty), two Objective Curricula strategies (MLM + All uses a multitask objective of masked language modelling and objective curricula specified by 10 tags throughout all training steps; MLM + NV uses three tags throughout training), and the best-performing Combination Model (Token ID Vocabulary Curricula, Random + model ppx Data Curricula, Multitask Objective Curricula).
  • Figure 4: Distribution of Silver Tags across all languages in the Mao-CHILDES corpus, annotated using a spaCy Multilingual UPOS Tagger.
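
The Figure 1 caption describes curriculum units assigned to pre-training stages, with each stage expressed as a percentage of training steps. The sketch below illustrates that scheduling idea only. It is not the authors' implementation: the `Stage` class, the `active_tags` function, the stage boundaries, and the tag sets in `EXAMPLE_SCHEDULE` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Stage:
    """One pre-training stage (hypothetical structure, not from the paper)."""
    end_fraction: float   # stage applies up to this fraction of total training steps
    tags: FrozenSet[str]  # curriculum unit: tags the tagging objective uses in this stage


# Illustrative three-stage schedule. Boundaries and tag sets are assumptions,
# not the curricula reported in the paper.
EXAMPLE_SCHEDULE: List[Stage] = [
    Stage(0.3, frozenset({"NOUN", "VERB"})),
    Stage(0.6, frozenset({"NOUN", "VERB", "ADJ", "ADV"})),
    Stage(1.0, frozenset({"NOUN", "VERB", "ADJ", "ADV", "DET", "ADP"})),
]


def active_tags(step: int, total_steps: int, schedule: List[Stage]) -> FrozenSet[str]:
    """Return the curriculum unit active at `step`, given stage boundaries
    expressed as fractions of `total_steps`."""
    if total_steps <= 0 or not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps] with total_steps > 0")
    progress = step / total_steps
    for stage in schedule:
        if progress <= stage.end_fraction:
            return stage.tags
    # Fall back to the final stage if the schedule does not reach 1.0.
    return schedule[-1].tags
```

In a training loop, a call such as `active_tags(step, total_steps, EXAMPLE_SCHEDULE)` would decide which tags the auxiliary objective predicts at the current step; how that objective is combined with masked language modelling is left out of this sketch.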