BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop
Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Hu, Jaap Jumelet, Tal Linzen, Jing Liu, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Wilcox, Adina Williams
TL;DR
BabyLM 2025 expands the data-efficient language modeling agenda into a workshop format while retaining a competition with four tracks, including a new Interaction track that uses teacher feedback and external models under strict exposure limits. The framework emphasizes data-efficient pretraining, cognitively plausible evaluation, and democratized participation through exchangeable compute; it introduces intermediate checkpoints and a psychometric evaluation suite to compare model behavior with that of human language learners. By providing a public dataset, a modular evaluation pipeline, baselines, and extensive FAQs, the paper advances interdisciplinary collaboration at the interface of cognitive science and language modeling and aims to foster reproducible, interpretable progress in data-efficient learning. Its overall impact lies in enabling principled, small-data language learning research with social and multimodal extensions, while balancing accessibility with rigorous assessment.
Abstract
BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call both for workshop papers and for researchers to join the 3rd BabyLM competition. As in previous years, we call for participants in the data-efficient pretraining challenge in the general track. This year, we also offer a new track: INTERACTION. This new track encourages interactive behavior, learning from a teacher, and adapting the teaching material to the student. We also call for papers outside the competition in any relevant area, including training efficiency, cognitively plausible research, weak model evaluation, and more.
