
Mini Minds: Exploring Bebeshka and Zlata Baby Models

Irina Proskurina, Guillaume Metzler, Julien Velcin

TL;DR

The paper investigates whether tiny language models trained on a developmentally plausible, restricted corpus can achieve practical language understanding. Using an architecture-search pipeline based on Tree-structured Parzen Estimation (TPE), the authors design two compact models, Bebeshka (a 4-layer encoder with 8 heads) and Zlata (a 6-layer decoder with 12 heads), that perform close to larger baselines on BabyLM Strict-Small tasks. Beyond standard NLP benchmarks, they also assess moral judgments and find competitive ethics-related performance for small LMs, possibly owing to the nature of the training data. The study highlights the viability of compact LMs for efficient inference and ethically aligned language understanding, offers design guidelines, and suggests directions for future work with even smaller or developmentally informed corpora.
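
A minimal sketch of how a Tree-structured Parzen Estimation search over architectures can work follows, in plain Python so it runs without extra dependencies. It is not the authors' pipeline: the search space (`SPACE`), the stand-in objective `toy_loss` (a placeholder for the masked language modelling loss of a trained model), and all hyperparameters (`gamma`, `n_startup`, `n_candidates`) are hypothetical. TPE splits past trials into a "good" set (the lowest-loss fraction `gamma`) and the rest, fits a density l(x) to the good set and g(x) to the rest, and proposes the candidate that maximizes l(x)/g(x); for categorical parameters, the Parzen estimate reduces to smoothed frequency counts.

```python
import math
import random

# Hypothetical search space; the authors' actual space is not given here.
SPACE = {
    "num_layers": [2, 4, 6, 8, 12],
    "num_heads": [4, 8, 12],
}


def toy_loss(cfg):
    """Placeholder for the validation MLM loss of a model trained with `cfg`.
    The minimum is placed arbitrarily; a real objective would train a model."""
    return 2.0 + 0.3 * abs(cfg["num_layers"] - 4) + 0.1 * abs(cfg["num_heads"] - 8)


def smoothed_frequencies(values, options, prior=1.0):
    """Categorical Parzen estimate: option counts plus a uniform prior."""
    counts = {o: prior for o in options}
    for v in values:
        counts[v] += 1.0
    total = sum(counts.values())
    return {o: c / total for o, c in counts.items()}


def tpe_search(objective, n_trials=40, n_startup=10, gamma=0.25,
               n_candidates=24, seed=0):
    rng = random.Random(seed)
    history = []  # (config, loss) pairs
    for t in range(n_trials):
        if t < n_startup:
            # Warm-up: sample configurations uniformly at random.
            cfg = {k: rng.choice(opts) for k, opts in SPACE.items()}
        else:
            ranked = sorted(history, key=lambda item: item[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [c for c, _ in ranked[:n_good]]
            bad = [c for c, _ in ranked[n_good:]]
            # l(x): density of the best trials; g(x): density of the rest.
            l_x = {k: smoothed_frequencies([c[k] for c in good], o) for k, o in SPACE.items()}
            g_x = {k: smoothed_frequencies([c[k] for c in bad], o) for k, o in SPACE.items()}

            def draw_from_l():
                return {k: rng.choices(o, weights=[l_x[k][v] for v in o])[0]
                        for k, o in SPACE.items()}

            def ratio(candidate):
                # Parameters are modelled independently, so densities factorise.
                return math.prod(l_x[k][candidate[k]] / g_x[k][candidate[k]] for k in SPACE)

            # Propose candidates from l(x) and keep the one maximising l(x)/g(x).
            cfg = max((draw_from_l() for _ in range(n_candidates)), key=ratio)
        history.append((cfg, objective(cfg)))
    return min(history, key=lambda item: item[1])


best_cfg, best_loss = tpe_search(toy_loss)
print(best_cfg, best_loss)
```

In practice, libraries such as Optuna (`optuna.samplers.TPESampler`) or Hyperopt provide full TPE implementations; there, each objective call would train and evaluate a candidate model, which dominates the cost of the search.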

Abstract

In this paper, we describe the University of Lyon 2 submission to the Strict-Small track of the BabyLM competition. The shared task emphasizes small-scale language modelling from scratch on limited data, inspired by human language acquisition. The dataset released for the Strict-Small track contains 10M words, which is comparable in scale to the amount of language children are exposed to. We approach the task with an architecture search that minimizes the masked language modelling loss on the shared-task data. Having found an optimal configuration, we introduce two small language models (LMs) submitted for evaluation: a 4-layer encoder with 8 attention heads and a 6-layer decoder with 12 heads, which we term Bebeshka and Zlata, respectively. Despite being half the scale of the baseline LMs, our models achieve comparable performance. We further explore the applicability of small-scale LMs to moral judgment tasks, evaluating how well their predictions align with human values. These findings highlight the potential of compact LMs for practical language understanding tasks.
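
To make "half the scale" concrete, a rough parameter count for a BERT/RoBERTa-style encoder can be derived from its dimensions. The sketch below assumes learned absolute position embeddings, a single token-type embedding, tied input/output embeddings, and no pooler or MLM-head layers. The widths in the example are RoBERTa-base's, and the 4-layer variant is a hypothetical configuration, not the published Bebeshka setup. The number of attention heads does not change the count: heads only split `d_model` into `d_model / num_heads`-dimensional slices, so depth and width, not head count, drive model size.

```python
def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head dimension; heads partition d_model, so they add no parameters."""
    if d_model % num_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by num_heads={num_heads}")
    return d_model // num_heads


def encoder_params(num_layers: int, d_model: int, d_ff: int,
                   vocab_size: int, max_positions: int) -> int:
    """Approximate parameter count of a RoBERTa-style encoder (assumptions in the text)."""
    # Token, position and (single) token-type embeddings, plus the embedding LayerNorm.
    embeddings = (vocab_size + max_positions + 1) * d_model + 2 * d_model
    # Q, K, V and output projections, each a d_model x d_model matrix with bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward block with biases.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per layer (weight and bias each).
    layer_norms = 2 * 2 * d_model
    return embeddings + num_layers * (attention + ffn + layer_norms)


base = encoder_params(12, 768, 3072, vocab_size=50265, max_positions=514)
small = encoder_params(4, 768, 3072, vocab_size=50265, max_positions=514)
print(base, small, round(small / base, 2))  # → 124055040 67352064 0.54
```

The embedding matrices account for about 39M of these parameters, so with a 50K-token vocabulary, cutting depth alone roughly halves the model; a smaller vocabulary or hidden size would shrink it further.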

Paper Structure

This paper contains 26 sections, 1 figure, and 9 tables.

Figures (1)

  • Figure 1: Accuracy of our LMs and the RoBERTa-base, OPT-125M, and T5-base baselines on BLiMP tasks. Lighter colours indicate higher accuracy. Phenomena are grouped as follows. Morphology: Anaphor Agr., D-N Agr., Irregular Forms, S-V Agr. Semantics: NPI Licensing, Quantifiers. Syntax-Semantics: Binding, Control/Raising. The remaining phenomena belong to the Syntax category.
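
BLiMP scores a model on minimal pairs: a pair counts as correct if the model assigns a higher probability to the acceptable sentence than to the unacceptable one, and accuracy is the fraction of pairs it gets right. The sketch below computes per-phenomenon accuracy and then groups phenomena into the categories listed in the Figure 1 caption. The scorer is a hypothetical add-one-smoothed unigram model used only so the example runs; a real evaluation would score sentences with the trained LM (for masked LMs, typically via a pseudo-log-likelihood). The unweighted category average is an illustrative choice, not necessarily how the figure was produced.

```python
import math
from collections import Counter, defaultdict

# Phenomenon-to-category grouping taken from the Figure 1 caption;
# any phenomenon not listed here falls under Syntax.
CATEGORY = {
    "Anaphor Agr.": "Morphology",
    "D-N Agr.": "Morphology",
    "Irregular Forms": "Morphology",
    "S-V Agr.": "Morphology",
    "NPI Licensing": "Semantics",
    "Quantifiers": "Semantics",
    "Binding": "Syntax-Semantics",
    "Control/Raising": "Syntax-Semantics",
}


def unigram_scorer(corpus):
    """Hypothetical stand-in LM: add-one-smoothed unigram log-probabilities."""
    counts = Counter(w for s in corpus for w in s.lower().split())
    denom = sum(counts.values()) + len(counts) + 1  # +1 slot for unseen words

    def log_prob(sentence):
        # fsum is exactly rounded, so word order cannot change the score.
        return math.fsum(math.log((counts[w] + 1) / denom) for w in sentence.lower().split())

    return log_prob


def minimal_pair_accuracy(pairs, log_prob):
    """pairs: iterable of (phenomenon, acceptable, unacceptable); ties count as errors."""
    hits, totals = defaultdict(int), defaultdict(int)
    for phenomenon, good, bad in pairs:
        totals[phenomenon] += 1
        hits[phenomenon] += int(log_prob(good) > log_prob(bad))
    return {p: hits[p] / totals[p] for p in totals}


def category_accuracy(per_phenomenon):
    """Unweighted mean of phenomenon accuracies within each category."""
    grouped = defaultdict(list)
    for phenomenon, acc in per_phenomenon.items():
        grouped[CATEGORY.get(phenomenon, "Syntax")].append(acc)
    return {cat: sum(accs) / len(accs) for cat, accs in grouped.items()}


score = unigram_scorer(["the cats sleep", "the dogs sleep", "a dog sleeps"])
pairs = [
    ("S-V Agr.", "the cats sleep", "the cats sleeps"),
    ("Island Effects", "who did the dog see", "who the dog see did"),
]
per_phenomenon = minimal_pair_accuracy(pairs, score)
print(per_phenomenon)  # → {'S-V Agr.': 1.0, 'Island Effects': 0.0}
print(category_accuracy(per_phenomenon))  # → {'Morphology': 1.0, 'Syntax': 0.0}
```

Because a unigram model ignores word order, it ties on the reordered pair and scores 0.0 there, a reminder that BLiMP accuracy reflects only what the scoring model actually captures.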