Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning
Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider
TL;DR
This work probes whether human-scale language models can learn both the form and the meaning of the rare LET-ALONE construction. Using a templated minimal-pair benchmark scored with a SLOR-based metric, the authors train two OPT-based models on the 100M-word BabyLM data and assess the construction's formal syntactic constraints and its scalar semantics. They find strong generalization in form: the formal constraints are learned robustly, even with very limited direct exposure. They find no robust semantic generalization for LET-ALONE, although the corresponding semantic expectations are prominent in humans and in prompted LLMs. Filtering the pretraining data to remove LET-ALONE-related material further shows that form knowledge persists via indirect evidence, whereas removing the literal tokens dramatically harms performance. Together, the results highlight a form-meaning asymmetry that is not present in human learners and suggest that current architectures rely differently on form and meaning information. They also point to the need to address semantic learning of rare constructions in smaller-scale LMs and to broaden evaluation to more constructions and languages.
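To make the SLOR-based minimal-pair scoring concrete, the following is a minimal sketch that uses the standard definition of SLOR (the syntactic log-odds ratio): the sentence log-probability under the model, minus its log-probability under a unigram model, divided by sentence length. The exact normalization and tokenization the authors use are not stated here, so the function names (`slor`, `minimal_pair_accuracy`) and the toy log-probabilities are illustrative assumptions, not the paper's implementation.

```python
from typing import Iterable, Sequence, Tuple


def slor(model_logprobs: Sequence[float], unigram_logprobs: Sequence[float]) -> float:
    """Standard SLOR: (log p_model(s) - log p_unigram(s)) / |s|.

    Both inputs are per-token log-probabilities for the same sentence.
    Hypothetical helper: the paper's exact variant may differ.
    """
    if not model_logprobs or len(model_logprobs) != len(unigram_logprobs):
        raise ValueError("need two non-empty, equal-length sequences")
    return (sum(model_logprobs) - sum(unigram_logprobs)) / len(model_logprobs)


def minimal_pair_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (acceptable, unacceptable) SLOR pairs where the
    acceptable sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs given")
    return sum(1 for good, bad in pairs if good > bad) / len(pairs)


# Toy log-probabilities (made up for illustration only).
good = slor([-2.0, -1.0, -3.0], [-4.0, -3.0, -5.0])
bad = slor([-2.0, -6.0, -3.0], [-4.0, -3.0, -5.0])
print(good > bad)  # the acceptable variant scores higher in this toy case
```

Subtracting the unigram term is what keeps the comparison from being dominated by word frequency, which matters when the two members of a minimal pair differ in a rare lexical item.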
Abstract
Humans have a remarkable ability to acquire and understand grammatical phenomena that are seen rarely, if ever, during childhood. Recent evidence suggests that language models with human-scale pretraining data may possess a similar ability by generalizing from frequent to rare constructions. However, it remains an open question how widespread this generalization ability is, and to what extent this knowledge extends to meanings of rare constructions, as opposed to just their forms. We fill this gap by testing human-scale transformer language models on their knowledge of both the form and meaning of the (rare and quirky) English LET-ALONE construction. To evaluate our LMs we construct a bespoke synthetic benchmark that targets syntactic and semantic properties of the construction. We find that human-scale LMs are sensitive to form, even when related constructions are filtered from the dataset. However, human-scale LMs do not make correct generalizations about LET-ALONE's meaning. These results point to an asymmetry in the current architectures' sample efficiency between language form and meaning, something which is not present in human language learners.
