No Such Thing as a General Learner: Language models and their dual optimization

Emmanuel Chemla; Ryan M. Nefdt

TL;DR

The paper asks whether LLMs are general learners and what their behavior implies for human language acquisition. It argues that neither humans nor LLMs are general learners and introduces a dual-optimization view of how LLMs are shaped: a training-time objective plus evolution-like selection. By examining benchmarks, learning trajectories, and impossible-language experiments, the authors argue that LLM performance cannot straightforwardly settle debates about human biases or innate language faculties, since these systems are heavily engineered and selected. The work cautions against overextending cognitive-science inferences from LLMs, while outlining how these models can still inform our understanding of language learning when their developmental history and selection pressures are properly accounted for.

Abstract

What role can the otherwise successful Large Language Models (LLMs) play in the understanding of human cognition, and in particular in terms of informing language acquisition debates? To contribute to this question, we first argue that neither humans nor LLMs are general learners, in a variety of senses. We make a novel case for how in particular LLMs follow a dual-optimization process: they are optimized during their training (which is typically compared to language acquisition), and modern LLMs have also been selected, through a process akin to natural selection in a species. From this perspective, we argue that the performance of LLMs, whether similar or dissimilar to that of humans, does not weigh easily on important debates about the importance of human cognitive biases for language.

Paper Structure (16 sections, 4 figures)

Introduction
What is a general learner?
Are LLMs general learners?
No such thing as a general learner
Vanilla is a flavour too
General as domain general
Are LLMs human-like? The role of benchmarks
The nature of the benchmarks
Benchmarks as implicit objective functions
Benchmarks about behavior vs benchmarks about learning
Behavior vs representations and computations
Are LLMs human-like? The case of impossible languages
The classic impossible language debate
LLMs and impossible languages
The secret language of LLMs (autoprompt)
...and 1 more section

Figures (4)

Figure 1: LLMs have followed a long chain of optimization, which has made them increasingly specialized for language learning.
Figure 2: In large artificial networks, it is very possible that different tasks are treated by different sub-parts of the models, including largely independent parts. The existence of a single model to treat several tasks is therefore not an argument against some modularity between these tasks.
Figure 3: Several learning strategies and several computational models can lead to the same behavior, that is, a good fit of behavior still largely underspecifies a model of cognition.
Figure 4: From a given input, different learners will learn different languages. The typical remark is that a defective LLM may learn a subset of English (the left part will be missing). But this left part has been decreasing. Now we are also finding that LLMs learn languages of their own. These languages could be languages that humans will never learn, even if given the relevant input. Or they could be extensions of the languages that humans would learn, say with English, and these latter cases are in fact not only attested and obviously very foreign to humans, but their existence is a necessary, mathematical consequence of the form of these models.
