Large Language Models as Proxies for Theories of Human Linguistic Cognition

Imry Ziv, Nur Lan, Emmanuel Chemla, Roni Katzir

TL;DR

The paper investigates whether current large language models can function as proxies for relatively linguistically-neutral theories of human linguistic cognition (HLC), contrasting the LLM Theory with the Proxy View and a concrete neutral framework $H_3$. It pursues two lines of inquiry: alignment with the stimulus, tested on across-the-board movement (ATB), parasitic gaps (PG), and that-trace effects (TTE); and cross-linguistic typology, tested with perturbations across multiple languages. The authors find that LLMs generally fail to acquire key linguistic patterns and sometimes even predict easier learning for typologically unattested variants. They argue that these results provide limited support for linguistically-neutral theories and offer a pragmatic critique of the Proxy View, emphasizing the need for explicit theories and rigorous, detail-oriented evaluation to make LLMs scientifically informative for HLC. The work highlights the boundaries of current LLMs as tools for cognitive linguistics and calls for deeper theoretical specification and methodological precision in future proxy-based analyses.

Abstract

We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relatively linguistically-neutral in their representations and learning but differ from current LLMs in key ways. We illustrate this potential use of LLMs as proxies for theories of cognition in the context of two kinds of questions: (a) whether the target theory accounts for the acquisition of a given pattern from a given corpus; and (b) whether the target theory makes a given typologically-attested pattern easier to acquire than another, typologically-unattested pattern. For each of the two questions we show, building on recent literature, how current LLMs can potentially be of help, but we note that at present this help is quite limited.

Paper Structure

This paper contains 22 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Model accuracy on ATB and PG datasets averaged over five experiment seeds. Accuracy is measured as the ratio of cases where the model assigns a higher probability to the grammatical sentence continuation.
  • Figure 2: Model accuracy values for the $P(+that, +trace) < P(+that, -trace)$ criterion (top) and $P(+that, +trace) < P(-that, +trace)$ criterion (bottom), over samples of 10,000 sentence pairs from the TTE test set. Results are averaged over five seeds. The dark plot represents model training sizes.
  • Figure 3: Validation perplexity during training for English, Italian, and Russian and their partial-reverse perturbations. The results indicate that $\Pi(\textit{attested}) < \Pi(\textit{partial-reverse})$.
  • Figure 4: Validation perplexity during training for attested (baseline) and full-reverse versions of English, Russian, Italian, and Hebrew.
  • Figure 5: Validation perplexity during training for the attested (baseline) and switch-indices versions of English, Italian, Hebrew and Russian.
  • ...and 1 more figure
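
The accuracy measure named in the Figure 1 caption (the ratio of cases where the model assigns a higher probability to the grammatical continuation) can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function name `pairwise_accuracy`, the use of log-probabilities, the strict-inequality tie handling, and the toy numbers are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (grammatical, ungrammatical) score pairs in which the
    grammatical member receives the strictly higher score.

    Scores are assumed to be log-probabilities assigned by a model to the
    two continuations of a minimal pair; ties count as failures here.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for grammatical, ungrammatical in pairs if grammatical > ungrammatical)
    return wins / len(pairs)


# Hypothetical log-probabilities, invented for illustration only.
toy_pairs = [(-12.1, -13.4), (-9.8, -9.2), (-15.0, -17.3), (-11.0, -11.5)]
accuracy = pairwise_accuracy(toy_pairs)
print(accuracy)
```

The same comparison applies to the two criteria in the Figure 2 caption, for example by scoring the $(+that, -trace)$ sentence as the preferred member and the $(+that, +trace)$ sentence as the dispreferred one.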