Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the Impossible

Imry Ziv, Nur Lan, Emmanuel Chemla, Roni Katzir

TL;DR

This study investigates whether large language models (LLMs) inherently distinguish between humanly possible languages and their impossible perturbations. By extending previous English-focused analyses to eight languages and multiple perturbations, and training GPT-2 from scratch on baseline and perturbed datasets, the authors compare perplexity-based learning curves under intralinguistic and interlinguistic frameworks. They find that GPT-2 often learns possible and impossible variants with similar ease and shows no systematic separation between attested and unattested language sets, challenging the notion that LLMs encode human-like innate linguistic biases. The results suggest that perplexity-based metrics may not capture the innate biases that shape human linguistic typology, highlighting a fundamental mismatch between LLM learning dynamics and human language cognition. Limitations include the scope of perturbations and model size, indicating the need for broader empirical coverage across architectures and languages.
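To make the idea of a perturbation function concrete, here is a minimal Python sketch of two rules of the kind named in Figure 3 (REVERSE and HOP). The function names `reverse_perturb` and `hop_perturb`, the `<MARK>` token, and the four-token hop distance are illustrative assumptions, not the authors' exact definitions. The NO PERTURB baseline is simply the unmodified token sequence.

```python
from typing import List


def reverse_perturb(tokens: List[str]) -> List[str]:
    """REVERSE-style rule (illustrative): mirror the token order of a sentence."""
    return list(reversed(tokens))


def hop_perturb(tokens: List[str], verb_index: int,
                marker: str = "<MARK>", hop: int = 4) -> List[str]:
    """HOP-style rule (hypothetical simplification): insert `marker` after
    skipping `hop` tokens past the verb, clamped to the end of the sentence.
    The placement depends only on linear token counting."""
    position = min(verb_index + 1 + hop, len(tokens))
    return tokens[:position] + [marker] + tokens[position:]


sentence = ["the", "dog", "chased", "a", "very", "small", "red", "ball", "today"]
print(reverse_perturb(sentence))
print(hop_perturb(sentence, verb_index=2))
```

Both rules are deterministic and invertible in this sketch, so the perturbed corpus carries the same lexical content as the baseline; only the structural rule changes.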

Abstract

Are large language models (LLMs) sensitive to the distinction between humanly possible languages and humanly impossible languages? This question is taken by many to bear on whether LLMs and humans share the same innate learning biases. Previous work has attempted to answer it in the positive by comparing LLM learning curves on existing language datasets and on "impossible" datasets derived from them via various perturbation functions. Using the same methodology, we examine this claim on a wider set of languages and impossible perturbations. We find that in most cases, GPT-2 learns each language and its impossible counterpart equally easily, in contrast to previous claims. We also apply a more lenient condition by testing whether GPT-2 provides any kind of separation between the whole set of natural languages and the whole set of impossible languages. By considering cross-linguistic variance in various metrics computed on the perplexity curves, we show that GPT-2 provides no systematic separation between the possible and the impossible. Taken together, these perspectives show that LLMs do not share the human innate biases that shape linguistic typology.
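The learning curves compared here are perplexity curves, that is, perplexity measured at successive training checkpoints. The sketch below shows how one point on such a curve and two summary metrics used in Figure 3 (minimum perplexity and area under the curve) might be computed. It assumes per-token natural-log probabilities are already available and uses the trapezoidal rule for the AUC; both are assumptions rather than the authors' stated procedure, and the curve values are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def min_perplexity(curve: Sequence[float]) -> float:
    """Lowest perplexity reached at any checkpoint of a learning curve."""
    return min(curve)


def curve_auc(steps: Sequence[float], curve: Sequence[float]) -> float:
    """Area under a perplexity-vs-training-step curve (trapezoidal rule)."""
    if len(steps) != len(curve):
        raise ValueError("steps and curve must have the same length")
    area = 0.0
    for i in range(1, len(steps)):
        width = steps[i] - steps[i - 1]
        area += width * (curve[i] + curve[i - 1]) / 2
    return area


# Hypothetical curve: perplexity at three checkpoints (not data from the paper).
steps = [0, 1000, 2000]
curve = [120.0, 60.0, 40.0]
print(min_perplexity(curve), curve_auc(steps, curve))
```

Under this metric, a lower AUC means the curve drops faster, which is the sense in which one language is "easier to learn" than another.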

Paper Structure

This paper contains 22 sections, 1 equation, 3 figures, 11 tables.

Figures (3)

  • Figure 1: Learning curve of GPT-2 on a standard English dataset vs. a perturbed, impossible variant of the same language. The ease-of-learning methodology used in Kallini et al. (2024) fails to create a stable boundary between possible and impossible languages, often preferring the impossible variants, as shown here. See Figure 2 for the full results.
  • Figure 2: Learning curves for attested languages and their perturbed (impossible) variants. Each subplot displays the learning curves for an experiment, with mean error values and corresponding heatmap hues overlaid. Positive values (green hues) indicate that the attested language's learning curve is, on average, above its perturbed variant, while negative values (pink hues) indicate the opposite.
  • Figure 3: Cross-linguistic comparison of minimal perplexity values during training and of the area under the curve (AUC) of the training curves from Figure 2. Languages compared to the same baseline share a marker shape: triangles are compared to NO PERTURB (blue triangle), stars to the REVERSE baseline (blue star), and diamonds to the HOP baseline (blue diamond). Across-language and within-language variances are given below each subplot. Values vary considerably more across languages (different languages under the same perturbation) than within languages (different perturbations of the same language). This shows that attested languages and their perturbed, impossible variants pattern together.
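The two comparisons in Figures 2 and 3 can be sketched as follows. `mean_signed_difference` follows the sign convention of Figure 2 (positive when the attested curve lies above its perturbed variant); the paper's exact definition of "mean error" is not given here, so this is an assumption. `across_within_variance` assumes a complete grid of (language, perturbation) values and averages population variances, which is one plausible reading of the across-language and within-language variances in Figure 3, not a confirmed reproduction of the authors' computation.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence, Tuple


def mean_signed_difference(attested: Sequence[float],
                           perturbed: Sequence[float]) -> float:
    """Average of (attested - perturbed) perplexity over matched checkpoints.
    Positive: the attested curve lies above its perturbed variant on average."""
    if len(attested) != len(perturbed):
        raise ValueError("curves must share the same checkpoints")
    return mean(a - p for a, p in zip(attested, perturbed))


def across_within_variance(
        metric: Dict[Tuple[str, str], float]) -> Tuple[float, float]:
    """`metric` maps (language, perturbation) -> a curve summary such as AUC.
    Across-language: variance over languages for each perturbation, averaged.
    Within-language: variance over perturbations for each language, averaged.
    Assumes every (language, perturbation) pair is present."""
    languages = sorted({lang for lang, _ in metric})
    perturbations = sorted({pert for _, pert in metric})
    across = mean(pvariance([metric[(lang, pert)] for lang in languages])
                  for pert in perturbations)
    within = mean(pvariance([metric[(lang, pert)] for pert in perturbations])
                  for lang in languages)
    return across, within
```

For example, with hypothetical AUC values {("A", "X"): 10.0, ("A", "Y"): 11.0, ("B", "X"): 20.0, ("B", "Y"): 21.0}, the across-language variance is 25.0 and the within-language variance is 0.25: the spread between languages dominates the spread between perturbations, which is the pattern Figure 3 describes.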