Large Language Models and Impossible Language Acquisition: "False Promise" or an Overturn of our Current Perspective towards AI

Ziyan Wang, Longlong Ma

TL;DR

This paper interrogates Chomsky's claims that Large Language Models (LLMs) cannot acquire language without innate, a priori structures by conducting controlled experiments that transform English into two impossible language variants. It contrasts GPT-2 small and LSTM architectures across small and large datasets, measuring cross-entropy loss and perplexity to assess learnability of possible versus impossible languages. The results show GPT-2 small models exhibit a bias toward natural-language-like learning (lower loss and perplexity on possible languages), while LSTMs align more with Chomskian predictions, underscoring architecture as a critical factor. The authors advocate a functional-empiricist paradigm that emphasizes empirical testing and observable behavior over metaphysical debates, proposing a shift toward functionalism and empiricism in AI language research with roots in Piaget and Ryle.
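The two metrics named above are closely related: perplexity is the exponential of the mean per-token cross-entropy loss. The following is a minimal sketch of that relationship, not the authors' evaluation code. It assumes the loss is measured in nats (natural logarithm), and the function names are hypothetical.

```python
import math


def mean_cross_entropy(token_probs):
    """Mean negative log-likelihood (in nats) of the probabilities
    a model assigned to the tokens that actually occurred."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(mean_loss_nats):
    """Perplexity is exp(mean cross-entropy) when the loss is in nats."""
    return math.exp(mean_loss_nats)


# Toy illustration only (not data from the paper): a model that assigns
# probability 0.25 to every observed token is as uncertain as a uniform
# choice among four options.
toy_probs = [0.25, 0.25, 0.25, 0.25]
toy_ppl = perplexity(mean_cross_entropy(toy_probs))
```

Under this reading, a lower loss on the possible language implies a lower perplexity on it as well, since the exponential is monotonic.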

Abstract

In his provocative critique "The False Promise of ChatGPT," Chomsky characterizes Large Language Models (LLMs) as mere pattern predictors that, unlike humans, do not acquire language through intrinsic causal and self-correcting structures and therefore cannot distinguish impossible languages from possible ones. The critique is representative of a fundamental challenge to the intellectual foundations of AI: it synthesizes major methodological concerns about LLMs and embodies an iconic a priori rationalist perspective. We examine this critique both through the existing literature in linguistics and psychology and through an experiment probing the capacity of LLMs to learn possible and impossible languages. We constructed a set of syntactically impossible languages by applying transformations to English, including reversing whole sentences and adding negation based on word-count parity. Two rounds of controlled experiments were conducted on both GPT-2 small models and long short-term memory (LSTM) models. Statistical analysis (Welch's t-test) shows that GPT-2 small models learn every impossible language less well than the possible language (p < .001). LSTM models' performance, by contrast, is consistent with Chomsky's argument, suggesting that the evolution of the transformer architecture plays an irreplaceable role. Based on theoretical analysis and empirical findings, we propose a new view of LLMs within Chomsky's theory, and a shift of theoretical paradigm beyond Chomsky, from his "rationalist-romantics" paradigm to functionalism and empiricism in LLM research.
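The two transformations and the statistical comparison mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The abstract does not give the exact parity rule, so the rule here (append a marker token when the word count is even) is hypothetical, as are the function names and the whitespace tokenization.

```python
import math
from statistics import mean, variance


def reverse_sentence(sentence):
    """Reverse the word order of a whole sentence (whitespace tokens)."""
    return " ".join(reversed(sentence.split()))


def parity_negation(sentence, marker="not"):
    """Hypothetical parity rule: append a negation marker when the
    sentence has an even number of words, else leave it unchanged."""
    words = sentence.split()
    if len(words) % 2 == 0:
        words.append(marker)
    return " ".join(words)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy inputs for illustration only; these are not results from the paper.
reversed_example = reverse_sentence("the cat sat")
negated_example = parity_negation("the cat sat down")
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Both rules depend on linear position or on counting words rather than on hierarchical structure, which is what makes the resulting languages impossible in the sense used here. A p-value would then be read from the t distribution with `dof` degrees of freedom, for instance via a statistics library.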

Paper Structure (43 sections, 4 equations, 21 figures)

Figures (21)

  • Figure 1: Natural language operates on hierarchical tree structures
  • Figure 2: Impossible languages rely on rigid linear positioning
  • Figure 3: Loss value comparison and t-test in Experiment 1
  • Figure 4: Overall loss value comparison in Experiment 2
  • Figure 5: t-test on loss values for Experiment 2
  • ...and 16 more figures