Child vs. machine language learning: Can the logical structure of human language unleash LLMs?

Uli Sauerland, Celia Matthaei, Felix Salfner

TL;DR

The paper argues that human language learning is guided by intrinsic logical structures, whereas current LLM training lacks a bias toward algebraic relationships such as negation. It predicts that certain properties of language, such as German default plural formation, will reveal suboptimal generalization in LLMs when these logical connections are not explicitly encoded. Using a nonce-noun plural task adapted from Marcus et al., it shows that seven LLMs largely fail to generalize the German default plural -s, revealing systematic gaps in how LLMs capture morpho-logic. The authors advocate cross-disciplinary efforts and curriculum-like training strategies to build linguistic logic into LLMs, suggesting that such approaches could improve generalization, efficiency, and linguistic competence in AI systems.
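The notion of a default plural can be illustrated with a toy sketch. It assumes a dual-route reading in which stored nouns keep their listed endings, a nonce stem that rhymes with a stored noun borrows that noun's ending by analogy, and every other stem falls back to the default -s. The lexicon, the spelling-based two-letter rhyme test (RHYME_LEN), and the function name pluralize are hypothetical simplifications, not the authors' model and not the stimuli used by Marcus et al.

```python
# Toy dual-route sketch of German plural formation (hypothetical, for illustration).
# Stored nouns use their listed ending; rhyming nonce stems copy a stored
# noun's ending by analogy; all remaining stems receive the default -s.

LEXICON = {
    "Tag": "e",     # Tag -> Tage
    "Blume": "n",   # Blume -> Blumen
    "Lehrer": "",   # Lehrer -> Lehrer (zero ending)
}

DEFAULT_ENDING = "s"
RHYME_LEN = 2  # hypothetical: two stems "rhyme" if their last 2 letters match


def pluralize(stem: str) -> str:
    # Route 1: a memorized noun uses its stored ending.
    if stem in LEXICON:
        return stem + LEXICON[stem]
    # Route 2: analogy with a stored noun whose ending the stem shares.
    for known, ending in LEXICON.items():
        if stem[-RHYME_LEN:].lower() == known[-RHYME_LEN:].lower():
            return stem + ending
    # Route 3: the default ending applies to everything else.
    return stem + DEFAULT_ENDING


if __name__ == "__main__":
    for s in ["Tag", "Plag", "Plaupf"]:
        print(s, "->", pluralize(s))
```

For example, pluralize("Tag") gives "Tage", the rhyming nonce stem "Plag" gives "Plage" by analogy, and the non-rhyming nonce stem "Plaupf" receives the default, "Plaupfs". A realistic rhyme test would compare phonological rhymes rather than spelling.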

Abstract

We argue that human language learning proceeds in a manner that is different in nature from current approaches to training LLMs, predicting a difference in learning biases. We then present evidence from German plural formation by LLMs that confirms our hypothesis: even very powerful implementations produce results that miss aspects of the logic inherent to language, aspects that humans handle without difficulty. We conclude that attention to the different structures of human language and artificial neural networks is likely to be an avenue to improve LLM performance.

Paper Structure

This paper contains 4 sections and 4 figures.

Figures (4)

  • Figure 1: Two ways of adding the negation of A to a neural network. In panel a, neuron a yields output A. Panels b and c illustrate two possible ways of obtaining output Ā whenever A is not produced. In panel b, an independent neuron ā with inverted weights outputs Ā. In panel c, two new neurons, = and ¬, trigger A and Ā depending on the output of unit a.
  • Figure 2: Schematic view of the German plural endings: The majority of noun stems select one of the four irregular plural endings -e, -n, -r, and -∅, while the default ending -s is rare.
  • Figure 3: Mean plausibility ratings (MR) of the plural forms produced by LLMs for the German nonce-noun task (Marcus et al. 1995), grouped into nonce nouns that rhyme with a real German noun and non-rhyming nonce nouns. The rightmost two bars show the human plausibility ratings of the correct plural forms.
  • Figure 4: Item analysis of the nonce-noun plural task with a plausibility-rating threshold of 3.0. Stems on the left (Bral to Bnaupf) are rhyming; stems on the right are non-rhyming.
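The two designs for negation in Figure 1 can be contrasted in a minimal sketch built from binary threshold units. The weights W and B, the step activations, and all function names are hypothetical assumptions for illustration; the figure itself does not specify an activation function or weight values.

```python
# Toy threshold-unit sketch of Figure 1 (hypothetical weights, not from the paper).

W = (1.0, -2.0)  # hypothetical learned input weights of neuron a
B = 0.5          # hypothetical bias of neuron a


def net_input(x):
    """Weighted sum w·x + b feeding neuron a."""
    return sum(w * xi for w, xi in zip(W, x)) + B


def step(z):
    """Threshold unit: fires (1) iff its net input is strictly positive."""
    return 1 if z > 0 else 0


def neuron_a(x):
    """Panel a: a fires -> output A."""
    return step(net_input(x))


def neuron_a_bar(x):
    """Panel b: independent neuron with inverted weights and bias.

    Uses a non-strict threshold so that exactly one of a and ā fires
    even on the decision boundary (net input == 0).
    """
    return 1 if -net_input(x) >= 0 else 0


def gate_eq(a):
    """Panel c, '=' unit: weight +1, bias -0.5; fires iff a == 1."""
    return step(1.0 * a - 0.5)


def gate_not(a):
    """Panel c, '¬' unit: weight -1, bias +0.5; fires iff a == 0."""
    return step(-1.0 * a + 0.5)


def panel_b(x):
    # Two separately parameterized neurons compute A and Ā.
    return {"A": neuron_a(x), "A_bar": neuron_a_bar(x)}


def panel_c(x):
    a = neuron_a(x)  # a is computed once and read by both gate units
    return {"A": gate_eq(a), "A_bar": gate_not(a)}
```

Both panels give the same outputs in this sketch, but they differ in what has to be learned. In panel b, the weights of ā must mirror those of a and would need to be updated separately whenever a's weights change; in panel c, the ¬ unit is a fixed operator that stays consistent with whatever a computes. This is one reading of the contrast the figure draws, not a description of the authors' implementation.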