Emergent Word Order Universals from Cognitively-Motivated Language Models

Tatsuki Kuribayashi; Ryo Ueda; Ryo Yoshida; Yohei Oseki; Ted Briscoe; Timothy Baldwin

TL;DR

The paper investigates whether cognitively plausible biases in language models can explain emergent word-order universals observed across languages. By training a range of standard and cognitively-motivated LMs on artificial languages with 64 word-order configurations derived from six binary parameters, the authors relate typological frequency to model-driven processing costs via perplexity. They find that syntactic biases, left-corner parsing strategies, and memory limitations generally yield higher alignment with attested word-order distributions than standard models, with global and local correlations supporting the link between cognitive biases and typology. However, certain human-like preferences, such as agent-first (SOV) tendencies, remain only partially explained, indicating the need for additional factors beyond surprisal and pointing to the usefulness and limits of cognitively-motivated LMs for modeling language universals. The work demonstrates a principled way to connect processing efficiency, predictability, and word-order typology, while outlining paths to refine artificial data and extend the framework to capture more human-like biases.
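As a rough illustration of the search space described above, six binary parameters yield $2^6 = 64$ word-order configurations. The sketch below simply enumerates them. The parameter names `p1` to `p6` are hypothetical placeholders, since this summary does not list the actual parameters, and the function is not the authors' implementation.

```python
from itertools import product

# Hypothetical placeholder names: the summary does not name the six parameters.
PARAMETERS = ("p1", "p2", "p3", "p4", "p5", "p6")


def enumerate_configurations(parameters=PARAMETERS):
    """Return every assignment of 0/1 to the given binary parameters."""
    return [
        dict(zip(parameters, bits))
        for bits in product((0, 1), repeat=len(parameters))
    ]


configs = enumerate_configurations()
print(len(configs))  # 64
```

Each dictionary stands for one artificial language variant. In the setup the summary describes, a model would be trained on data generated under each configuration and its perplexity compared against that configuration's typological frequency.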

Abstract

The world's languages exhibit certain so-called typological or implicational universals; for example, Subject-Object-Verb (SOV) languages typically use postpositions. Explaining the source of such biases is a key goal of linguistics. We study word-order universals through a computational simulation with language models (LMs). Our experiments show that typologically-typical word orders tend to have lower perplexity estimated by LMs with cognitively plausible biases: syntactic biases, specific parsing strategies, and memory limitations. This suggests that the interplay of cognitive biases and predictability (perplexity) can explain many aspects of word-order universals. It also showcases the advantage of cognitively-motivated LMs, typically employed in cognitive modeling, in the simulation of language universals.

Paper Structure (62 sections, 8 equations, 8 figures, 17 tables)

Introduction
Related research
Impossible languages and LMs
The Chomsky hierarchy and LMs
Word order preferences of LMs
Cognitively-motivated LMs
Preliminary
Need for computational simulation
Linking hypothesis
Assumption 1:
Assumption 2:
Problem settings
Word-order configurations
Frequencies of word order
Processing costs of word order
...and 47 more sections

Figures (8)

Figure 1: We compare the word orders that are challenging for LMs to those that are infrequent in attested languages (\ref{sec:design}). We examine the advantage of cognitively-motivated LMs (\ref{sec:model}) in simulating the word-order universals (the world's word-order distribution) with their inductive biases (\ref{sec:experiment}).
Figure 2: The frequency distribution of $2^6=64$ word-order configurations within attested languages (blue points), sorted in descending order. Suppose two LMs, A and B, prefer word orders as shown by the green and red points, respectively. LM A (green points) is then considered to have an inductive bias that is typologically better aligned than that of LM B (red points).
Figure 3: The results of global/local correlations. Each point corresponds to one run. Colors and shapes denote the syntactic bias of the models. The TD and LC variants in the Transformer, LSTM, SRN, and N-gram settings correspond to the respective PLMs. The boxes show the lower and upper quartiles.
Figure 4: Mean and standard deviation of local correlations with different linking functions: PPL of order $k$ and logarithmic PPL.
Figure 5: Illustration of the relationship between predictability (y-axis) and word-order frequency in each of the four base-order groups (SOV, SVO, OVS, and VOS). Each circle corresponds to one word order; larger circles indicate more frequent word orders. Predictability is the negative PPL rescaled by min-max normalization; thus, higher predictability indicates lower PPL. The results are from the 3-gram PLM with the LC strategy.
...and 3 more figures
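
The captions for Figures 4 and 5 mention two simple transformations of perplexity: linking functions (PPL of order $k$ and logarithmic PPL) and a predictability score obtained by min-max normalizing the negative PPL. A minimal sketch of both follows. It assumes that "PPL of order $k$" means raising PPL to the power $k$, which the captions do not state explicitly. The function names and the toy perplexity values are invented for illustration and are not results from the paper.

```python
import math


def predictability(ppls):
    """Min-max normalize negative perplexity so scores lie in [0, 1].

    Lower perplexity maps to higher predictability.
    """
    negated = [-p for p in ppls]
    lo, hi = min(negated), max(negated)
    if hi == lo:
        # All perplexities are equal, so normalization is undefined; return zeros.
        return [0.0 for _ in negated]
    return [(v - lo) / (hi - lo) for v in negated]


def link_power(ppl, k):
    """Assumed reading of 'PPL of order k': perplexity raised to the power k."""
    return ppl ** k


def link_log(ppl):
    """Logarithmic linking function: natural log of perplexity."""
    return math.log(ppl)


# Made-up perplexities for three hypothetical word orders (not from the paper).
toy_ppls = [12.0, 8.0, 10.0]
scores = predictability(toy_ppls)
print(scores)  # [0.0, 1.0, 0.5]
```

In this toy case the word order with the lowest perplexity (8.0) receives the highest predictability score, matching the reading of the y-axis given in the Figure 5 caption.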
