Evolution and compression in LLMs: On the emergence of human-aligned categorization
Nathaniel Imel, Noga Zaslavsky
TL;DR
This work evaluates whether LLMs can develop human-aligned semantic categories, using Information Bottleneck (IB) efficiency in color naming, a domain rich in human data, as the test case. It combines two studies: a broad English color-naming assessment across 39 models, and an Iterated in-Context Language Learning (IICLL) experiment that probes inductive biases via simulated cultural transmission. The results show that while many models fail to match English color naming, the largest instruction-tuned models (notably Gemini 2.0) achieve English alignment and near-IB-optimal tradeoffs. Under IICLL, these models also exhibit a human-like inductive bias toward IB-efficiency, with trajectories approaching the human IB frontier. A domain-general test with Shepard circles suggests that Gemini can form structured, perceptually anchored categories beyond color. This indicates that IB-efficient categorization may generalize across domains without explicit IB training, with implications for human-AI alignment and robust semantic structuring.
Abstract
Converging evidence suggests that human systems of semantic categories achieve near-optimal compression via the Information Bottleneck (IB) complexity-accuracy tradeoff. Large language models (LLMs) are not trained for this objective, which raises the question: are LLMs capable of evolving efficient human-aligned semantic systems? To address this question, we focus on color categorization -- a key testbed of cognitive theories of categorization with uniquely rich human data -- and replicate two influential human studies with LLMs. First, we conduct an English color-naming study, showing that LLMs vary widely in their complexity and English-alignment, with larger instruction-tuned models achieving better alignment and IB-efficiency. Second, to test whether these LLMs simply mimic patterns in their training data or actually exhibit a human-like inductive bias toward IB-efficiency, we simulate cultural evolution of pseudo color-naming systems in LLMs via a method we refer to as Iterated in-Context Language Learning (IICLL). We find that, akin to humans, LLMs iteratively restructure initially random systems towards greater IB-efficiency. However, only the model with the strongest in-context capabilities (Gemini 2.0) is able to recapitulate the wide range of near-optimal IB-tradeoffs observed in humans, while other state-of-the-art models converge to low-complexity solutions. These findings demonstrate how human-aligned semantic categories can emerge in LLMs via the same fundamental principle that underlies semantic efficiency in humans.
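To make the complexity-accuracy tradeoff concrete, the following minimal sketch computes the two quantities usually used in IB analyses of naming systems: complexity as the mutual information I(M;W) between meanings and words, and accuracy as I(W;U) between words and world states. It assumes the standard IB definitions rather than the authors' actual code; the function names, the four-item meaning space, and the two-word naming system are hypothetical toy choices, not data from the paper.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a nested list."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def ib_tradeoff(prior, encoder, meanings):
    """Return (complexity, accuracy) = (I(M;W), I(W;U)) for a naming system.

    prior[m]       : p(m), how often meaning m needs to be communicated
    encoder[m][w]  : q(w|m), probability of naming meaning m with word w
    meanings[m][u] : m(u), the speaker's belief over world states u
    """
    n_m, n_w, n_u = len(prior), len(encoder[0]), len(meanings[0])
    # Joint over (meaning, word): p(m) * q(w|m)
    joint_mw = [[prior[m] * encoder[m][w] for w in range(n_w)]
                for m in range(n_m)]
    # Joint over (word, world state): sum_m p(m) * q(w|m) * m(u)
    joint_wu = [[sum(prior[m] * encoder[m][w] * meanings[m][u]
                     for m in range(n_m))
                 for u in range(n_u)]
                for w in range(n_w)]
    return mutual_information(joint_mw), mutual_information(joint_wu)


# Hypothetical toy system: four equally likely meanings, each pointing at
# exactly one world state, named with two words that split them in half.
prior = [0.25, 0.25, 0.25, 0.25]
meanings = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
encoder = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]

complexity, accuracy = ib_tradeoff(prior, encoder, meanings)
print(f"complexity = {complexity:.3f} bits, accuracy = {accuracy:.3f} bits")
```

In this toy setup the two-word system uses 1 bit of complexity and recovers 1 bit of accuracy. A single-word system would score 0 on both, and a four-word system would score 2 bits on both. An IB-efficient system is one that, for its level of complexity, achieves accuracy close to the best attainable.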
