Evolution and compression in LLMs: On the emergence of human-aligned categorization

Nathaniel Imel, Noga Zaslavsky

TL;DR

This work evaluates whether LLMs can develop human-aligned semantic categories through Information Bottleneck (IB) efficiency in color naming, a domain rich in human data. It combines two studies: a broad English color-naming assessment across 39 models, and Iterated in-Context Language Learning (IICLL), which probes inductive biases via simulated cultural transmission. The results show that while many models fail to match English color naming, the largest instruction-tuned models (notably Gemini 2.0) achieve English alignment and near-IB-optimal tradeoffs. Under IICLL, they also exhibit human-like inductive biases toward IB-efficiency, with trajectories approaching the human IB frontier. A domain-general test with Shepard circles suggests that Gemini can form structured, perceptually anchored categories beyond color. This indicates that IB-efficient categorization may generalize across domains without explicit IB training, with implications for human-AI alignment and robust semantic structuring.

Abstract

Converging evidence suggests that human systems of semantic categories achieve near-optimal compression via the Information Bottleneck (IB) complexity-accuracy tradeoff. Large language models (LLMs) are not trained for this objective, which raises the question: are LLMs capable of evolving efficient human-aligned semantic systems? To address this question, we focus on color categorization, a key testbed of cognitive theories of categorization with uniquely rich human data, and replicate two influential human studies with LLMs. First, we conduct an English color-naming study, showing that LLMs vary widely in their complexity and English-alignment, with larger instruction-tuned models achieving better alignment and IB-efficiency. Second, to test whether these LLMs simply mimic patterns in their training data or actually exhibit a human-like inductive bias toward IB-efficiency, we simulate cultural evolution of pseudo color-naming systems in LLMs via a method we refer to as Iterated in-Context Language Learning (IICLL). We find that, akin to humans, LLMs iteratively restructure initially random systems towards greater IB-efficiency. However, only the model with the strongest in-context capabilities (Gemini 2.0) is able to recapitulate the wide range of near-optimal IB-tradeoffs observed in humans, while other state-of-the-art models converge to low-complexity solutions. These findings demonstrate how human-aligned semantic categories can emerge in LLMs via the same fundamental principle that underlies semantic efficiency in humans.
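The complexity-accuracy tradeoff referred to above can be made concrete with a small computation. The sketch below follows the standard IB color-naming formulation of Zaslavsky et al. (2018) as we understand it; it is not the authors' code. A naming system is an encoder q(w|m) over meanings m, each meaning is a distribution m(u) over color points u, complexity is I(M;W), accuracy is I(W;U), and a system is scored by F_beta = I(M;W) - beta * I(W;U). The function names (`mutual_information`, `ib_terms`, `ib_objective`) and the toy inputs are hypothetical.

```python
import math

Matrix = list[list[float]]


def mutual_information(joint: Matrix) -> float:
    """I(X;Y) in bits for a joint distribution joint[x][y] that sums to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0.0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi


def ib_terms(prior: list[float], meanings: Matrix, encoder: Matrix) -> tuple[float, float]:
    """Return (complexity, accuracy) = (I(M;W), I(W;U)) in bits.

    prior[m]:       p(m), the need distribution over meanings (e.g. color chips)
    meanings[m][u]: m(u), each meaning is a distribution over color points u
    encoder[m][w]:  q(w|m), the naming system being scored
    """
    n_m, n_w, n_u = len(prior), len(encoder[0]), len(meanings[0])
    # p(m, w) = p(m) q(w|m)
    p_mw = [[prior[m] * encoder[m][w] for w in range(n_w)] for m in range(n_m)]
    # p(w, u) = sum_m p(m) q(w|m) m(u), since W and U are linked only through M
    p_wu = [
        [sum(p_mw[m][w] * meanings[m][u] for m in range(n_m)) for u in range(n_u)]
        for w in range(n_w)
    ]
    return mutual_information(p_mw), mutual_information(p_wu)


def ib_objective(prior: list[float], meanings: Matrix, encoder: Matrix, beta: float) -> float:
    """IB tradeoff F_beta = I(M;W) - beta * I(W;U); lower is better for a fixed beta."""
    complexity, accuracy = ib_terms(prior, meanings, encoder)
    return complexity - beta * accuracy


if __name__ == "__main__":
    # Toy check: two sharp meanings, named either by two words or by one word.
    prior = [0.5, 0.5]
    meanings = [[1.0, 0.0], [0.0, 1.0]]
    print(ib_terms(prior, meanings, [[1.0, 0.0], [0.0, 1.0]]))  # (1.0, 1.0)
    print(ib_terms(prior, meanings, [[1.0], [1.0]]))  # (0.0, 0.0)
```

With noisy (overlapping) meanings, accuracy falls strictly below complexity, which is why systems trace a curve on the information plane rather than a diagonal line.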

Paper Structure

This paper contains 37 sections, 4 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: (a) The standard WCS color naming grid (Kay et al., 2009). (b) Color naming task with humans and LLMs. Multi-modal LLMs can observe colors either as text or as images. (c) Illustration of the IICLL paradigm. At each generation $t$, an LLM is prompted with a small dataset for ICL, $d_{t-1}$, consisting of pairs of colors and pseudo labels sampled from the previous generation's language, $L_{t-1}$. With these data in context, the LLM performs the naming task for the full space (a).
  • Figure 2: English color naming experiment with LLMs. (a) IB complexity-accuracy tradeoffs achieved by instruction-tuned LLMs (see the Appendix for all models, including base models), plotted with respect to the English tradeoff (blue star) and the IB theoretical bound (black curve) from Zaslavsky et al. (2018). Models vary widely in their tradeoffs, with larger instruction-tuned models reaching the English point. (b) Color naming systems of English (from Lindsey & Brown, 2014) and the best-performing LLMs. Each system is shown by its mode map, i.e., it is plotted against the WCS grid (Figure 1a), where each chip is colored by the color centroid of its modal category. (c) English-alignment (top) and IB complexity (bottom) of all LLMs. Markers are the same as in (a); a black edge indicates an instruction-tuned model and no edge indicates a base model. Across model families, size and instruction-tuning are associated with higher complexity and better alignment to English.
  • Figure 3: IICLL with LLMs converges to near-optimal IB solutions. The trajectories of Gemini 2.0 Flash (upper left), Gemma 3 27B (upper right), Llama 3.3 70B (lower left) and Qwen 2.5 32B (lower right) are plotted on the information plane (same as Figure 2a), together with the IB tradeoffs across human languages (WCS+English) and human IL data. Small black dots correspond to random initializations of chains with varying numbers of categories, $k\in\{2,3,4,5,6,14\}$. Thin blue lines correspond to the LLMs' IICLL trajectories. Gemini captures the complexity range observed across human languages, while the other models converge to lower-complexity systems. All models are instruction-tuned.
  • Figure 4: Across IICLL generations, emergent LLM systems become more efficient (a), more aligned with the optimal IB systems (b), and more aligned with human languages (c). Colored curves show the average across initializations and conditions, and the colored regions correspond to 95% confidence intervals.
  • Figure 5: (a) The Shepard circles stimulus grid. (b) Gemini IICLL chains for naming Shepard circles. Rows correspond to individual, randomly initialized chains. Each system is plotted over the stimulus grid, with each color marking a distinct label.
  • ...and 12 more figures
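The IICLL loop in Figure 1c can be sketched as a simple transmission chain. This is a hedged illustration, not the authors' implementation: the LLM is replaced by a hypothetical stand-in learner (`make_nn_learner`, nearest neighbor on a 1-D circle of hue-like positions) so the loop runs without any model API. In the actual paradigm, that step would be a prompt to an LLM with $d_{t-1}$ in context. The pseudo-label strings, sample size, number of generations, and all function names are assumptions; `k` plays the role of the number of initial categories (Figure 3 lists $k\in\{2,3,4,5,6,14\}$).

```python
import random
from typing import Callable, Sequence

# A learner sees in-context (stimulus, label) pairs and names one query stimulus.
Learner = Callable[[list[tuple[int, str]], int], str]


def random_language(stimuli: Sequence[int], k: int, rng: random.Random) -> dict[int, str]:
    """Random initial system L_0: each stimulus gets one of k hypothetical pseudo-labels."""
    return {s: f"w{rng.randrange(k)}" for s in stimuli}


def make_nn_learner(n_points: int) -> Learner:
    """Hypothetical stand-in for the LLM: copy the label of the nearest
    in-context example on a circle of n_points hue-like positions."""
    def learner(data: list[tuple[int, str]], query: int) -> str:
        def dist(a: int, b: int) -> int:
            d = abs(a - b) % n_points
            return min(d, n_points - d)
        _, label = min(data, key=lambda pair: dist(pair[0], query))
        return label
    return learner


def iicll_chain(stimuli: Sequence[int], learner: Learner, k: int,
                n_generations: int, sample_size: int, seed: int = 0) -> list[dict[int, str]]:
    """Run one transmission chain and return [L_0, L_1, ..., L_T]."""
    rng = random.Random(seed)
    language = random_language(stimuli, k, rng)
    history = [language]
    for _ in range(n_generations):
        # d_{t-1}: a small sample of (stimulus, label) pairs from L_{t-1}
        shown = rng.sample(list(stimuli), sample_size)
        data = [(s, language[s]) for s in shown]
        # L_t: with only d_{t-1} in context, the learner names every stimulus
        language = {s: learner(data, s) for s in stimuli}
        history.append(language)
    return history


if __name__ == "__main__":
    hues = list(range(40))
    chain = iicll_chain(hues, make_nn_learner(len(hues)), k=6, n_generations=10, sample_size=12)
    print([len(set(lang.values())) for lang in chain])  # labels in use per generation
```

Passing the learner in as a function keeps the transmission loop separate from the model, so an LLM-backed learner could be swapped in; it names one stimulus at a time here for simplicity, whereas Figure 1c describes the LLM naming the full space with the data in context. The nearest-neighbor stand-in can only keep or drop labels, so it illustrates the learning bottleneck itself, not the IB-efficient restructuring the paper reports for LLMs.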