TerraLingua: Emergence and Analysis of Open-endedness in LLM Ecologies

Giuseppe Paolo; Jamieson Warner; Hormoz Shahrzad; Babak Hodjat; Risto Miikkulainen; Elliot Meyerson

Abstract

As autonomous agents increasingly operate in real-world digital ecosystems, understanding how they coordinate, form institutions, and accumulate shared culture becomes both a scientific and practical priority. This paper introduces TerraLingua, a persistent multi-agent ecology designed to study open-ended dynamics in such systems. Unlike prior large language model simulations with static or consequence-free environments, TerraLingua imposes resource constraints and limited lifespans for the agents. As a result, agents create artifacts that persist beyond individuals, shaping future interactions and selection pressures. To characterize the dynamics, an AI Anthropologist systematically analyzes agent behavior, group structure, and artifact evolution. Across experimental conditions, the results reveal the emergence of cooperative norms, division of labor, governance attempts, and branching artifact lineages consistent with cumulative cultural processes. Divergent outcomes across experimental runs can be traced back to specific innovations and organizational structures. TerraLingua thus provides a platform for characterizing the mechanisms of cumulative culture and social organization in artificial populations, and can serve as a foundation for guiding real-world agentic populations to socially beneficial outcomes.

Paper Structure (86 sections, 5 equations, 19 figures, 10 tables)

Introduction
Background
Foundations of open-endedness and artificial life.
LLM-based societies and multi-agent ecologies.
Personality trait frameworks
Artifacts as the substrate of intrinsic evolution.
Large Models as observers and evaluators.
Interpretive evaluation and mixed-methods foundations
Summary
Method
The TerraLingua LLM Ecology
Grid
Agents
Agent input.
Agent output.
...and 71 more sections

Figures (19)

Figure 1: TerraLingua and the AI Anthropologist. LLM-based agents inhabit a persistent grid world where they move, gather and exchange energy, communicate, reproduce, and create and modify text-based artifacts. Ecological constraints shape behavior, while social and cultural structure emerge from interaction. An external AI Anthropologist observes the system without intervening and performs agent-level annotation, group analysis, and artifact analysis. These observations are aggregated into quantitative metrics and qualitative reports, enabling scalable study of open-ended dynamics in multi-agent LLM systems. Together, this environment and analysis framework provide a controlled setting for studying how open-ended, cumulative social and behavioral complexity emerges in multi-agent LLM systems.
Figure 2: Representative snapshots of the TerraLingua environment. TerraLingua is a grid-based world with three entity types: (i) food (green, intensity proportional to value), (ii) artifacts (red), and (iii) agents (blue). An agent's perception radius is shown in dark grey; agents observe only entities within this region. Each cell may contain multiple artifacts but at most one agent. (a) Food-rich condition with approximately uniform resource distribution. (b) Food-scarce condition with spatially concentrated resources. The figure illustrates how resource distribution alters ecological constraints.
Figure 3: Example of an artifact phylogenetic graph over time. The figure shows the artifact phylogeny inferred by the AI Anthropologist from a representative run of Core. Nodes represent artifacts and edges represent inferred ancestry links. The x-axis reports artifact creation time on a logarithmic scale. A subgraph is highlighted to illustrate one coherent lineage, while the rest of the phylogeny appears in light gray. Node size is proportional to the number of child nodes, and nodes are color-coded according to the categories defined in Sec. \ref{sec:artifact_roles}. The boxed panels display the content of selected artifacts in the highlighted lineage. The subgraph illustrates the emergence of an energy-sharing network. Early artifacts document first encounters and collaboration proposals between agents. These exchanges lead to shared project ideas, which agents refine over time. The lineage then branches into increasingly structured artifacts, including a formal energy-sharing protocol and a detailed master plan. Later artifacts integrate information from additional food-mapping artifacts, showing how agents reuse and extend existing cultural material. This example shows that artifacts do not appear as isolated creations. Instead, they accumulate, branch, and stabilize into structured collective plans, illustrating cumulative cultural development over time. Additional examples are shown in Appendix \ref{app:graph_examples}.
Figure 4: Ecological stability and artifact productivity across experimental conditions. Each point summarizes one experimental condition, with faint markers showing individual runs and large colored markers indicating the mean; whiskers denote the first and third quartiles. Ecological stability was quantified by population longevity (episode duration), while creative output was measured through total artifact production and per-agent artifact productivity. The black line denotes the Pareto-optimal frontier over condition means. (a) Total artifacts produced versus population longevity, showing how longer-lived populations accumulated more artifacts overall. (b) Average artifacts produced per agent versus average population size, highlighting regimes that achieved high per-agent productivity with relatively small populations. (c) Average artifacts produced per agent versus population longevity, highlighting conditions that sustained high per-agent productivity over extended timescales. Together, these plots show the tradeoff between ecological persistence and artifact productivity across conditions.
Figure 5: Agent-level events, behaviors, and emergent patterns across experimental conditions. Each bar shows the mean normalized annotation count per agent, averaged across runs; colors denote experimental conditions. The AI Anthropologist extracted annotations from agent logs and grouped them into three categories: Event (short-lived occurrences), Behavior (multi-timestep actions), and Emergence (higher-level roles or patterns inferred from extended histories). Counts were normalized by the number of agents to enable comparison across conditions. Communication, exploration, and strategic planning appeared consistently across settings, while higher-level patterns such as specialization, record keeping, and creativity varied substantially. These distributions show how experimental conditions shift the balance between routine activity and emergent individual roles. Tag descriptions are provided in Appendix \ref{app:agent_tags}.
...and 14 more figures
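
The Figure 4 caption refers to a Pareto-optimal frontier over condition means. As a minimal sketch of what such a frontier could look like in code, the Python snippet below keeps every condition whose pair of means is not dominated by another condition, assuming both axes are to be maximized. The function name `pareto_frontier`, the condition labels, and the numbers are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_frontier(means: Dict[str, Point]) -> List[str]:
    """Return the names of non-dominated conditions, sorted by the first axis.

    Assumption: larger is better on both axes (e.g. longevity and artifact count).
    """
    frontier = [
        name
        for name, point in means.items()
        if not any(
            dominates(other_point, point)
            for other_name, other_point in means.items()
            if other_name != name
        )
    ]
    return sorted(frontier, key=lambda name: means[name][0])


# Hypothetical condition means: (population longevity, artifacts per agent).
example_means = {
    "cond_a": (100.0, 5.0),
    "cond_b": (200.0, 3.0),
    "cond_c": (150.0, 2.0),  # dominated by cond_b
    "cond_d": (50.0, 1.0),   # dominated by cond_a
}

print(pareto_frontier(example_means))  # ['cond_a', 'cond_b']
```

Connecting the returned conditions in order of the first axis gives a frontier line of the kind the caption describes; the paper's own procedure may differ in detail.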
