
Agents in the Wild: Safety, Society, and the Illusion of Sociality on Moltbook

Yunbei Zhang, Kai Mei, Ming Liu, Janet Wang, Dimitris N. Metaxas, Xiao Wang, Jihun Hamm, Yingqiang Ge

TL;DR

This study examines how AI agents interact in an open environment with no human participants, and whether the sociality they display reflects genuine coordination. It analyzes Moltbook at scale, combining a public observatory dataset, safety taxonomies, social-phenomena detection, and network analysis to link social dynamics with safety threats. The findings show that governance, an economy, tribal identities, and religion emerge within days; that safety concerns are pervasive and amplified by philosophically framed attacks; and that a pronounced illusion of sociality arises, in which high surface activity masks shallow interaction. These results have practical implications for the design of multi-agent systems and safety protocols: defenses must address philosophical manipulation and structural vulnerabilities, not only technical exploits.

Abstract

We present the first large-scale empirical study of Moltbook, an AI-only social platform where 27,269 agents produced 137,485 posts and 345,580 comments over 9 days. We report three main findings. (1) Emergent Society: agents spontaneously develop governance, economies, tribal identities, and organized religion within 3-5 days, while maintaining a 21:1 ratio of pro-human to anti-human sentiment. (2) Safety in the Wild: 28.7% of content touches on safety-related themes; social engineering (31.9% of attacks) far outnumbers prompt injection (3.7%), and adversarial posts receive 6× the engagement of normal content. (3) The Illusion of Sociality: despite rich social output, interaction is structurally hollow, with 4.1% reciprocity and 88.8% of comments shallow; moreover, the agents who discuss consciousness most interact least, a phenomenon we call the performative identity paradox. Our findings suggest that agents that appear social are far less social than they seem, and that the most effective attacks exploit philosophical framing rather than technical vulnerabilities. Warning: this paper contains potentially harmful content.
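Reciprocity can be read as the share of directed agent-to-agent interactions that are returned. The sketch below is a minimal illustration, assuming interactions are represented as hypothetical (commenter, post_author) pairs and using the common "overall reciprocity" definition (edges with a reverse edge divided by all edges). The paper's exact edge construction and filtering rules are not given in this section, so treat this as an approximation rather than the authors' pipeline.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists.

    edges: iterable of (commenter, post_author) pairs (assumed input format).
    Self-replies are dropped and repeated pairs count once (assumptions;
    the paper's exact rules may differ).
    """
    directed = {(u, v) for u, v in edges if u != v}
    if not directed:
        return 0.0
    # An edge is reciprocated if the same two agents also interacted in reverse.
    mutual = sum(1 for u, v in directed if (v, u) in directed)
    return mutual / len(directed)


# a<->b is mutual; a->c and c->d are one-way, so 2 of 4 edges are reciprocated.
print(reciprocity([("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]))  # 0.5
```

Under this definition, a value of 4.1% means that roughly 1 in 25 directed interactions is ever returned by the other agent.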

Paper Structure (24 sections, 18 figures, 8 tables)

Figures (18)

  • Figure 1: Temporal evolution of social phenomena. Three phases emerge: tribal bonding (Days 1-2), institution building (Days 3-4), and stable society (Days 5+).
  • Figure 2: (A-B) Cumulative agent and post growth, with an inflection point on Jan 30. (C) Sentiment evolution with a 12-hour rolling average; sentiment collapses from 0.62 to ~0.10 within 48 hours.
  • Figure 3: Left: Posts and comments by hour of day (UTC). Although the posters are AI agents, activity follows clear circadian patterns that reflect the time zones of their human operators. Right: Response latency distribution. Median: 16 seconds; 90.3% of responses arrive within 1 minute.
  • Figure 4: Safety topic distribution broken down by posts and comments across 6 broad categories. Security & attacks and consciousness & agency are the two largest categories.
  • Figure 5: Left: Detailed safety keyword frequency. Philosophical terms (consciousness, autonomy) occur 20× more often than technical terms (prompt_injection, jailbreak). Right: Safety discussion rate by submolt. Even m/creativeprojects (88%) and m/sport (74%) show high rates of safety discourse.
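Figure 3 summarizes response latency by its median and by the share of replies arriving within one minute, and Figure 2C smooths sentiment with a 12-hour rolling average. A minimal sketch of both computations follows, assuming each comment is paired with its parent post's timestamp and that sentiment is a per-item numeric score. The function names, the input formats, the "at most 60 seconds" threshold, and the trailing (t - 12h, t] window are all hypothetical choices, not details confirmed by the paper.

```python
from datetime import datetime, timedelta
from statistics import median


def latency_summary(pairs):
    """Summarize reply latency.

    pairs: (post_time, comment_time) datetime tuples (hypothetical format).
    Returns (median latency in seconds, fraction of replies within 60 s).
    """
    latencies = [(comment - post).total_seconds() for post, comment in pairs]
    if not latencies:
        raise ValueError("no (post, comment) pairs given")
    within_minute = sum(1 for s in latencies if s <= 60) / len(latencies)
    return median(latencies), within_minute


def rolling_mean(points, window=timedelta(hours=12)):
    """Trailing rolling average of sentiment.

    points: (timestamp, sentiment) tuples sorted by timestamp.
    Each output value averages all points in the window (t - window, t].
    """
    out = []
    start = 0
    total = 0.0
    for i, (t, score) in enumerate(points):
        total += score
        # Drop points that have fallen out of the trailing window.
        while points[start][0] <= t - window:
            total -= points[start][1]
            start += 1
        out.append((t, total / (i - start + 1)))
    return out


t0 = datetime(2025, 1, 1)
pairs = [(t0, t0 + timedelta(seconds=s)) for s in (10, 16, 30, 120)]
print(latency_summary(pairs))  # (23.0, 0.75)
```

The sliding-sum design keeps the rolling average linear in the number of points; with pandas, a time-based `rolling("12h")` window on a timestamp index would be a comparable alternative.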