Gen-C: Populating Virtual Worlds with Generative Crowds

Andreas Panayiotou; Panayiotis Charalambous; Ioannis Karamouzas

TL;DR

Gen-C presents a framework to automatically populate virtual worlds with high-level, multi-agent crowd behaviors by learning from synthetic data bootstrapped with large language models. It introduces a time-expanded crowd scenario graph and a pair of text-conditioned variational graph autoencoders that jointly learn structure and node features, enabling scalable generation of coherent, environment-aware crowds. The approach is validated on University Campus and Train Station scenarios, demonstrating diverse, plausible interactions and strong reconstruction fidelity, with qualitative renderings in Unity. By reducing dependence on real annotations and enabling text-guided generation, Gen-C offers a practical path to scalable, semantically rich crowd simulations for games, VR, and film production.

Abstract

Over the past two decades, researchers have made significant strides in simulating agent-based human crowds, yet most efforts remain focused on low-level tasks such as collision avoidance, path following, and flocking. Realistic simulations, however, require modeling high-level behaviors that emerge from agents interacting with each other and with their environment over time. We introduce Generative Crowds (Gen-C), a generative framework that produces crowd scenarios capturing agent-agent and agent-environment interactions, shaping coherent high-level crowd plans. To avoid the labor-intensive process of collecting and annotating real crowd video data, we leverage large language models (LLMs) to bootstrap synthetic datasets of crowd scenarios. We propose a time-expanded graph representation that encodes actions, interactions, and spatial context. Gen-C employs a dual Variational Graph Autoencoder (VGAE) architecture that jointly learns connectivity patterns and node features conditioned on textual and structural signals, overcoming the limitations of direct LLM generation to enable scalable, environment-aware multi-agent crowd simulations. We demonstrate the effectiveness of Gen-C on two scenarios with diverse behaviors, a University Campus and a Train Station, showing that it generates heterogeneous crowds, coherent interactions, and high-level decision-making patterns consistent with real-world crowd dynamics.
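
To make the idea of a time-expanded crowd scenario graph more concrete, the following is a minimal sketch of one possible encoding. It is not the representation used by Gen-C: the `Node` fields, the `build_time_expanded_graph` helper, the edge labels, and the rule that agents sharing a location at the same step interact are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    """One agent at one time step (hypothetical feature set)."""
    agent: str
    step: int
    action: str
    location: str


def build_time_expanded_graph(plans):
    """Build a toy time-expanded graph.

    plans maps an agent name to a list of (action, location) pairs,
    one pair per time step. Returns (nodes, edges), where nodes is a
    dict keyed by (agent, step) and edges is a list of
    (source_key, target_key, label) triples.
    """
    nodes = {}
    edges = []

    # Temporal edges link each agent to itself at the next time step.
    for agent, steps in plans.items():
        for t, (action, location) in enumerate(steps):
            nodes[(agent, t)] = Node(agent, t, action, location)
            if t > 0:
                edges.append(((agent, t - 1), (agent, t), "temporal"))

    # Interaction edges link distinct agents that share a location at
    # the same step (an assumed stand-in for agent-agent interaction).
    horizon = max((len(steps) for steps in plans.values()), default=0)
    for t in range(horizon):
        present = sorted(a for a in plans if t < len(plans[a]))
        for a, b in combinations(present, 2):
            if nodes[(a, t)].location == nodes[(b, t)].location:
                edges.append(((a, t), (b, t), "interaction"))

    return nodes, edges


# Hypothetical two-agent campus scenario.
plans = {
    "student_1": [("walk", "plaza"), ("talk", "cafe")],
    "student_2": [("sit", "library"), ("talk", "cafe")],
}
nodes, edges = build_time_expanded_graph(plans)
```

In this toy encoding, spatial context lives in the `location` field of each node, while actions and interactions are split between node features and edge labels. A learned model such as a graph autoencoder would operate on an adjacency matrix and a node feature matrix derived from a structure like this one.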
