TravelAgent: Generative Agents in the Built Environment

Ariel Noyman, Kai Hu, Kent Larson

TL;DR

TravelAgent introduces a web-based platform that integrates Generative Agents with Agent-Based Modeling to simulate human-like navigation and experience in diverse built environments using multimodal sensory inputs and Chain-of-Thought reasoning. The study conducts 100 experiments totaling 1,898 agent steps, achieving a 76% task-completion rate and revealing how agents perceive, remember, and adapt to urban spaces under varying conditions. Through spatial, term-frequency, topical, and sentiment analyses, the work demonstrates TravelAgent's potential to inform urban design, spatial cognition research, and agent-based simulation, while candidly addressing challenges in validation, diversity, complexity, and efficiency. Overall, TravelAgent offers a new paradigm for evaluating and refining spatial configurations from a human-centric, cognitive perspective, with clear pathways for extending realism and applicability in practice.

Abstract

Understanding human behavior in built environments is critical for designing functional, user-centered urban spaces. Traditional approaches, such as manual observations, surveys, and simplified simulations, often fail to capture the complexity and dynamics of real-world behavior. To address these limitations, we introduce TravelAgent, a novel simulation platform that models pedestrian navigation and activity patterns across diverse indoor and outdoor environments under varying contextual and environmental conditions. TravelAgent leverages generative agents integrated into 3D virtual environments, enabling agents to process multimodal sensory inputs and exhibit human-like decision-making, behavior, and adaptation. Through experiments, including navigation, wayfinding, and free exploration, we analyze data from 100 simulations comprising 1,898 agent steps across diverse spatial layouts and agent archetypes, achieving an overall task completion rate of 76%. Using spatial, linguistic, and sentiment analyses, we show how agents perceive, adapt to, or struggle with their surroundings and assigned tasks. Our findings highlight the potential of TravelAgent as a tool for urban design, spatial cognition research, and agent-based modeling. We discuss key challenges and opportunities in deploying generative agents for the evaluation and refinement of spatial designs, proposing TravelAgent as a new paradigm for simulating and understanding human experiences in built environments.

Paper Structure

This paper contains 29 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: A schematic representation of the TravelAgent system. (left) TravelAgents (TAs) are initialized with various parameters defining the agent persona and its environment. (center) At each step, the agent employs Chain-of-Thought (CoT) to process sensory inputs, plan actions, and make decisions. (right) The agent executes actions within the environment and updates its internal memory based on its experiences.
  • Figure 2: TravelAgent interface. The web app provides an end-to-end experimentation environment for testing and evaluating TravelAgents. (left) Initial settings and inputs provided to the agent. (bottom) Output log of the Chain-of-Thought process: [orange] are the agent's observations, [green] are the agent's memories, [purple] are the agent's plans, [blue] are the actions/decisions. (top) Panoramic street-level view of the environment. The rudimentary 3D environment guides an SDXL image generation model that creates eye-level images, and it also provides depth estimation and collision information. Simulating in generative environments allows users to change scenarios, agent profiles, and tasks by simply updating a short textual prompt, as shown in the right column.
  • Figure 3: Pedestrian-level image generation from a 3D model. The SDXL-Turbo image generation model (middle) uses a class- or Canny-guided diffusion model (Podell et al., 2023) to generate realistic street-level images from the 3D model (left). The generated images are then analyzed by the CoT to infer the environment (middle-right), objects, and spatial layout. The agent is provided with additional inferences, such as depth estimation and collision warnings, to guide its decision-making process (right).
  • Figure 4: Visual Perception in 'Lunch Break' Experiment. The agent's visual perception is guided by Google Street View (GSV) images (left), which provide a first-person view of the environment. A Mask2Former model is used to segment the image and identify objects, such as buildings, trees, and benches. An OpenCV convex hull algorithm is used to estimate the segment outlines (middle), which are then used as a textual reference for the agent's navigation and decision-making process. (right) The agent's steps are visualized on the map, with decision points marked as circles that turn red as the agent progresses.
  • Figure 5: Spatial Analysis of 'Train Station' Experiment. For each scenario, we evaluate the agent's successful and failed paths. If the agent reaches and recognizes the subway station by declaring 'stop', the path is considered successful, and the agent is given a new subtask. The top-left figure shows both a successful and a failed path in the 'Base' scenario. The top row in the bottom figure shows the paths of all agents in all other scenarios, and the bottom row shows the spatial aggregation of decision points across all scenarios. Notably, the agents in the 'Night' scenario have the most failed and inconsistent paths, which is also reflected in a sparser spread of decision points. Conversely, the agents in the 'Winter' scenario have more consistent paths, with a clear aggregation of decision points early in the simulation.
  • ...and 3 more figures
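
The step loop described in the Figure 1 caption (observe, plan, act, update memory) and the success criterion in the Figure 5 caption (the agent declares 'stop' at the destination) can be illustrated with a minimal sketch. This is not the authors' implementation: the grid world, the `Agent` class, and the `observe`, `plan`, and `run_episode` functions are hypothetical names chosen for illustration, and the rule-based `plan` function is only a stand-in for the Chain-of-Thought call to a language model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[int, int]

# Hypothetical action set for a toy grid world (not from the paper).
MOVES: Dict[str, Position] = {
    "north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0),
}


@dataclass
class Agent:
    persona: str                      # textual persona, as in Figure 1 (left)
    task: str                         # textual task description
    position: Position
    memory: List[str] = field(default_factory=list)


def observe(agent: Agent, target: Position) -> str:
    """Stand-in for multimodal perception: report the offset to the target."""
    dx = target[0] - agent.position[0]
    dy = target[1] - agent.position[1]
    return f"target offset ({dx}, {dy})"


def plan(agent: Agent, target: Position) -> str:
    """Stand-in for the CoT reasoning step: a fixed rule instead of an LLM."""
    dx = target[0] - agent.position[0]
    dy = target[1] - agent.position[1]
    if dx == 0 and dy == 0:
        return "stop"                 # agent recognizes the destination
    if dx != 0:
        return "east" if dx > 0 else "west"
    return "north" if dy > 0 else "south"


def run_episode(agent: Agent, target: Position, max_steps: int = 20) -> bool:
    """Run observe -> plan -> act -> remember; True if 'stop' is declared at the target."""
    for step in range(max_steps):
        observation = observe(agent, target)
        action = plan(agent, target)
        agent.memory.append(f"step {step}: saw {observation}; chose {action}")
        if action == "stop":
            return agent.position == target
        dx, dy = MOVES[action]
        agent.position = (agent.position[0] + dx, agent.position[1] + dy)
    return False                      # step budget exhausted: failed path


if __name__ == "__main__":
    commuter = Agent(persona="commuter", task="reach the station", position=(0, 0))
    print(run_episode(commuter, target=(2, 1)))
    print("\n".join(commuter.memory))
```

In a fuller system, `observe` would return image-derived descriptions and `plan` would prompt a language model with the persona, task, observation, and memory. The loop structure and the 'stop'-based success check are the only parts this sketch is meant to convey.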