
Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions

Stéphane Aroca-Ouellette, Ian Berlot-Attwell, Panagiotis Lymperopoulos, Abhiramon Rajasekharan, Tongqi Zhu, Herin Kang, Kaheer Suleman, Sam Pasupalak

TL;DR

MAPs presents a unified, scalable testbed for holistic decision-making by simulating open-ended business management with long-horizon planning, active environment learning, spatial reasoning, and stochasticity. It benchmarks humans against state-of-the-art LLM-based agents across easy and medium modes, exposing substantial gaps in current AI capabilities. The results show systematic weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and robust performance under uncertainty, and reveal that sandbox exploration and oracle world models can mitigate some but not all of these gaps. Taken together, MAPs provides a concrete platform to drive progress toward robust, adaptable agents capable of real-world business decision-making.

Abstract

Despite rapid progress in artificial intelligence, current systems struggle with the interconnected challenges that define real-world decision making. Practical domains, such as business management, require optimizing an open-ended and multi-faceted objective, actively learning environment dynamics from sparse experience, planning over long horizons in stochastic settings, and reasoning over spatial information. Yet existing human–AI benchmarks isolate subsets of these capabilities, limiting our ability to assess holistic decision-making competence. We introduce Mini Amusement Parks (MAPs), an amusement-park simulator designed to evaluate an agent's ability to model its environment, anticipate long-term consequences under uncertainty, and strategically operate a complex business. We provide human baselines and a comprehensive evaluation of state-of-the-art LLM agents, finding that humans outperform these systems by 6.5x on easy mode and 9.8x on medium mode. Our analysis reveals persistent weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and world modelling. By unifying these challenges within a single environment, MAPs offers a new foundation for benchmarking agents capable of adaptable decision making. Code: https://github.com/Skyfall-Research/MAPs

Paper Structure

This paper contains 32 sections, 3 figures, and 14 tables.

Figures (3)

  • Figure 1: The GUI view of MAPs.
  • Figure 2: The per-day and full trajectory coefficients of variation for revenue, money, and park value across several full games.
  • Figure 3: Comparison of the park design with and without the added spatial heuristic overriding the LLM's placement choices.
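Figure 2 summarizes run-to-run variability with the coefficient of variation (CV), i.e. standard deviation divided by mean. The sketch below shows one plausible way to compute the two variants named in the caption. It rests on assumptions not stated in this summary: each game is a list of daily values for one metric (the `trajectories[g][d]` layout is hypothetical), the per-day CV is taken across games at each day, the full-trajectory CV is taken across games on each game's summed trajectory, and the population standard deviation is used. The helper names are illustrative, not part of the MAPs codebase.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / mean (undefined for a zero mean)."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu


def per_day_cv(trajectories):
    """Hypothetical layout: trajectories[g][d] is the metric for game g on day d.

    Returns one CV per day, computed across all games on that day.
    """
    num_days = len(trajectories[0])
    return [
        coefficient_of_variation([traj[d] for traj in trajectories])
        for d in range(num_days)
    ]


def full_trajectory_cv(trajectories):
    """CV across games of each game's summed trajectory (an assumed summary)."""
    return coefficient_of_variation([sum(traj) for traj in trajectories])


# Example: three games, three days of (made-up) revenue each.
games = [[100, 120, 150], [90, 130, 160], [110, 110, 140]]
daily = per_day_cv(games)
overall = full_trajectory_cv(games)
```

Summing over the trajectory lets day-to-day deviations partially cancel, so under this assumed definition the full-trajectory CV can be much smaller than the per-day values; a different summary (e.g. the final-day value) would change that behaviour.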