Develop AI Agents for System Engineering in Factorio
Neel Kant
TL;DR
Static benchmarks fail to assess essential system-engineering skills such as long-horizon planning and adaptive trade-offs. The paper advocates training and evaluating AI agents in automation-oriented sandbox environments, with Factorio as the centerpiece, because such environments support open-ended, dynamic tasks. It contributes a theoretical framing that combines the Law of Requisite Variety (LRV) and the Viable System Model (VSM), introduces an Agent-Evaluator paradigm, and defines evaluation criteria including science per minute (SPM) as a measure of science throughput. The work argues that Factorio-based simulations can push AI toward robust, scalable system design, with potential impact on large-scale engineering and infrastructure automation.
Abstract
Continuing advances in frontier model research are paving the way for widespread deployment of AI agents. Meanwhile, global interest in building large, complex systems in software, manufacturing, energy, and logistics has never been greater. Although AI-driven system engineering holds tremendous promise, the static benchmarks dominating agent evaluations today fail to capture the crucial skills required for implementing dynamic systems, such as managing uncertain trade-offs and ensuring proactive adaptability. This position paper advocates for training and evaluating AI agents' system engineering abilities through automation-oriented sandbox games, particularly Factorio. By directing research efforts in this direction, we can equip AI agents with the specialized reasoning and long-horizon planning necessary to design, maintain, and optimize tomorrow's most demanding engineering projects.
