Develop AI Agents for System Engineering in Factorio

Neel Kant

TL;DR

Static benchmarks fail to assess essential system-engineering skills such as long-horizon planning and adaptive trade-offs. The paper advocates training and evaluating AI agents in automation-oriented sandbox environments, with Factorio as the centerpiece, because they support open-ended, dynamic tasks. It contributes a theoretical framing that combines the Law of Requisite Variety (LRV) and the Viable System Model (VSM), introduces an Agent-Evaluator paradigm, and defines evaluation criteria including SPM (science per minute) as a measure of science throughput. The work argues that Factorio-based simulations can push AI toward robust, scalable system design, with potential impact on large-scale engineering and infrastructure automation.
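The LRV viability condition stated with Figure 1 (the environment's variety $V_E$ must stay a subset of the variety $V_R$ the system can handle) can be sketched with plain Python sets. This is a minimal illustration under our own assumptions, not the paper's formalism: variety is modeled as a finite set of named situations, and the `Snapshot` class, the helpers `is_viable`, `unhandled`, `adapt`, `shift_env` and `reduce_variety`, and the Factorio-flavored situation names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical system state at one point on a trajectory (cf. Figure 1)."""
    label: str
    env_variety: frozenset  # V_E: situations the environment presents right now
    reg_variety: frozenset  # V_R: situations the system is able to handle


def is_viable(s: Snapshot) -> bool:
    """LRV viability condition: V_E must be a subset of V_R."""
    return s.env_variety <= s.reg_variety


def unhandled(s: Snapshot) -> frozenset:
    """Environmental variety the system cannot currently absorb (V_E minus V_R)."""
    return s.env_variety - s.reg_variety


def adapt(s: Snapshot, anticipated: frozenset, label: str) -> Snapshot:
    """Proactive adaptation (A -> B): grow V_R before the environment changes."""
    return Snapshot(label, s.env_variety, s.reg_variety | anticipated)


def shift_env(s: Snapshot, new_env: frozenset, label: str) -> Snapshot:
    """The environment moves on; the system's V_R stays as it was."""
    return Snapshot(label, new_env, s.reg_variety)


def reduce_variety(s: Snapshot, label: str) -> Snapshot:
    """Variety reduction (D -> E): drop capabilities the environment no longer
    demands. Idealized: V_R shrinks to exactly V_E, leaving no spare margin."""
    return Snapshot(label, s.env_variety, s.reg_variety & s.env_variety)


if __name__ == "__main__":
    a = Snapshot("A", frozenset({"ore_depleted", "power_drop"}),
                 frozenset({"ore_depleted", "power_drop", "belt_jam"}))
    shifted = frozenset({"ore_depleted", "power_drop", "biter_attack"})

    # Unadapted trajectory T_U: the environment changes but V_R does not.
    d_prime = shift_env(a, shifted, "D'")
    print(is_viable(a), is_viable(d_prime), sorted(unhandled(d_prime)))

    # Adapted trajectory T_V: adapt ahead of the change, then trim excess variety.
    e = reduce_variety(shift_env(adapt(a, shifted, "B"), shifted, "D"), "E")
    print(is_viable(e), sorted(e.reg_variety))
```

Pruning $V_R$ all the way down to $V_E$ is deliberately extreme; a practical system would keep some spare variety as a safety margin.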

Abstract

Continuing advances in frontier model research are paving the way for widespread deployment of AI agents. Meanwhile, global interest in building large, complex systems in software, manufacturing, energy and logistics has never been greater. Although AI-driven system engineering holds tremendous promise, the static benchmarks dominating agent evaluations today fail to capture the crucial skills required for implementing dynamic systems, such as managing uncertain trade-offs and ensuring proactive adaptability. This position paper advocates for training and evaluating AI agents' system engineering abilities through automation-oriented sandbox games, particularly Factorio. By directing research efforts in this direction, we can equip AI agents with the specialized reasoning and long-horizon planning necessary to design, maintain, and optimize tomorrow's most demanding engineering projects.

Paper Structure

This paper contains 26 sections, 17 figures, 1 table.

Figures (17)

  • Figure 1: The Law of Requisite Variety. $T_V: A \rightarrow E$ is a trajectory along which a system stays viable through adaptation. $T_U: A \rightarrow D'$ shows an alternate trajectory in which the system does not adapt and becomes unviable. A system is viable within the total state space $S$ when the variety of the environment at that time, $V_E$, remains a subset of the variety the system can handle, $V_R$ (i.e., $V_E \subseteq V_R$). Systems must adapt proactively ($A \rightarrow B$) to ensure this condition is met, but should then ideally reduce variety to improve efficiency and maintainability ($D \rightarrow E$).
  • Figure 2: The Viable System Model. Systems are organized into five levels, concisely given as: 1. operational units, 2. coordination, 3. present optimization, 4. future planning, and 5. ultimate policy. See Table 1 for longer descriptions. Each level is responsible only for the variety associated with it and can escalate or delegate as needed. A key aspect is how Level 5 balances the tension between Levels 3 and 4, which are more present- and future-focused, respectively.
  • Figure 3: An example of early-game resource extraction and smelting in Factorio. Box A shows mining drills extracting iron ore, Box B highlights stone furnaces, which consume ore and fuel to produce plates, and Box C highlights belt routing and inserter mechanics.
  • Figure 4: Dependency graph for red and green science packs. Inputs include both raw materials and intermediates, reflecting the growing complexity of production chains.
  • Figure 5: Recipe for logistic (green) science packs. Automating intermediate goods significantly reduces total crafting time from 8.7 seconds (raw) to 6 seconds.
  • ...and 12 more figures
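Figures 4 and 5 treat science packs as a dependency graph of recipes, and the SPM criterion measures how fast such a graph turns into science. A short Python sketch can walk that graph to total raw plate inputs and hand-crafting time, and size the final assembly step for a target SPM. The recipe values are assumptions based on vanilla Factorio 1.1 as we recall it, not data from the paper; smelting is ignored and plates are treated as raw; `RECIPES`, `raw_craft_time`, `raw_materials` and `assemblers_for_spm` are hypothetical names, and the crafting speed is a caller-supplied parameter. Under these assumptions the walk gives 8.75 s for a logistic pack crafted from plates, within 0.05 s of the 8.7 s in Figure 5, versus the recipe's own 6 s when the inserter and belt arrive pre-made.

```python
# Assumed vanilla Factorio 1.1 recipes (not taken from the paper):
# item -> (craft time in seconds, items produced per craft, {input: amount per craft}).
# Plates are treated as raw inputs, so smelting time is ignored.
RECIPES = {
    "automation-science-pack": (5.0, 1, {"copper-plate": 1, "iron-gear-wheel": 1}),
    "logistic-science-pack": (6.0, 1, {"inserter": 1, "transport-belt": 1}),
    "inserter": (0.5, 1, {"electronic-circuit": 1, "iron-gear-wheel": 1, "iron-plate": 1}),
    "transport-belt": (0.5, 2, {"iron-gear-wheel": 1, "iron-plate": 1}),
    "iron-gear-wheel": (0.5, 1, {"iron-plate": 2}),
    "electronic-circuit": (0.5, 1, {"copper-cable": 3, "iron-plate": 1}),
    "copper-cable": (0.5, 2, {"copper-plate": 1}),
}


def raw_craft_time(item: str, qty: float = 1.0) -> float:
    """Seconds to hand-craft `qty` of `item` from plates, recursing through
    every intermediate in the dependency graph."""
    if item not in RECIPES:  # raw input: no crafting time counted
        return 0.0
    seconds, per_craft, inputs = RECIPES[item]
    crafts = qty / per_craft
    return crafts * seconds + sum(
        raw_craft_time(name, crafts * amount) for name, amount in inputs.items()
    )


def raw_materials(item: str, qty: float = 1.0, totals=None) -> dict:
    """Plates consumed to make `qty` of `item`, accumulated into `totals`."""
    totals = {} if totals is None else totals
    if item not in RECIPES:
        totals[item] = totals.get(item, 0.0) + qty
        return totals
    _, per_craft, inputs = RECIPES[item]
    crafts = qty / per_craft
    for name, amount in inputs.items():
        raw_materials(name, crafts * amount, totals)
    return totals


def assemblers_for_spm(item: str, target_spm: float, crafting_speed: float) -> float:
    """Machines needed on the final recipe alone to sustain `target_spm`
    packs per minute; intermediates are assumed to be supplied elsewhere."""
    seconds, per_craft, _ = RECIPES[item]
    return target_spm * seconds / (60.0 * per_craft * crafting_speed)


if __name__ == "__main__":
    for pack in ("automation-science-pack", "logistic-science-pack"):
        print(pack, raw_craft_time(pack), raw_materials(pack))
    print(assemblers_for_spm("logistic-science-pack", 60, 0.5))
```

For example, `assemblers_for_spm("logistic-science-pack", 60, 0.5)` returns 12.0, i.e. twelve machines at crafting speed 0.5 for 60 green SPM. The intermediate recipes would need their own machines, which this sketch deliberately leaves out.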