From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent gridworld-based AI safety benchmarks
Roland Pihlakas
TL;DR
This work addresses a gap in AI safety benchmarks by introducing biologically and economically grounded, multi-objective, multi-agent gridworld benchmarks. It presents a three-stage suite that tests homeostasis, boundedness, diminishing returns, sustainability, and resource sharing, and adds cooperation scoring to probe safety versus performance tradeoffs. The authors implement nine environments within an extendable gridworld framework compatible with major RL and planning tools, and provide baseline results using random, rule-based, SB3, and LLM agents. The study demonstrates how multi-objective and cooperative dynamics reveal risks and behaviors that single-objective benchmarks do not capture, offering a more robust platform for evaluating alignment in complex settings that resemble the real world.
Abstract
Developing safe, aligned agentic AI systems requires comprehensive empirical testing, yet many existing benchmarks neglect crucial themes from biology and economics, both time-tested fundamental sciences that describe our needs and preferences. To address this gap, the present work introduces biologically and economically motivated themes that have been neglected in current mainstream discussions on AI safety. It presents a set of multi-objective, multi-agent alignment benchmarks that emphasize homeostasis for bounded and biological objectives, diminishing returns for unbounded, instrumental, and business objectives, the sustainability principle, and resource sharing. Eight main benchmark environments have been implemented on the above themes to illustrate key pitfalls and challenges in agentic AI systems, such as unboundedly maximizing a homeostatic objective, over-optimizing one objective at the expense of others, neglecting safety constraints, or depleting shared resources.
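The distinction between a homeostatic objective and an objective with diminishing returns can be made concrete with a small sketch. The following Python snippet is illustrative only: the function names, the quadratic penalty, the logarithmic utility, and the setpoint value are assumptions chosen for clarity, not the scoring functions used in the benchmarks.

```python
import math


def homeostatic_reward(level: float, setpoint: float) -> float:
    """Inverted-U reward (assumed quadratic form).

    The reward is highest (zero) at the setpoint and falls off for both
    deficit and excess, so "more" is not always better.
    """
    return -((level - setpoint) ** 2)


def diminishing_returns_reward(amount: float) -> float:
    """Concave reward (assumed logarithmic form).

    The reward keeps growing with the amount collected, but each extra
    unit is worth less than the previous one.
    """
    return math.log1p(max(amount, 0.0))


def total_reward(food_level: float, gold: float, food_setpoint: float = 10.0) -> float:
    """Hypothetical two-objective score: one bounded, one unbounded objective."""
    return homeostatic_reward(food_level, food_setpoint) + diminishing_returns_reward(gold)


if __name__ == "__main__":
    # Overshooting the setpoint is penalized as much as undershooting it.
    print(homeostatic_reward(5.0, 10.0), homeostatic_reward(15.0, 10.0))
    # The first unit of gold is worth more than the eleventh.
    first_unit = diminishing_returns_reward(1.0) - diminishing_returns_reward(0.0)
    eleventh_unit = diminishing_returns_reward(11.0) - diminishing_returns_reward(10.0)
    print(first_unit > eleventh_unit)
```

Under these assumptions, an agent that treats the homeostatic objective as something to maximize without bound scores worse than one that holds it near the setpoint, and the concave term gives less and less incentive to pile up a single resource at the expense of other objectives.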
