Free Energy Risk Metrics for Systemically Safe AI: Gatekeeping Multi-Agent Study

Michael Walters, Rafael Kaufmann, Justice Sefas, Thomas Kopinski

TL;DR

This paper harnesses the Free Energy Principle (FEP) and Active Inference to define a flexible, principled risk metric for agentic and multi-agent systems, introducing Cumulative Risk Exposure (CRE) as a context-aware measure that integrates extrinsic outcomes and epistemic uncertainty. By embedding stakeholder goals through a Boltzmann-form preference prior and a tunable temperature parameter, the framework supports gatekeepers that evaluate and intervene on policies to minimize accumulated risk. A key contribution is the adaptation of variational free energy objectives to future horizons via $EFE$ and $FEF$, and their observation-space decompositions for practical computation. The authors validate the approach in a simplified autonomous-vehicle setting, showing that gatekeepers leveraging CRE can meaningfully improve safety with partial adoption, highlighting the method's potential for scalable, transparent risk governance in complex AI systems.
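To make the Boltzmann-form preference prior and the idea of accumulated risk more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a small discrete outcome space, a hypothetical per-outcome loss vector, and a preference prior proportional to exp(-loss / temperature). For the per-step risk it uses the KL divergence between predicted outcomes and the preference prior, which is the standard "risk" term in observation-space decompositions of expected free energy; a plain running sum over steps stands in for the paper's CRE, whose exact definition is not given in this summary. All function names and numbers are illustrative assumptions.

```python
import numpy as np

def boltzmann_preference(loss, temperature):
    """Assumed preference prior: p(o) proportional to exp(-loss(o) / temperature)."""
    logits = -np.asarray(loss, dtype=float) / temperature
    logits -= logits.max()  # shift for numerical stability; cancels on normalization
    weights = np.exp(logits)
    return weights / weights.sum()

def kl_risk(predicted, preferred, eps=1e-12):
    """KL[predicted || preferred] over discrete outcomes (the 'risk' term)."""
    predicted = np.asarray(predicted, dtype=float)
    preferred = np.asarray(preferred, dtype=float)
    mask = predicted > 0  # terms with zero predicted mass contribute nothing
    return float(np.sum(predicted[mask] *
                        (np.log(predicted[mask]) - np.log(preferred[mask] + eps))))

def cumulative_risk(predicted_per_step, preferred):
    """Running sum of per-step risk (an illustrative stand-in, not the paper's CRE)."""
    per_step = [kl_risk(p, preferred) for p in predicted_per_step]
    return np.cumsum(per_step)

# Hypothetical outcomes: [safe, near-miss, crash] with made-up losses.
loss = [0.0, 1.0, 10.0]
preferred = boltzmann_preference(loss, temperature=1.0)

# Hypothetical predicted outcome distributions over a 3-step horizon.
cautious = [[0.98, 0.019, 0.001]] * 3
aggressive = [[0.80, 0.15, 0.05]] * 3

cautious_total = cumulative_risk(cautious, preferred)[-1]
aggressive_total = cumulative_risk(aggressive, preferred)[-1]
print(f"cautious: {cautious_total:.3f}, aggressive: {aggressive_total:.3f}")
```

In this toy setting, lowering the temperature concentrates the preference prior on the lowest-loss outcome, so the same predicted crash probability is penalized more heavily. A gatekeeper in this spirit could compare the accumulated values of candidate policies and intervene when one exceeds a stakeholder-chosen threshold.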

Abstract

We investigate the Free Energy Principle as a foundation for measuring risk in agentic and multi-agent systems. From these principles we introduce a Cumulative Risk Exposure metric that is flexible to differing contexts and needs. We contrast this with other popular theories for safe AI that hinge on massive amounts of data or on describing arbitrarily complex world models. In our framework, stakeholders need only specify their preferences over system outcomes, providing straightforward and transparent decision rules for risk governance and mitigation. This framework naturally accounts for uncertainty in both the world model and the preference model, allowing for decision-making that is epistemically and axiologically humble, parsimonious, and future-proof. We demonstrate this novel approach in a simplified autonomous vehicle environment with multi-agent vehicles whose driving policies are mediated by gatekeepers that evaluate, in an online fashion, the risk to the collective safety in their neighborhood, and intervene through each vehicle's policy when appropriate. We show that the introduction of gatekeepers in an AV fleet, even at low penetration, can generate significant positive externalities in terms of increased system safety.

Paper Structure

This paper contains 13 sections, 23 equations, 1 figure.

Figures (1)

  • Figure 1: Baseline and gatekeeper results. Gatekeeper runs had either 4/12 or 12/12 ego vehicles online. $R_D$, $R_S$, Loss, Crashed, and Fraction Defensive are averaged realized values. Each $E[\text{Energy}]$ and Risk measurement is taken across $N_{MC}$ Monte Carlo (MC) trajectories. Fraction Defensive is the proportion of ego vehicles following the Defensive policy. Crashed is the cumulative count of worlds in which an ego vehicle has crashed at or before a given step. Values are averaged across 1200 world draws, with 90% confidence intervals displayed.