Empathy Modeling in Active Inference Agents for Perspective-Taking and Alignment
Albarracin Mahault, Mikeda Anna, Jimenez Rodriguez Alejandro, Namjoshi Sanjeev, Sakthivadivel Dalton, Pae Hongju, Shah Harshil, Wilson Philip
TL;DR
We introduce a computational framework for empathy in active inference agents, grounded in explicit perspective-taking via self-other model transformation. In a multi-agent Iterated Prisoner's Dilemma, empathic perspective-taking induces robust cooperation without explicit communication or reward shaping.
Abstract
Artificial agents capable of understanding and aligning with others' intentions are essential for safe and socially robust artificial intelligence. We introduce a computational framework for empathy in active inference agents, grounded in explicit perspective-taking via self-other model transformation. We instantiate this framework in a multi-agent Iterated Prisoner's Dilemma and show that empathic perspective-taking induces robust cooperation without explicit communication or reward shaping. Cooperation emerges only when empathy is reciprocated, while asymmetric empathy leads to systematic exploitation. Beyond equilibrium outcomes, empathic agents exhibit synchronized behavior, rapid recovery from stochastic defections, and joint intentional dynamics resembling apology-forgiveness cycles. Near empathy symmetry, interactions display long transients and elevated variance, consistent with critical dynamics near regime boundaries. We further examine a learning-enabled variant in which agents infer opponent type via Bayesian updating. While opponent models converge rapidly, long-run cooperation remains primarily determined by the empathy parameter, indicating that cooperation is driven by empathic structure rather than learned reciprocity. Empathy functions as a structural prior over social interaction, shaping coordination stability, robustness, and temporal dynamics. The proposed framework highlights active inference as a principled foundation for socially aligned artificial agents that coordinate through internal simulation rather than behavioral mimicry.
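To make the roles of the empathy parameter and the Bayesian opponent-type update concrete, the following is a minimal illustrative sketch, not the authors' active inference model. It assumes a linear mix of own and other payoffs as a stand-in for the self-other model transformation, the conventional Prisoner's Dilemma payoffs (5, 3, 1, 0), a softmax-style choice rule, and two hypothetical opponent types; all function names, parameter names, and numeric values are assumptions introduced for illustration.

```python
import math

# Conventional Prisoner's Dilemma payoffs (assumed; the abstract does not list them).
# PAYOFF[(my_action, other_action)] = (my_payoff, other_payoff)
PAYOFF = {
    ("C", "C"): (3.0, 3.0),
    ("C", "D"): (0.0, 5.0),
    ("D", "C"): (5.0, 0.0),
    ("D", "D"): (1.0, 1.0),
}


def empathic_utility(my_action, other_action, empathy):
    """Mix own and other payoff; empathy=0 is purely selfish (assumed linear form)."""
    own, other = PAYOFF[(my_action, other_action)]
    return (1.0 - empathy) * own + empathy * other


def expected_utility(my_action, p_other_cooperates, empathy):
    """Expected empathic utility under a belief about the other's next move."""
    return (
        p_other_cooperates * empathic_utility(my_action, "C", empathy)
        + (1.0 - p_other_cooperates) * empathic_utility(my_action, "D", empathy)
    )


def cooperation_probability(p_other_cooperates, empathy, precision=4.0):
    """Two-action softmax (logistic) choice rule; precision is a hypothetical parameter."""
    u_c = expected_utility("C", p_other_cooperates, empathy)
    u_d = expected_utility("D", p_other_cooperates, empathy)
    return 1.0 / (1.0 + math.exp(-precision * (u_c - u_d)))


def update_type_belief(p_cooperator, observed_action,
                       p_c_given_cooperator=0.9, p_c_given_defector=0.1):
    """Bayes' rule over two hypothetical opponent types after one observed action."""
    if observed_action == "C":
        like_coop, like_def = p_c_given_cooperator, p_c_given_defector
    else:
        like_coop, like_def = 1.0 - p_c_given_cooperator, 1.0 - p_c_given_defector
    numerator = like_coop * p_cooperator
    return numerator / (numerator + like_def * (1.0 - p_cooperator))


if __name__ == "__main__":
    for empathy in (0.0, 0.5):
        p = cooperation_probability(p_other_cooperates=0.5, empathy=empathy)
        print(f"empathy={empathy}: P(cooperate)={p:.3f}")
    print("belief after observing C:", update_type_belief(0.5, "C"))
```

Under these assumed payoffs, cooperation becomes the dominant choice once the empathy weight exceeds 0.4, regardless of the belief about the opponent. An empathic agent paired with a selfish one therefore keeps cooperating while the selfish agent defects. This toy behavior matches the abstract's qualitative claims about reciprocated versus asymmetric empathy, but it does not reproduce the paper's results.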
