Empathy Modeling in Active Inference Agents for Perspective-Taking and Alignment

Albarracin Mahault; Mikeda Anna; Jimenez Rodriguez Alejandro; Namjoshi Sanjeev; Sakthivadivel Dalton; Pae Hongju; Shah Harshil; Wilson Philip

TL;DR

This paper introduces a computational framework for empathy in active inference agents, grounded in explicit perspective-taking via self-other model transformation, and shows that empathic perspective-taking induces robust cooperation without explicit communication or reward shaping.

Abstract

Artificial agents capable of understanding and aligning with others' intentions are essential for safe and socially robust artificial intelligence. We introduce a computational framework for empathy in active inference agents, grounded in explicit perspective-taking via self-other model transformation. We instantiate this framework in a multi-agent Iterated Prisoner's Dilemma and show that empathic perspective-taking induces robust cooperation without explicit communication or reward shaping. Cooperation emerges only when empathy is reciprocated, while asymmetric empathy leads to systematic exploitation. Beyond equilibrium outcomes, empathic agents exhibit synchronized behavior, rapid recovery from stochastic defections, and joint intentional dynamics resembling apology-forgiveness cycles. Near empathy symmetry, interactions display long transients and elevated variance, consistent with critical dynamics near regime boundaries. We further examine a learning-enabled variant in which agents infer opponent type via Bayesian updating. While opponent models converge rapidly, long-run cooperation remains primarily determined by the empathy parameter, indicating that cooperation is driven by empathic structure rather than learned reciprocity. Empathy functions as a structural prior over social interaction, shaping coordination stability, robustness, and temporal dynamics. The proposed framework highlights active inference as a principled foundation for socially aligned artificial agents that coordinate through internal simulation rather than behavioral mimicry.
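
The Bayesian updating of opponent type mentioned in the abstract can be illustrated with a minimal sketch. The two opponent types, their cooperation probabilities, and the function name `update_type_belief` are assumptions chosen for illustration; the abstract does not specify the paper's opponent model.

```python
# Hypothetical opponent types with assumed per-round cooperation
# probabilities. These numbers are illustrative, not from the paper.
TYPE_COOP_PROB = {"cooperator": 0.9, "defector": 0.1}


def update_type_belief(prior, action):
    """Apply one step of Bayes' rule to a belief over opponent types.

    prior:  dict mapping type name -> probability (should sum to 1)
    action: observed opponent action, "C" (cooperate) or "D" (defect)
    """
    unnormalized = {}
    for name, p in prior.items():
        p_coop = TYPE_COOP_PROB[name]
        likelihood = p_coop if action == "C" else 1.0 - p_coop
        unnormalized[name] = likelihood * p
    total = sum(unnormalized.values())
    return {name: v / total for name, v in unnormalized.items()}


# Start from a uniform prior and update on a short action history.
belief = {"cooperator": 0.5, "defector": 0.5}
for observed in ["C", "C", "D", "C"]:
    belief = update_type_belief(belief, observed)
```

With likelihoods this far apart, the belief concentrates on one type after only a few rounds, which is one way to picture the rapid convergence of opponent models that the abstract reports.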

Paper Structure (21 sections, 28 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Methodology
Generative Model for an Empathic Agent
Hidden state factorization and observation modalities
Perspective-taking via structurally matched generative models
Active inference and sophisticated planning
Results
Iterated Prisoner's Dilemma setup and global cooperation landscape
Emergent exploitation dynamics
Implicit communication and recovery dynamics
Boundary-layer variability near the transition
Transition to cooperation
Learning improves belief accuracy but does not induce cooperation
Strategic sophistication amplifies the need for empathy
Phase transition-like dynamics in the system
...and 6 more sections

Figures (7)

Figure 1: Mutual cooperation landscape across dyadic empathy. Each cell shows the mean fraction of rounds ending in mutual cooperation $(C,C)$, averaged over repeated simulations for the corresponding empathy pair $(\lambda_i,\lambda_j)$. Axes indicate the empathy parameters of the two agents $\lambda_i$ and $\lambda_j$.
Figure 2: Empathy asymmetry induces systematic exploitation. Mean payoff gap as a function of empathy difference $\lambda_i-\lambda_j$, where positive values indicate that agent $i$ obtains higher average payoff. Error bars denote $\pm1$ standard deviation across runs aggregated within each rounded empathy-difference bin.
Figure 3: Temporal cooperation dynamics across empathy regimes. All panels summarize 100-round interactions across simulation seeds under a rolling statistics window of $w=5$. Shaded bands denote $\pm1$ standard deviation across seeds. A: Rolling mutual cooperation rate. High empathy ($\lambda=0.7$) rapidly converges to and stabilizes at near-perfect cooperation, moderate empathy ($\lambda=0.4$) sustains cooperation with greater variability, whereas low empathy ($\lambda=0.1$) and strongly asymmetric empathy ($\lambda=0.9/0.1$) remain near persistent defection. B: Example single-seed trajectories highlighting recovery. Under high empathy, isolated defections are followed by rapid restoration of cooperation (an "apology-forgiveness" pattern), whereas low-empathy agents typically fail to recover once defection occurs. C: Rolling action agreement rate, defined as the fraction of rounds in which both agents select the same action (either $C,C$ or $D,D$). High-empathy dyads synchronize through coordinated cooperation, low-empathy dyads synchronize primarily via mutual defection, and asymmetric empathy yields persistent desynchronization. D: Cumulative mutual cooperation rate. High- and moderate-empathy dyads converge quickly to sustained cooperative outcomes, while low-empathy and asymmetric interactions converge to trajectories that accumulate few cooperative outcomes.
Figure 4: Boundary-layer dynamics near empathy symmetry ($\lambda_j=0.5$). A: Rolling mutual cooperation rate (window $w=5$, mean $\pm 1$ SD across seeds) for varying $\lambda_i$. While mean cooperation remains high close to symmetry, variability increases sharply near the cooperation-exploitation threshold. This indicates reduced dynamical stability despite similar equilibrium levels of cooperation. B: Representative single-seed trajectories illustrating the underlying temporal structure. Near the transition, interactions show more frequent defections.
Figure 5: Transition to cooperation. Left: Empirical and analytical probability of mutual cooperation as a function of $\lambda$. The transition point $\lambda \approx 0.24$ is the point of maximal sensitivity of the distribution, which is greater than the point of maximal uncertainty about the cooperation outcome (maximal entropy of the sequences). The transition point is exactly $\lambda = 0.2$ in the deterministic limit. Right: Memory effects for an individual agent, measured as conditional probability rates with respect to the agent's history. At the boundaries of the transition point, there are peak memory effects (second-order conditioning) in which defection is stubborn ($\lambda \approx 0.1$) or cooperation is fragile ($\lambda \approx 0.35$) for a single agent. This effect is due to the coupling with the other agent.
...and 2 more figures
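
The per-round statistics named in the Figure 3 caption (rolling mutual cooperation rate, rolling action agreement rate, and cumulative mutual cooperation rate) can be computed from two action sequences as in the sketch below. The function names and the handling of the first $w-1$ rounds (averaging over the rounds seen so far) are assumptions; the caption specifies only the window $w=5$ and the definition of agreement.

```python
def rolling_mean(values, window=5):
    """Trailing rolling mean over `window` rounds.

    Assumption: for the first window-1 rounds, average over the
    rounds available so far rather than returning missing values.
    """
    out = []
    for t in range(len(values)):
        start = max(0, t - window + 1)
        chunk = values[start:t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def cooperation_metrics(actions_i, actions_j, window=5):
    """Compute Figure 3 style statistics for one dyad.

    actions_i, actions_j: equal-length sequences of "C" or "D".
    """
    pairs = list(zip(actions_i, actions_j))
    # 1.0 when the round ends in mutual cooperation (C, C).
    mutual = [1.0 if a == "C" and b == "C" else 0.0 for a, b in pairs]
    # 1.0 when both agents pick the same action, (C, C) or (D, D).
    agree = [1.0 if a == b else 0.0 for a, b in pairs]
    # Fraction of all rounds so far that ended in (C, C).
    cumulative = []
    running = 0.0
    for t, m in enumerate(mutual, start=1):
        running += m
        cumulative.append(running / t)
    return {
        "rolling_mutual_cooperation": rolling_mean(mutual, window),
        "rolling_agreement": rolling_mean(agree, window),
        "cumulative_mutual_cooperation": cumulative,
    }


# Example: agent i defects once in round 3, then cooperation resumes.
metrics = cooperation_metrics(list("CCDCC"), list("CCCCC"), window=2)
```

Averaging these per-seed curves across simulation seeds, and taking the standard deviation at each round, would give the mean and shaded bands the caption describes.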
