Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

TL;DR

This work addresses how moral preferences co-evolve in populations of learning agents within social dilemmas. It leverages an Iterated Prisoner's Dilemma with partner selection and dual Q-learning networks to model agents that optimize intrinsic rewards reflecting consequentialist, norm-based, and virtue-based ethics. The study reveals that certain pro-social types can steer selfish learners toward cooperation, while norm-driven configurations can produce self-sabotaging dynamics and exploitable interactions, depending on population composition and selection pressures. These findings have implications for AI safety and alignment, showing how moral heterogeneity can shape learning trajectories and societal outcomes in engineered multi-agent systems. The results also establish a general methodology for analyzing emergent behavior in heterogeneous moral populations and point to future work on richer moral frameworks and partial observability.

Abstract

Growing concerns about safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents: a promising solution is the use of learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many of the existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents; however, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., considering a combination of different virtues). The extent to which agents' co-development may be impacted by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents are able to steer selfish agents towards more cooperative behavior.

Paper Structure

This paper contains 26 sections, 3 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: (a) Cooperation by all player types within each population over time. (b) Cooperation by the Selfish player(s) in every population over time. In these charts, we plot the moving average of the mean across 20 runs.
  • Figure 2: Population-level social outcomes over time: Collective Reward, Gini Reward and Min Reward. We plot the moving average of the mean across 20 runs.
  • Figure 3: Popularity of player types in each population over the final 100 episodes. Values represent the average across 20 runs and the associated confidence intervals. We sum the values for cases where more than one player of the same type is present (e.g. 8xS players in the majority-S population). For ease of interpretation, we add a 50% reference line, which shows whether the majority player is selected more (or less) often than expected simply due to their prevalence in each population.
  • Figure 4: Selections made by individual players, with the number of selections summed over all 30000 episodes (average across 20 runs), for two example populations: majority-S (a) and majority-De (b). For all populations, see Appendix, Figures 8a, 8b, 9a, 9b.
  • Figure 5: Game reward and intrinsic reward accumulated by each player type in each population over the entire 30000 episodes (averaged across 20 runs). We average per number of players of a certain type (e.g. 8xS players in the majority-S population). For comparability, we normalize intrinsic rewards using the minimum and maximum observed value for each player type.
  • ...and 7 more figures
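
The captions above mention a few aggregation steps: the mean across runs followed by a moving average (Figures 1 and 2), min-max normalization of intrinsic rewards (Figure 5), and population-level outcomes such as Collective, Gini and Min Reward (Figure 2). The following is a minimal sketch of how such quantities could be computed, not the authors' code. All function names are hypothetical, the moving-average window is an assumption (the captions do not state it), and the Gini function is the standard Gini coefficient, which may differ from the paper's exact "Gini Reward" definition.

```python
from statistics import mean


def mean_across_runs(runs):
    """Average per-episode values across runs (a list of equal-length lists)."""
    return [mean(values) for values in zip(*runs)]


def moving_average(series, window):
    """Trailing moving average; early points average over the episodes seen so far."""
    out = []
    total = 0.0
    for i, x in enumerate(series):
        total += x
        if i >= window:
            total -= series[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


def min_max_normalize(values):
    """Rescale values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all values identical
    return [(v - lo) / (hi - lo) for v in values]


def collective_reward(rewards):
    """Sum of the rewards obtained by all players."""
    return sum(rewards)


def min_reward(rewards):
    """Reward of the worst-off player."""
    return min(rewards)


def gini_coefficient(rewards):
    """Standard Gini coefficient for non-negative rewards (0 means perfect equality).

    Assumption: the paper's "Gini Reward" may be defined differently.
    """
    n = len(rewards)
    total = sum(rewards)
    if total == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in rewards for b in rewards)
    return diff_sum / (2 * n * total)
```

For example, `moving_average(mean_across_runs(runs), window)` would smooth the per-episode mean across 20 runs before plotting, with `window` chosen by the analyst.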