Learning in Multi-Objective Public Goods Games with Non-Linear Utilities

Nicole Orzan, Erman Acar, Davide Grossi, Patrick Mannion, Roxana Rădulescu

TL;DR

The paper investigates learning in a multi-objective public goods setting where agents have non-linear, risk-based utilities that decouple collective and individual incentives. It introduces MO-EPGG, a multi-objective multi-agent reinforcement learning (MO-MARL) framework that vectorizes rewards into $(r^C,r^I)$ and applies a non-linear utility on the collective component under the SER criterion, enabling analysis of risk attitudes (via $\beta$) and environmental uncertainty. Through analytical game-theoretic results (ESR/SER and Nash equilibria, NE) and MO-DQN experiments, it shows that risk-averse (low $\beta$) agents hinder cooperation, while risk-seeking (high $\beta$) agents promote cooperation in competitive or mixed-motive settings, with uncertainty amplifying these effects; heterogeneity in risk preferences can suppress cooperation in cooperative regimes. The work provides a principled, scalable framework to study incentive alignment, risk preferences, and uncertainty in multi-agent learning, with implications for designing cooperative AI in uncertain human-agent collaborations.
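
To make the SER/ESR distinction concrete, here is a minimal sketch of the two optimisation criteria from the multi-objective RL literature: SER (scalarised expected returns) applies the utility to the expected reward vector, while ESR (expected scalarised returns) averages the utility of each outcome. The utility form $u(r^C, r^I) = (r^C)^{\beta} + r^I$ is an assumption made for illustration, chosen only to match the description of a non-linear transform on the collective component; the function names and the example outcomes are hypothetical and not taken from the paper.

```python
def utility(r_vec, beta):
    """Assumed utility: power transform on the collective component only."""
    r_c, r_i = r_vec
    return r_c ** beta + r_i


def ser(outcomes, probs, beta):
    """Scalarised expected returns: utility of the expected reward vector."""
    exp_c = sum(p * o[0] for o, p in zip(outcomes, probs))
    exp_i = sum(p * o[1] for o, p in zip(outcomes, probs))
    return utility((exp_c, exp_i), beta)


def esr(outcomes, probs, beta):
    """Expected scalarised returns: expectation of the per-outcome utility."""
    return sum(p * utility(o, beta) for o, p in zip(outcomes, probs))


# Two equally likely (r^C, r^I) outcomes for one agent (made-up numbers).
outcomes = [(4.0, 0.0), (0.0, 4.0)]
probs = [0.5, 0.5]

for beta in (0.5, 1.0, 2.0):
    print(beta, ser(outcomes, probs, beta), esr(outcomes, probs, beta))
```

Under this assumed utility the two criteria coincide for $\beta = 1$ (the utility is linear) and diverge otherwise, which is why the choice of criterion matters once risk preferences are non-linear.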

Abstract

Addressing the question of how to achieve optimal decision-making under risk and uncertainty is crucial for enhancing the capabilities of artificial agents that collaborate with or support humans. In this work, we address this question in the context of Public Goods Games. We study learning in a novel multi-objective version of the Public Goods Game where agents have different risk preferences, by means of multi-objective reinforcement learning. We introduce a parametric non-linear utility function to model risk preferences at the level of individual agents, over the collective and individual reward components of the game. We study the interplay between such preference modelling and environmental uncertainty on the incentive alignment level in the game. We demonstrate how different combinations of individual preferences and environmental uncertainties sustain the emergence of cooperative patterns in non-cooperative environments (i.e., where competitive strategies are dominant), while others sustain competitive patterns in cooperative environments (i.e., where cooperative strategies are dominant).

Paper Structure (23 sections, 8 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Multi-objective payoff matrices received by $N=2$ players with $4$ coins each, playing the MO-EPGG with multiplication factors of $0.5, 1.5$ and $2.5$, when taking the cooperative ($C$) or defective ($D$) actions.
  • Figure 2: The probability of action $C$ for Player $0$ under the NE of a 2-player MO-EPGG, for varying values of $f$ and $\beta$. The corresponding plot for Player $1$ is identical. We note that in the case in which two mixed-strategy NE are present for the same value of $\beta$, the joint strategies are formed by the two different points present. The strategies of the agents are identical for pure-strategy NE.
  • Figure 3: Price of Anarchy for varying values of $f$ and $\beta$.
  • Figure 4: Average cooperation values for the active DQN agents trained across environments with different multiplication factors, without (top row) and with (bottom row) uncertainty on the observed multiplication factor, with $\sigma_i = 2 \; \forall \; i \in N$. The value of $\beta$ is identical for every agent: $\beta_i = \beta \; \forall \; i \in N$.
  • Figure 5: Average cooperation values for the active DQN agents trained across environments with different multiplication factors, without (top row) and with (bottom row) uncertainty on the observed multiplication factor, with $\sigma_i = 2 \; \forall \; i \in N$. The values of $\beta$ are randomly sampled from a normal distribution $\beta_i \sim \mathcal{N}(\mu_{\beta},\sigma_{\beta}^2)\; \forall \; i \in N$, with $\mu_{\beta} = 1$ and three different values of $\sigma_{\beta} \in \{0.5, 2, 3\}$.
  • ...and 2 more figures
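
The setup in the Figure 1 caption ($N=2$ players, $4$ coins each, multiplication factors $0.5$, $1.5$ and $2.5$) can be sketched as follows. This assumes the standard public goods payoff, where cooperators' coins are pooled, multiplied by $f$ and split equally, and defectors keep their coins. Splitting that payoff into a collective component $r^C$ (the share of the pot) and an individual component $r^I$ (the coins kept) is an assumption for illustration and may differ from the authors' exact definition; `mo_payoffs` is a hypothetical helper name.

```python
from itertools import product


def mo_payoffs(actions, coins, f):
    """Return one (r_C, r_I) pair per player for a joint action.

    Assumed decomposition: r_C is the equal share of the multiplied pot,
    r_I is the endowment a defector keeps (0 for a cooperator).
    """
    n = len(actions)
    pot = sum(c for a, c in zip(actions, coins) if a == "C")
    collective = f * pot / n
    return [
        (collective, 0.0 if a == "C" else float(c))
        for a, c in zip(actions, coins)
    ]


# Enumerate the 2x2 joint actions for each multiplication factor.
for f in (0.5, 1.5, 2.5):
    for actions in product("CD", repeat=2):
        print(f, actions, mo_payoffs(actions, (4, 4), f))
```

Summing the two components recovers the scalar payoff of the ordinary public goods game, so under this decomposition defection is individually better for $f < N$ and cooperation for $f > N$.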