CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning

Nikunj Gupta, Somjit Nath, Samira Ebrahimi Kahou

TL;DR

The paper tackles the challenge of coordinating multiple autonomous agents when other agents' behaviors are uncertain. It introduces CAMMARL, which models other agents' actions as conformal prediction sets with explicit coverage guarantees and feeds these sets into the self-agent's policy to guide learning. Across cooperative tasks, CAMMARL performs close to the upper-bound baseline, GIAM, and outperforms the other baselines, which either do not quantify uncertainty about the other agents' actions or are given those agents' true actions or observations. Conformal prediction gives the approach principled uncertainty quantification, improving robustness and sample efficiency in multi-agent coordination, with practical implications for real-world cooperative AI.
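For reference, the coverage guarantee mentioned above is the standard split-conformal one. Below is its textbook form, assuming the calibration data are exchangeable with the test point and using the common nonconformity score $1 - \hat{p}(a \mid x)$. The symbols $x$, $s$, $\hat{q}$, $\alpha$, $k$, and $n$ are our own notation; the paper's exact score and model inputs may differ.

```latex
\begin{aligned}
&\text{Calibration pairs } (x_i, a_i),\ i = 1, \dots, n, \text{ exchangeable with the test pair } (x, a_{other}) \\
&s(x, a) = 1 - \hat{p}(a \mid x) \quad \text{(nonconformity score)} \\
&\hat{q} = s_{(k)}, \quad k = \lceil (n+1)(1-\alpha) \rceil \quad \text{($k$-th smallest calibration score; } \hat{q} = \infty \text{ if } k > n\text{)} \\
&\mathcal{C}(x) = \{\, a : s(x, a) \le \hat{q} \,\} \\
&\Pr\big[\, a_{other} \in \mathcal{C}(x) \,\big] \ge 1 - \alpha
\end{aligned}
```

With $\alpha = 0.05$, this yields sets that contain the other agent's true action at least 95% of the time, marginally over calibration and test draws.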

Abstract

Before taking actions in an environment with more than one intelligent agent, an autonomous agent may benefit from reasoning about the other agents and from having some guarantee of, or confidence in, the behavior of the system. In this article, we propose a novel multi-agent reinforcement learning (MARL) algorithm, CAMMARL, which models the actions of other agents in different situations as confidence sets, i.e., sets that contain their true actions with high probability. We then use these estimates to inform an agent's decision-making. To estimate such sets, we use conformal prediction, which not only gives an estimate of the most probable outcome but also quantifies the associated uncertainty. For instance, we can predict a set that provably covers the true outcome with a chosen high probability (e.g., 95%). Through experiments in two fully cooperative multi-agent tasks, we show that CAMMARL improves the capabilities of an autonomous agent in MARL by modeling conformal prediction sets over the behavior of other agents in the environment and using these estimates to enhance its policy learning.
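A minimal sketch of how such a set can be built for a discrete action space, using split conformal prediction. It assumes an already-trained classifier that outputs a probability vector over the other agent's actions, a held-out calibration set of (predicted probabilities, true action) pairs, and the simple score 1 - p(true action). The function names `calibrate_threshold` and `prediction_set` are hypothetical, and CAMMARL may use a different score (for example, adaptive prediction sets).

```python
import math
from typing import Sequence, Set


def calibrate_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_actions: Sequence[int],
                        alpha: float) -> float:
    """Split-conformal threshold q_hat from held-out calibration data.

    cal_probs[i]: the action model's predicted distribution over the other
    agent's discrete actions (assumed classifier output).
    cal_actions[i]: the action that agent actually took.
    Nonconformity score (an assumption): s = 1 - p(true action).
    """
    scores = sorted(1.0 - probs[a] for probs, a in zip(cal_probs, cal_actions))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        # Too few calibration points for this alpha: accept every action.
        return math.inf
    return scores[k - 1]  # k-th smallest score (1-indexed rank)


def prediction_set(probs: Sequence[float], q_hat: float) -> Set[int]:
    """All actions whose score 1 - p(a) is within the calibrated threshold."""
    return {a for a, p in enumerate(probs) if 1.0 - p <= q_hat}
```

For example, with ten calibration points and `alpha=0.25`, the threshold is the 9th-smallest score; at test time a confident model yields a small set, while an uncertain one yields a larger set.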

Paper Structure (40 sections, 6 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Our proposed methodology for informing an autonomous agent's decision-making with conformal predictions of other agents' action sets, illustrated with two agents for simplicity. Two agents ($\mathcal{N}_{self}$, $\mathcal{N}_{other}$) receive their own partial observations from the environment ($o_{self}$, $o_{other}$) and take their actions ($a_{self}$, $a_{other}$). An independent conformal action prediction model $\mathcal{C}$ learns to output a conformal action set, $\{a'_{other}\}$, for $\mathcal{N}_{other}$; this set is then used as an additional input by $\mathcal{N}_{self}$ during training to inform its policy and choose its action $a_{self}$.
  • Figure 2: A detailed illustration of conformal action modeling and inference in CAMMARL to generate prediction sets of $\mathcal{N}_{other}$'s actions using conformal predictors.
  • Figure 3: Comparison of agent performance (reward accumulation) in CN (a) and LBF (b) under settings that differ in the information available to $\mathcal{N}_{self}$ during training. CAMMARL's performance is very close to the upper bound, GIAM, and considerably better than the other extreme, NOAM. It also outperforms the other defined benchmarks (TAAM, TOAM, and EAP) in both tasks, with the added benefit of uncertainty quantification for its estimates. Interestingly, in CN, CAMMARL arguably learns faster, although all methods converge to similar results, whereas in LBF it appears to converge to a better policy. The curves are averaged over five independent trials and smoothed with a moving-window average (100 points) for readability.
  • Figure 4: Comparison of agent performance (reward accumulation) in environments with more than two agents: (a) Pressure Plate and (b) Google Football. Interestingly, in Pressure Plate, CAMMARL arguably learns faster, although all methods converge to similar results, whereas in Google Football, CAMMARL reaches a higher reward than the baselines. In Google Football, the observations are global, so we did not include TOAM and GIAM. The curves are averaged over five independent runs.
  • Figure 5: Comparison of agent performance (reward accumulation) in LBF with 2 agents (2p) and 6 food items (6f), with cooperation turned off. CAMMARL's performance is very close to the upper bound, GIAM, and considerably better than the other extreme, NOAM. Interestingly, CAMMARL also appears to converge faster than the other baselines. The curves are averaged over five seeds.
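Figures 1 and 2 describe the conformal action set being passed to $\mathcal{N}_{self}$ as an additional input, but the captions do not say how a set is encoded. One plausible encoding, assumed here and not taken from the paper, is a fixed-length binary mask over the other agent's action space, concatenated to $o_{self}$ once per other agent. The helper names `encode_action_set` and `policy_input` are hypothetical.

```python
from typing import List, Sequence, Set


def encode_action_set(action_set: Set[int], n_actions: int) -> List[float]:
    """Multi-hot mask: 1.0 at each action index in the conformal set, else 0.0."""
    mask = [0.0] * n_actions
    for a in action_set:
        if not 0 <= a < n_actions:
            raise ValueError(f"action {a} outside [0, {n_actions})")
        mask[a] = 1.0
    return mask


def policy_input(o_self: Sequence[float],
                 other_sets: Sequence[Set[int]],
                 n_actions: int) -> List[float]:
    """Concatenate the self-agent's observation with one mask per other agent
    (assumed encoding; all other agents share an action space of n_actions)."""
    x = list(o_self)
    for action_set in other_sets:
        x.extend(encode_action_set(action_set, n_actions))
    return x
```

For example, `policy_input([0.2, -1.0], [{0, 3}], 5)` returns `[0.2, -1.0, 1.0, 0.0, 0.0, 1.0, 0.0]`. With this encoding, a singleton set acts like a confident action prediction, while a set covering every action tells the policy that the other agent's next move is highly uncertain.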
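The Figure 3 caption states that curves are averaged over five trials and smoothed with a 100-point moving window. A minimal sketch of that post-processing follows, assuming every run logs rewards at the same steps, a trailing (not centered) window, and that the first points average over whatever samples are available; the paper does not specify these details, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def average_over_seeds(runs: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise mean across seeds; assumes all runs have the same length."""
    if len({len(r) for r in runs}) != 1:
        raise ValueError("runs must be non-empty and of equal length")
    return [mean(values) for values in zip(*runs)]


def moving_average(xs: Sequence[float], window: int = 100) -> List[float]:
    """Trailing moving average; before `window` points are available,
    each output averages over all samples seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    total = 0.0
    for i, x in enumerate(xs):
        total += x
        if i >= window:
            total -= xs[i - window]  # drop the sample leaving the window
        out.append(total / min(i + 1, window))
    return out
```

A plotted curve would then be `moving_average(average_over_seeds(runs), window=100)` for the five runs of one method.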