RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization

Siqi Shen, Chennan Ma, Chao Li, Weiquan Liu, Yongquan Fu, Songzhu Mei, Xinwang Liu, Cheng Wang

TL;DR

RiskQ models each quantile of the joint return distribution as a weighted quantile mixture of per-agent return distribution utilities, and it satisfies the RIGM principle for VaR and distortion risk measures.

Abstract

Multi-agent systems are characterized by environmental uncertainty, varying policies of agents, and partial observability, which result in significant risks. In the context of Multi-Agent Reinforcement Learning (MARL), learning coordinated and decentralized policies that are sensitive to risk is challenging. To formulate the coordination requirements in risk-sensitive MARL, we introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles. This principle requires that the collection of risk-sensitive action selections of each agent be equivalent to the risk-sensitive action selection of the central policy. Current MARL value factorization methods do not satisfy the RIGM principle for common risk metrics such as the Value at Risk (VaR) metric or distortion risk measures. To address this limitation, we propose RiskQ, which models the joint return distribution by modeling its quantiles as weighted quantile mixtures of per-agent return distribution utilities. RiskQ satisfies the RIGM principle for the VaR and distortion risk measures. Extensive experiments show that RiskQ obtains promising performance. The source code of RiskQ is available at https://github.com/xmu-rl-3dv/RiskQ.
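
The RIGM requirement and the quantile-mixture idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the fixed non-negative `weights`, the toy `utilities` table, the function names, and the rule used to read VaR off a list of quantile estimates are all assumptions made for illustration. Each agent utility is a list of `K` sorted quantile estimates per action; the joint quantiles are a weighted sum of the per-agent quantiles at the same index, and the check compares the brute-force joint VaR-greedy action with the per-agent VaR-greedy actions.

```python
from itertools import product

def var_from_quantiles(quantiles, alpha):
    """Read VaR at level alpha off K sorted quantile estimates (assumed indexing rule)."""
    k = max(0, min(len(quantiles) - 1, int(alpha * len(quantiles))))
    return quantiles[k]

def mix_quantiles(agent_quantiles, weights):
    """Joint quantile k = weighted sum of the agents' k-th quantiles (weights >= 0)."""
    n_quantiles = len(agent_quantiles[0])
    return [sum(w * q[k] for w, q in zip(weights, agent_quantiles))
            for k in range(n_quantiles)]

def joint_greedy(utilities, weights, alpha):
    """Brute-force search for the joint action maximizing VaR of the mixed distribution."""
    def joint_var(joint_action):
        chosen = [utilities[i][a] for i, a in enumerate(joint_action)]
        return var_from_quantiles(mix_quantiles(chosen, weights), alpha)
    return max(product(*[range(len(u)) for u in utilities]), key=joint_var)

def decentralized_greedy(utilities, alpha):
    """Each agent independently picks the action maximizing VaR of its own utility."""
    return tuple(max(range(len(u)), key=lambda a: var_from_quantiles(u[a], alpha))
                 for u in utilities)

# Hypothetical toy data: utilities[agent][action] is a list of 4 sorted quantiles.
utilities = [
    [[0.0, 1.0, 2.0, 3.0], [-5.0, 0.0, 4.0, 10.0]],  # agent 0: safe vs. risky action
    [[1.0, 1.0, 1.0, 1.0], [-2.0, 0.0, 3.0, 6.0]],   # agent 1: safe vs. risky action
]
weights = [0.7, 0.3]  # assumed fixed, non-negative mixing weights

for alpha in (0.25, 0.75):
    print(alpha, joint_greedy(utilities, weights, alpha),
          decentralized_greedy(utilities, alpha))
```

Because the weights are non-negative and the same quantile index is used for every agent, the joint VaR in this sketch is a monotone combination of the per-agent VaRs, which is why the two selections agree for both risk levels tried here.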

Paper Structure (47 sections, 11 theorems, 43 equations, 24 figures, 1 table, 1 algorithm)

Key Result

Theorem 1

Given a deterministic joint state-action value function $Q_{jt}$, a joint state-action return distribution $Z_{jt}$, and a factorization function $\Phi$ for deterministic utilities: $Q_{jt}(\tau, u) = \Phi(Q_1(\tau_1, u_1), \ldots, Q_N(\tau_N, u_N))$, such that $[Q_i]_{i=1}^N$ satisfy IGM for $Q_{jt}$ under $\tau$, the following risk-sensitive distributional factorization: $Z_{jt}(\tau, u) = \Phi(Z_1(\tau_1, u_1), \ldots, Z_N(\tau_N, u_N))$ is insufficient to guarantee that $[Z_i]_{i=1}^N$ satisfy RIGM for $Z_{jt}(\tau, u)$ with ri…

Figures (24)

  • Figure 1: RiskQ overview: (a) quantiles mixing for $Z_{jt}$, (b) mixer function, (c) agent return distribution utility
  • Figure 1: RiskQ framework
  • Figure 2: The return of the MACN (a-b) and the MACF (c) environment; the test win rate of random (d), explorative (e) and dilemmatic (f) 3s_vs_5z scenario of SMAC.
  • Figure 2: Multi-Agent Cliff Navigation
  • Figure 3: Win Rate of the StarCraft Multi-Agent Scenarios.
  • ...and 19 more figures

Theorems & Definitions (29)

  • Definition 1: IGM
  • Definition 2: DIGM
  • Definition 3: Value at Risk (VaR)
  • Definition 4: Distortion risk measures (DRM)
  • Definition 5: Conditional Value at Risk (CVaR)
  • Definition 6: RIGM
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • ...and 19 more
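
The risk metrics named in Definitions 3 to 5 can be illustrated on a return distribution represented by `K` sorted, equally weighted quantile estimates. This is a minimal sketch using the standard textbook forms of these metrics, not the paper's exact statements: the function names, the indexing rule for VaR, and the sample `quantiles` are assumptions. A distortion risk measure is approximated here as a reweighted sum of quantiles, where the weight of quantile `k` is the increase of a distortion function `g` over the interval `[k/K, (k+1)/K]`.

```python
import math

def var(quantiles, alpha):
    """Value at Risk: the quantile at level alpha (assumed indexing rule)."""
    k = max(0, min(len(quantiles) - 1, int(alpha * len(quantiles))))
    return quantiles[k]

def cvar(quantiles, alpha):
    """Conditional Value at Risk: mean of the lowest alpha-fraction of quantiles."""
    n = max(1, math.ceil(alpha * len(quantiles)))
    return sum(quantiles[:n]) / n

def distorted_expectation(quantiles, g):
    """Distortion risk measure: quantile k weighted by g((k+1)/K) - g(k/K)."""
    n_quantiles = len(quantiles)
    return sum((g((k + 1) / n_quantiles) - g(k / n_quantiles)) * q
               for k, q in enumerate(quantiles))

# Hypothetical return distribution given by 4 sorted quantile estimates.
quantiles = [-5.0, 0.0, 4.0, 10.0]

risk_neutral = distorted_expectation(quantiles, lambda w: w)            # plain mean
cvar_as_drm = distorted_expectation(quantiles, lambda w: min(w / 0.5, 1.0))

print(var(quantiles, 0.5), cvar(quantiles, 0.5), risk_neutral, cvar_as_drm)
```

With the identity distortion the measure reduces to the mean, and with `g(w) = min(w / alpha, 1)` it reproduces CVaR at level `alpha`, which is one way to see CVaR as a special case of a distortion risk measure.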