RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization

Siqi Shen; Chennan Ma; Chao Li; Weiquan Liu; Yongquan Fu; Songzhu Mei; Xinwang Liu; Cheng Wang

TL;DR

RiskQ models the joint return distribution by representing its quantiles as weighted quantile mixtures of per-agent return distribution utilities. It satisfies the RIGM principle for the VaR and distorted risk metrics.

Abstract

Multi-agent systems are characterized by environmental uncertainty, varying policies of agents, and partial observability, which result in significant risks. In the context of Multi-Agent Reinforcement Learning (MARL), learning coordinated and decentralized policies that are sensitive to risk is challenging. To formulate the coordination requirements in risk-sensitive MARL, we introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles. This principle requires that the collection of risk-sensitive action selections of each agent be equivalent to the risk-sensitive action selection of the central policy. Current MARL value factorization methods do not satisfy the RIGM principle for common risk metrics such as the Value at Risk (VaR) metric or distorted risk measurements. To address this limitation, we propose RiskQ, which models the joint return distribution by representing its quantiles as weighted quantile mixtures of per-agent return distribution utilities. RiskQ satisfies the RIGM principle for the VaR and distorted risk metrics. Extensive experiments show that RiskQ obtains promising performance. The source code of RiskQ is available at https://github.com/xmu-rl-3dv/RiskQ.
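The RIGM requirement and the quantile-mixture idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes that each agent's utility is given as K sorted quantile values at fractions j/K, that the joint quantiles are a per-quantile weighted sum with fixed non-negative weights (in the paper the weights would come from a learned mixer), and that VaR is read off as a single quantile. All function names and the toy numbers are hypothetical.

```python
import itertools
import math


def var_index(num_quantiles, alpha):
    """Index of the quantile used as VaR_alpha, for fractions j/K, j = 1..K."""
    return min(num_quantiles - 1, max(0, math.ceil(alpha * num_quantiles) - 1))


def var(quantiles, alpha):
    """VaR_alpha of a distribution given by sorted quantile values."""
    return quantiles[var_index(len(quantiles), alpha)]


def joint_quantiles(agent_quantiles, joint_action, weights):
    """Per-quantile weighted sum of the selected per-agent quantiles (assumed mixing rule)."""
    chosen = [agent_quantiles[i][a] for i, a in enumerate(joint_action)]
    return [sum(w * q[j] for w, q in zip(weights, chosen))
            for j in range(len(chosen[0]))]


def decentralized_greedy(agent_quantiles, alpha):
    """Each agent independently picks the action with the highest VaR_alpha."""
    return tuple(max(range(len(acts)), key=lambda a: var(acts[a], alpha))
                 for acts in agent_quantiles)


def centralized_greedy(agent_quantiles, weights, alpha):
    """Brute-force search for the joint action with the highest joint VaR_alpha."""
    spaces = [range(len(acts)) for acts in agent_quantiles]
    return max(itertools.product(*spaces),
               key=lambda u: var(joint_quantiles(agent_quantiles, u, weights), alpha))


# Toy data: agent_quantiles[i][a] holds K = 4 quantiles for agent i, action a.
AGENT_QUANTILES = [
    [[-10, 2, 6, 10], [0, 1, 2, 3]],   # agent 0: risky action vs. safe action
    [[1, 1, 1, 1], [-5, 0, 5, 12]],    # agent 1: safe action vs. risky action
]
WEIGHTS = [0.7, 0.3]  # non-negative mixing weights (assumed fixed here)

if __name__ == "__main__":
    for alpha in (0.25, 1.0):
        print(alpha,
              decentralized_greedy(AGENT_QUANTILES, alpha),
              centralized_greedy(AGENT_QUANTILES, WEIGHTS, alpha))
```

In this toy setting the per-agent greedy choices coincide with the centralized greedy choice at both risk levels, because the joint VaR is a non-negative weighted sum of the per-agent VaR values at the same quantile index. The choice itself changes with the risk level: the risk-averse setting picks the safe actions, the risk-seeking setting picks the risky ones.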

Paper Structure (47 sections, 11 theorems, 43 equations, 24 figures, 1 table, 1 algorithm)

Introduction
Background
Dec-POMDPs
Value Function Factorization
Distributional RL and Risk
Related Work
Value Factorization
Risk-sensitive RL
Risk-sensitive Value Factorization
Risk-sensitive Individual-Global-Max (RIGM) Principle
RiskQ
Neural Networks and Loss
Evaluation
Experimental Setup
Multi-Agent Cliff Navigation
...and 32 more sections

Key Result

Theorem 1

Given a deterministic joint state-action value function $Q_{jt}$, a joint state-action return distribution $Z_{jt}$, and a factorization function $\Phi$ for deterministic utilities: $Q_{jt}(\tau, u) = \Phi(Q_1(\tau_1, u_1), \ldots, Q_N(\tau_N, u_N))$, such that $[Q_i]_{i=1}^N$ satisfy IGM for $Q_{jt}$ under $\tau$, the following risk-sensitive distributional factorization: $Z_{jt}(\tau, u) = \Phi(Z_1(\tau_1, u_1), \ldots, Z_N(\tau_N, u_N))$ is insufficient to guarantee that $[Z_i]_{i=1}^N$ satisfy RIGM for $Z_{jt}(\tau, u)$ with ri...

Figures (24)

Figure 1: RiskQ overview: (a) quantiles mixing for $Z_{jt}$, (b) mixer function, (c) agent return distribution utility
Figure 1: RiskQ framework
Figure 2: The return of the MACN (a-b) and the MACF (c) environment; the test win rate of random (d), explorative (e) and dilemmatic (f) 3s_vs_5z scenario of SMAC.
Figure 2: Multi-Agent Cliff Navigation
Figure 3: Win Rate of the StarCraft Multi-Agent Scenarios.
...and 19 more figures

Theorems & Definitions (29)

Definition 1: IGM
Definition 2: DIGM
Definition 3: Value at Risk (VaR)
Definition 4: Distortion risk measures (DRM)
Definition 5: Conditional Value at Risk (CVaR)
Definition 6: RIGM
Theorem 1
Theorem 2
Theorem 3
Theorem 4
...and 19 more
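The listing above gives only the names of the risk definitions. As a rough illustration of how such metrics are commonly evaluated on a quantile representation, the sketch below uses the standard textbook forms rather than the paper's exact statements: a distortion risk measure reweights K equally spaced quantiles by the increments of a distortion function g, and CVaR at level alpha corresponds to g(w) = min(w / alpha, 1). The function names are hypothetical.

```python
def distortion_risk(quantiles, g):
    """Approximate a distortion risk measure from K sorted, equally spaced quantiles.

    Quantile j (1-based) receives weight g(j / K) - g((j - 1) / K).
    """
    k = len(quantiles)
    return sum((g(j / k) - g((j - 1) / k)) * q
               for j, q in enumerate(quantiles, start=1))


def cvar_distortion(alpha):
    """Distortion function whose risk measure is CVaR at level alpha (lower tail)."""
    return lambda w: min(w / alpha, 1.0)


def identity_distortion(w):
    """Risk-neutral distortion: the measure reduces to the mean."""
    return w


if __name__ == "__main__":
    z = [-10, 2, 6, 10]
    print(distortion_risk(z, identity_distortion))   # mean of the quantiles
    print(distortion_risk(z, cvar_distortion(0.5)))  # mean of the lower half
```

With the identity distortion the measure is the plain mean, and with the CVaR distortion at alpha = 0.5 only the lower half of the quantiles contributes, which is the risk-averse behaviour the metric is meant to capture.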
