Explainable Multi-Agent Reinforcement Learning for Extended Reality Codec Adaptation

Pedro Enrique Iturria-Rivera, Raimundas Gaigalas, Medhat Elsayed, Majid Bavand, Yigit Ozcan, Melike Erol-Kantarci

TL;DR

This work introduces Value Function Factorization (VFF)-based Explainable (X) Multi-Agent Reinforcement Learning (MARL) algorithms, explaining reward design in XR codec adaptation through reward decomposition, and proposes adaptive XMARL, leveraging network gradients and reward decomposition for improved action selection.

Abstract

Extended Reality (XR) services are set to transform applications over 5th and 6th generation wireless networks, delivering immersive experiences. Concurrently, advancements in Artificial Intelligence (AI) have expanded its role in wireless networks; however, trust and transparency in AI remain to be strengthened. Providing explanations for AI-enabled systems can therefore enhance trust. We introduce Value Function Factorization (VFF)-based Explainable (X) Multi-Agent Reinforcement Learning (MARL) algorithms, which explain reward design in XR codec adaptation through reward decomposition. We contribute four enhancements to XMARL algorithms. Firstly, we detail architectural modifications to enable reward decomposition in VFF-based MARL algorithms: Value Decomposition Networks (VDN), Mixture of Q-Values (QMIX), and Q-Transformation (Q-TRAN). Secondly, inspired by multi-task learning, we reduce the overhead of vanilla XMARL algorithms. Thirdly, we propose a new explainability metric, Reward Difference Fluctuation Explanation (RDFX), suitable for problems with adjustable parameters. Lastly, we propose adaptive XMARL, which leverages network gradients and reward decomposition for improved action selection. Simulation results indicate that, in XR codec adaptation, the Packet Delivery Ratio reward is the primary contributor to optimal performance compared to the initial composite reward, which included delay and Data Rate Ratio components. Modifications to VFF-based XMARL algorithms, incorporating multi-headed structures and adaptive loss functions, enable the best-performing algorithm, Multi-Headed Adaptive (MHA)-QMIX, to achieve significant average gains over the Adjust Packet Size baseline of up to 10.7%, 41.4%, 33.3%, and 67.9% in XR index, jitter, delay, and Packet Loss Ratio (PLR), respectively.
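
To make the idea of reward decomposition in a VDN-style factorization more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each agent holds one list of Q-values per reward component, that an agent's total Q-value is the plain sum over components, and that the joint value is the sum over agents, as in additive VDN mixing. The component names (`pdr`, `delay`, `drr`), the numeric values, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical component-wise Q-values: component name -> one Q-value per action.
ComponentQ = Dict[str, List[float]]


def total_q(q_by_component: ComponentQ) -> List[float]:
    """Sum the component-wise Q-values into one Q-value per action."""
    n_actions = len(next(iter(q_by_component.values())))
    return [sum(q[a] for q in q_by_component.values()) for a in range(n_actions)]


def greedy_action(q_by_component: ComponentQ) -> int:
    """Pick the action that maximizes the summed Q-value."""
    q = total_q(q_by_component)
    return max(range(len(q)), key=q.__getitem__)


def vdn_joint_q(agents: List[ComponentQ], actions: List[int]) -> float:
    """VDN-style additive mixing: the joint value is the sum of per-agent values."""
    return sum(total_q(q)[a] for q, a in zip(agents, actions))


def component_difference(q_by_component: ComponentQ, chosen: int, other: int) -> Dict[str, float]:
    """Per-component advantage of `chosen` over `other`, usable as a simple explanation."""
    return {c: q[chosen] - q[other] for c, q in q_by_component.items()}


# Made-up Q-values for two agents and three actions (illustrative only).
agent_1 = {"pdr": [0.2, 0.9, 0.4], "delay": [0.5, 0.1, 0.3], "drr": [0.1, 0.2, 0.1]}
agent_2 = {"pdr": [0.7, 0.3, 0.2], "delay": [0.1, 0.2, 0.6], "drr": [0.3, 0.1, 0.1]}

actions = [greedy_action(agent_1), greedy_action(agent_2)]
joint = vdn_joint_q([agent_1, agent_2], actions)
explanation = component_difference(agent_1, chosen=actions[0], other=0)
```

In this toy setting, `explanation` shows which reward components favor the first agent's greedy action over action 0 and which argue against it. That per-component view is the kind of information a decomposed value function exposes and a single scalar Q-value hides.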

Paper Structure

This paper contains 37 sections, 1 theorem, 40 equations, 14 figures, 4 tables, 3 algorithms.

Key Result

Theorem 1

Let a multi-agent Markov Game $\mathcal{M}\mathcal{G}(\mathcal{S}, \mathcal{A}, P, r, N, \gamma, d_0)$ with an individual agent decomposed objective function $r(s,a)$ be, where $\sigma_{c}\in [0,1], \sum_{c\in C}\sigma_c = 1$ corresponds to the weight for the $c^{th}$ decomposed reward and $r_{c}(s,a)\rightarrow \mathbb{R}$ the $c^{th}$ reward component for the state-action pair $s$ and $a$. Let

Figures (14)

  • Figure 1: High-level taxonomy of proposed algorithms. The red dashed boxes mark areas of reinforcement learning that are outside the scope of this work. The bottom of the figure shows the three explainable MARL families proposed in this paper.
  • Figure 2: Internal structure of the adapted explainable MARL algorithms: $\bm{(1)}$ Overview of the Decomposed Value Decomposition Networks (DVDN): $\bm{(a)}$ $C$ individual action-value networks $\bm{(b)}$ Details of the individual action-value network $\bm{(c)}$ $C$ additive summation of Q-values. $\bm{(2)}$ Overview of the Decomposed QMIX (DQMIX, Mixture of Q-Values): $\bm{(a)}$ $C$ individual action-value networks $\bm{(b)}$ Details of the individual action-value network $\bm{(c)}$ $C$ mixing networks.
  • Figure 3: Internal structure of the Multi-Headed Explainable MARL algorithms: $\bm{(1)}$ Overview of the Multi-Headed Decomposed Value Decomposition Networks (MH-DVDN): $\bm{(a)}$ Individual action-value network with shared layers $\bm{(b)}$ Details of the individual action-value network $\bm{(c)}$ Additive summation of Q-values. $\bm{(2)}$ Overview of the Multi-Headed Decomposed QMIX (MH-DQMIX, Mixture of Q-Values): $\bm{(a)}$ Individual action-value network with shared layers $\bm{(b)}$ Details of the individual action-value network $\bm{(c)}$ Mixing network.
  • Figure 4: Internal structure and loss calculation of the Multi-Headed Decomposed QTRAN (MH-DQTRAN): $\bm{(a)}$ Individual action-value network with shared layers $\bm{(b)}$ Details of the individual action-value network $\bm{(c)}$ Joint action-value network $\bm{(d)}$ Additive summation of Q-values $\bm{(e)}$ State-value network.
  • Figure 5: Summary of the importance gradient mechanism. $\bm{(a)}$ Multi-Headed loss function without gradient normalization $\bm{(b)}$ Multi-Headed loss function with gradient normalization $\bm{(c)}$ Multi-headed loss function with importance gradient.
  • ...and 9 more figures
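
The contrast between the decomposed structures in Figure 2 ($C$ separate individual action-value networks) and the multi-headed structures in Figures 3 and 4 (one network with shared layers) can be illustrated with a small sketch. It is an assumption-laden toy, not the architecture from the paper: it uses a single shared linear layer with ReLU, one linear head per reward component, no bias terms, and hypothetical layer sizes and names throughout.

```python
import random
from typing import List

Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weight matrix of shape (rows, cols)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x: List[float], w: Matrix) -> List[float]:
    """Bias-free linear layer: x has len(w) entries, output has len(w[0]) entries."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, t) for t in v]


class MultiHeadQ:
    """Hypothetical agent network: one shared trunk, one Q-value head per reward component."""

    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int,
                 n_components: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.trunk = init_matrix(obs_dim, hidden_dim, rng)
        self.heads = [init_matrix(hidden_dim, n_actions, rng) for _ in range(n_components)]

    def forward(self, obs: List[float]) -> List[List[float]]:
        hidden = relu(linear(obs, self.trunk))  # shared features, computed once
        return [linear(hidden, head) for head in self.heads]  # one Q-vector per component

    def n_params(self) -> int:
        size = lambda m: len(m) * len(m[0])
        return size(self.trunk) + sum(size(h) for h in self.heads)


def separate_networks_params(obs_dim: int, hidden_dim: int, n_actions: int,
                             n_components: int) -> int:
    """Parameter count when every component gets its own trunk and output layer."""
    return n_components * (obs_dim * hidden_dim + hidden_dim * n_actions)


net = MultiHeadQ(obs_dim=8, hidden_dim=16, n_actions=4, n_components=3)
q_per_component = net.forward([0.5] * 8)
shared_count = net.n_params()
separate_count = separate_networks_params(8, 16, 4, 3)
```

With these illustrative sizes the shared-trunk variant has fewer parameters than three independent networks, because the trunk is stored and evaluated once rather than once per component. The actual saving in the paper depends on its real layer sizes, which this sketch does not reproduce.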

Theorems & Definitions (1)

  • Theorem 1