Balancing Performance and Cost for Two-Hop Cooperative Communications: Stackelberg Game and Distributed Multi-Agent Reinforcement Learning

Yuanzhe Geng; Erwu Liu; Wei Ni; Rui Wang; Yan Liu; Hao Xu; Chen Cai; Abbas Jamalipour

TL;DR

This work tackles distributed decision-making in a two-hop amplify-and-forward (AF) cooperative network whose nodes have conflicting goals. It formulates a Stackelberg game in which a relay alliance leads and a cost-aware source follows, and proves that an equilibrium exists under ideal channel state information (CSI). For outdated CSI, it develops a multi-agent reinforcement learning (MARL) framework based on deep deterministic policy gradient (DDPG) that lets the source and the relay alliance learn near-optimal policies. The learned policies come within about 2.9% of the game-theoretic equilibrium and outperform several learning baselines in time-varying environments, which makes the approach resilient to incomplete state information. Overall, combining non-cooperative game theory with MARL yields a scalable, distributed solution for balancing throughput and cost in cooperative communications.

Abstract

This paper aims to balance performance and cost in a two-hop wireless cooperative communication network where the source and relays have contradictory optimization goals and make decisions in a distributed manner. This differs from most existing works, which typically assume that source and relay nodes follow a schedule created implicitly by a central controller. We propose that the relays form an alliance to maximize the benefit of relaying, while the source aims to increase the channel capacity cost-effectively. To this end, we formulate the trading problem as a Stackelberg game and prove the existence of its equilibrium. We then use multi-agent reinforcement learning (MARL) to approach the equilibrium when the instantaneous channel state information (CSI) is unavailable and the source and relays do not know each other's goals. A multi-agent deep deterministic policy gradient-based framework is designed, where the relay alliance and the source act as agents. Experiments demonstrate that the proposed method achieves performance close to the game-theoretic equilibrium for all players under time-invariant environments; it considerably outperforms its potential alternatives and is only about 2.9% away from the optimal solution.
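To make the leader-follower structure concrete, the following is a minimal toy sketch of solving a Stackelberg game by backward induction. It is not the paper's model: the utility functions, the symbols `a`, `g`, `k`, and `p_max`, and all function names are hypothetical choices made for illustration. The follower (standing in for the source) buys relay power `p` at a unit price set by the leader (standing in for the relay alliance) and maximizes a logarithmic capacity-like benefit minus payment; the leader anticipates that response and picks the price that maximizes its profit.

```python
import math


def follower_best_response(price, a=1.0, g=4.0, p_max=10.0):
    """Follower's optimal purchase for a given price (toy model).

    Assumed follower utility: a * log(1 + g * p) - price * p, 0 <= p <= p_max.
    The utility is concave in p, so the optimum is the stationary point
    clipped to the feasible interval.
    """
    if price <= 0:
        return p_max
    p = a / price - 1.0 / g  # solves a * g / (1 + g * p) = price
    return min(max(p, 0.0), p_max)


def leader_utility(price, k=0.25, a=1.0, g=4.0, p_max=10.0):
    """Leader's profit (price - unit cost k) times the follower's response."""
    return (price - k) * follower_best_response(price, a, g, p_max)


def solve_stackelberg(k=0.25, a=1.0, g=4.0, p_max=10.0, steps=3750):
    """Backward induction: grid-search the leader's price over [k, a * g]."""
    lo, hi = k, a * g  # below k the leader loses money; above a*g demand is zero
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    best_price = max(grid, key=lambda c: leader_utility(c, k, a, g, p_max))
    return best_price, follower_best_response(best_price, a, g, p_max)


if __name__ == "__main__":
    price, power = solve_stackelberg()
    # For this toy model the interior optimum has the closed form sqrt(a * k * g).
    print(price, power, math.sqrt(1.0 * 0.25 * 4.0))
```

In this toy setting the follower's best response is available in closed form, which is what makes backward induction straightforward. The learning-based approach described in the abstract targets the harder case in which such closed-form responses cannot be computed because instantaneous CSI and the other player's goal are unknown.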

Paper Structure (17 sections, 41 equations, 7 figures, 1 algorithm)

Introduction
Related Work
System Model and Problem Formulation
Communication Model
Stackelberg Game Model
Game Analysis and Solution
Analysis of Relay Alliance
Analysis of Competitive Relays
MARL for Stackelberg Game
Markov Decision Process
DDPG Based MARL Solution
Upper Bound Analysis for MARL
Complexity Analysis
Numerical Evaluation
Experiment Setup
...and 2 more sections

Figures (7)

Figure 1: The illustration of a two-hop relay-enabled cooperative network.
Figure 2: Multi-agent RL interaction framework for the Stackelberg game.
Figure 3: Internal architecture of a single DDPG agent with prioritized experience buffer.
Figure 4: Performance comparison between the different relay settings under different levels of the transmit power at the source.
Figure 5: Performance comparison between the different methods in the training stage.
...and 2 more figures
