Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

Marios Aristodemou; Yasaman Omid; Sangarapillai Lambotharan; Mahsa Derakhshan; Lajos Hanzo

Abstract

The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BSs) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm that aims to maximise the sum-rate of the users while coping with the outdated CSI. We design a novel bi-level optimisation procedure, termed dual-stage proximal policy optimisation (DS-PPO), for tackling the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite, and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attained by the use of DS-PPO. In addition, we provide a convergence analysis of DS-PPO along with its computational complexity.

Paper Structure (24 sections, 6 theorems, 54 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
System Model and Problem Formulation
Background and Preliminaries
Augmented Markov Decision Process for Delayed CSI
Proximal Policy Optimisation
Limitations of Existing MARL Approaches
Proposed DS-PPO Algorithm
Algorithm
State Space
Action Space
Reward Function
First Stage
Second Stage
Convergence Analysis
Preliminaries
...and 9 more sections

Key Result

Proposition 1

Given the TPM $\mathbf{V}_l$, the projection of $\mathbf{V}_l$ onto a set $\mathbb{S}$ is given by ... Given that $\mathbb{S}$ is a sphere centred at the origin with radius $P$, such that $\mathbb{S}=\{\mathbf{V}_l \mid \|\mathbf{V}_l\| \leq P\}$, the projection of $\mathbf{V}_l$ is obtained by ...
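For intuition, the Euclidean projection onto an origin-centred ball has a standard closed form: a point inside the ball is left unchanged, and a point outside is rescaled to $P\,\mathbf{V}_l/\|\mathbf{V}_l\|$. The sketch below illustrates this textbook projection only; it is not claimed to be the exact expression in the paper. It assumes the norm is the Frobenius norm and stores the matrix as a list of rows, and the function names `frobenius_norm` and `project_onto_ball` are hypothetical.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows (real or complex entries)."""
    return math.sqrt(sum(abs(entry) ** 2 for row in matrix for entry in row))


def project_onto_ball(matrix, radius):
    """Euclidean projection onto {V : ||V||_F <= radius} (assumed Frobenius norm)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    norm = frobenius_norm(matrix)
    if norm <= radius:
        # Already feasible: the projection is the matrix itself (returned as a copy).
        return [list(row) for row in matrix]
    # Infeasible: shrink every entry by the same factor so the norm equals the radius.
    scale = radius / norm
    return [[scale * entry for entry in row] for row in matrix]


# Illustrative 2x2 complex matrix with Frobenius norm 5 (values are made up).
V = [[3 + 0j, 0j], [0j, 4j]]
projected = project_onto_ball(V, 2.5)
```

Because every entry is scaled by the same real factor, the projection preserves the direction (and hence the phase structure) of the matrix and only reduces its overall power.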

Figures (6)

Figure 1: System model: a cluster of $L$ satellites providing service for $K$ users.
Figure 2: DS-PPO for enhanced TPM generation.
Figure 3: DS-PPO performance with $K\in\{2,4\}$ users under perfect CSI and delayed CSI, in the presence of handovers. The effect of delayed observations is negligible in both cases.
Figure 4: Comparative results with $L\in\{4,6\}$ satellites and $K=6$ users in perfect CSI and delayed CSI scenarios. Increasing the number of satellites clearly increases the sum rate by $20\%$.
Figure 5: Comparative results with $L\in[4,8]$ satellites and $K=6$ users in delayed CSI scenarios. The highest sum rate is attained with $L=6$.
...and 1 more figures

Theorems & Definitions (8)

Proposition 1: Power Budget Projection
Definition 1: Policy Performance
Definition 2: Advantage Function
Theorem 1: Stage 2 Performance Improvement
Lemma 1: Bounded Value Function [konda2004convergence, lin2022convergence]
Lemma 2: Performance Difference Lemma [kakade2002approximately]
Lemma 3: Discounted State Visitation
Lemma 4: Coupling Lemma [levin2009markov]
