Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

Marios Aristodemou, Yasaman Omid, Sangarapillai Lambotharan, Mahsa Derakhshan, Lajos Hanzo

Abstract

The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BS) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm that aims to maximise the sum-rate of the users while coping with the outdated CSI. We design a novel bi-level optimisation procedure, termed dual stage proximal policy optimisation (DS-PPO), for tackling the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attained by the use of DS-PPO. In addition, we provide a convergence analysis of DS-PPO along with its computational complexity.

Paper Structure (24 sections, 6 theorems, 54 equations, 6 figures, 3 tables, 1 algorithm)

Key Result

Proposition 1

Given the TPM $\mathbf{V}_l$, the proposition gives the projection of $\mathbf{V}_l$ onto a set $\mathbb{S}$. In particular, when $\mathbb{S}$ is a sphere centred at the origin with radius $P$, such that $\mathbb{S}=\{\mathbf{V}_l \mid \|\mathbf{V}_l\| \leq P\}$, it gives the projection of $\mathbf{V}_l$ onto $\mathbb{S}$ explicitly.
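
As an illustration of what a power budget projection of this kind computes, the following is a minimal sketch of the standard Euclidean projection onto a norm ball, $\mathbf{V}_l \cdot \min(1, P/\|\mathbf{V}_l\|)$. It is not necessarily the exact expression of Proposition 1: the use of the Frobenius norm, the function name `project_onto_power_ball` and the example matrix are all assumptions made for illustration.

```python
import numpy as np


def project_onto_power_ball(V, P):
    """Euclidean projection of V onto {X : ||X||_F <= P}.

    Assumption: ||.|| is the Frobenius norm; the summary does not
    say which norm the proposition uses.
    """
    if P <= 0:
        raise ValueError("radius P must be positive")
    V = np.asarray(V)
    norm = np.linalg.norm(V)  # Frobenius norm for a 2-D array
    if norm <= P:
        # Already inside the set: the projection is V itself.
        return V.copy()
    # Outside the set: rescale onto the boundary, keeping the direction.
    return V * (P / norm)


# Hypothetical 2x2 complex matrix with Frobenius norm 5.
V_l = np.array([[3 + 0j, 0], [0, 4j]])
V_proj = project_onto_power_ball(V_l, P=1.0)
```

In this sketch the projection only rescales the matrix, so its direction is preserved while its norm is capped at $P$.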

Figures (6)

  • Figure 1: System model: a cluster of $L$ satellites providing service for $K$ users.
  • Figure 2: DS-PPO for enhanced TPM generation.
  • Figure 3: DS-PPO performance with $K\in\{2,4\}$ users under perfect CSI and delayed CSI, in the presence of handovers. The effect of delayed observations is negligible in both cases.
  • Figure 4: Comparative results with $L\in\{4,6\}$ satellites and $K=6$ users in perfect CSI and delayed CSI scenarios. Increasing the number of satellites raises the sum rate by $20\%$.
  • Figure 5: Comparative results with $L\in[4,8]$ satellites and $K=6$ users in delayed CSI scenarios. The highest sum rate is achieved with $L=6$.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Proposition 1: Power Budget Projection
  • Definition 1: Policy Performance
  • Definition 2: Advantage Function
  • Theorem 1: Stage 2 Performance Improvement
  • Lemma 1: Bounded Value Function [konda2004convergence, lin2022convergence]
  • Lemma 2: Performance Difference Lemma [kakade2002approximately]
  • Lemma 3: Discounted State Visitation
  • Lemma 4: Coupling Lemma [levin2009markov]