Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks

Talha Bozkus; Urbashi Mitra

TL;DR

Numerical results show that the proposed ensemble learning algorithm can achieve up to 50% less average policy error with up to 40% less runtime complexity than the state-of-the-art reinforcement learning algorithms.

Abstract

Optimizing large-scale wireless networks, including optimal resource management, power allocation, and throughput maximization, is inherently challenging due to their non-observable system dynamics and heterogeneous and complex nature. Herein, a novel ensemble Q-learning algorithm that addresses the performance and complexity challenges of the traditional Q-learning algorithm for optimizing wireless networks is presented. Ensemble learning with synthetic Markov Decision Processes is tailored to wireless networks via new models for approximating large state-space observable wireless networks. In particular, digital cousins are proposed as an extension of the traditional digital twin concept wherein multiple Q-learning algorithms on multiple synthetic Markovian environments are run in parallel and their outputs are fused into a single Q-function. Convergence analyses of key statistics and Q-functions and derivations of upper bounds on the estimation bias and variance are provided. Numerical results across a variety of real-world wireless networks show that the proposed algorithm can achieve up to 50% less average policy error with up to 40% less runtime complexity than the state-of-the-art reinforcement learning algorithms. It is also shown that theoretical results properly predict trends in the experimental results.
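The ensemble idea in the abstract can be illustrated with a minimal sketch: several tabular Q-learners, each trained on its own synthetic Markovian environment, whose Q-tables are fused into one. This is not the authors' algorithm. The construction of each synthetic environment (here, the $n$-step power of an estimated transition matrix), the equal-weight averaging used for fusion, the sequential rather than parallel execution, and all names (`q_learning`, `make_cousins`, `fuse`) and hyperparameters are assumptions made for illustration.

```python
import numpy as np

def q_learning(P, cost, gamma=0.9, alpha=0.1, eps=0.2, steps=3000, seed=0):
    """Tabular Q-learning on a discounted-cost MDP (costs are minimized).
    P has shape (A, S, S); cost has shape (S, A)."""
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    s = int(rng.integers(n_states))
    for _ in range(steps):
        # Epsilon-greedy exploration; greedy means lowest estimated cost.
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmin(Q[s]))
        s_next = int(rng.choice(n_states, p=P[a, s]))
        target = cost[s, a] + gamma * Q[s_next].min()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q

def make_cousins(P_hat, num_cousins):
    """ASSUMPTION: the n-th synthetic environment uses the n-step
    transition matrix of the estimated model P_hat."""
    cousins = []
    for n in range(1, num_cousins + 1):
        P_n = np.stack([np.linalg.matrix_power(P_a, n) for P_a in P_hat])
        P_n /= P_n.sum(axis=2, keepdims=True)  # guard against round-off
        cousins.append(P_n)
    return cousins

def fuse(Q_list, weights=None):
    """ASSUMPTION: fuse by (optionally weighted) averaging of Q-tables."""
    return np.average(np.stack(Q_list), axis=0, weights=weights)

# Hypothetical toy problem: 6 states, 2 actions, random dynamics and costs.
rng = np.random.default_rng(42)
n_states, n_actions = 6, 2
P_hat = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
cost = rng.random((n_states, n_actions))

cousins = make_cousins(P_hat, num_cousins=3)
Q_list = [q_learning(P_n, cost, seed=i) for i, P_n in enumerate(cousins)]
Q_fused = fuse(Q_list)
policy = Q_fused.argmin(axis=1)  # greedy (cost-minimizing) policy
```

The learners here run one after another for simplicity; because they share no state, they could be dispatched to separate processes without changing the fusion step.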

Paper Structure (27 sections, 7 theorems, 27 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
System Model and Methods
Infinite Horizon Discounted Cost MDP model
Model-Free Reinforcement Learning: Q-Learning
Extension of Digital Twins: Digital Cousins
Algorithm
Deterministic analysis
Probabilistic analysis
Numerical Results
Wireless Network Models
MISO network with interference channels
MISO energy harvesting network with multiple relays
MIMO network with interference channels
MIMO network with mobile transmitters
Average Policy Error (APE) Performance
...and 12 more sections

Key Result

Proposition 1

Let $\Delta_{t}^{it}(s,a)$ denote the update of the output $Q$-function of Algorithm 1 for the pair $(s,a)$ from time $t$ to $t+1$: $\Delta_{t}^{it}(s,a) = \mathbf{Q}_{t+1}^{it}(s,a) - \mathbf{Q}_{t}^{it}(s,a)$. Then, for all $(s,a)$, … (See the appendix on Proposition 1.)

Figures (9)

Figure 1: The relationship between $\mathbf{L}^{(1)}$ (= $\mathbf{\hat{P}}$), $\mathbf{L}^{(n)}$, $\mathcal{M}^{(1)}$ and $\mathcal{M}^{(n)}$
Figure 2: Comparison of Q-Learning (QL) algorithms based on their implementation strategies
Figure 3: Examples of wireless network models.
Figure 4: APE performances across different environments
Figure 5: APE of different algorithms across different models
...and 4 more figures

Theorems & Definitions (7)

Proposition 1
Proposition 2
Corollary 1
Proposition 3
Proposition 4
Proposition 5
Corollary 2
