
Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks

Talha Bozkus, Urbashi Mitra

TL;DR

Numerical results show that the proposed ensemble learning algorithm can achieve up to 50% lower average policy error with up to 40% lower runtime complexity than state-of-the-art reinforcement learning algorithms.

Abstract

Optimizing large-scale wireless networks, including optimal resource management, power allocation, and throughput maximization, is inherently challenging due to their non-observable system dynamics and their heterogeneous, complex nature. Herein, a novel ensemble Q-learning algorithm is presented that addresses the performance and complexity challenges of the traditional Q-learning algorithm for optimizing wireless networks. Ensemble learning with synthetic Markov Decision Processes is tailored to wireless networks via new models for approximating large state-space observable wireless networks. In particular, digital cousins are proposed as an extension of the traditional digital twin concept: multiple Q-learning algorithms are run in parallel on multiple synthetic Markovian environments, and their outputs are fused into a single Q-function. Convergence analyses of key statistics and of the Q-functions are provided, together with upper bounds on the estimation bias and variance. Numerical results across a variety of real-world wireless networks show that the proposed algorithm can achieve up to 50% lower average policy error with up to 40% lower runtime complexity than state-of-the-art reinforcement learning algorithms. The theoretical results are also shown to correctly predict trends in the experimental results.
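As a rough illustration of the digital-cousin idea, the sketch below runs tabular Q-learning on several synthetic environments and fuses the resulting Q-tables into one. It is a minimal sketch under stated assumptions, not the authors' algorithm: the toy two-state model, the rewards, the use of n-step transition matrices $\mathbf{P}_a^n$ as the synthetic environments, uniform exploration, and the fixed fusion weights are all hypothetical, and the helper names (`matpow`, `q_learning`, `fuse`) are illustrative.

```python
import random

def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, n):
    """Return P**n for n >= 1 (an n-step transition matrix if P is stochastic)."""
    result = P
    for _ in range(n - 1):
        result = matmul(result, P)
    return result

def q_learning(P, R, gamma=0.9, alpha=0.1, steps=20000, seed=0):
    """Tabular Q-learning on a simulated MDP.

    P[a][s][s2] is the probability of moving from s to s2 under action a;
    R[s][a] is the (hypothetical) immediate reward.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)  # uniform exploration, for simplicity
        s2 = rng.choices(range(n_states), weights=P[a][s])[0]
        Q[s][a] += alpha * (R[s][a] + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    return Q

def fuse(q_tables, weights):
    """Fuse Q-tables into one by a fixed, normalized weighted average (assumed rule)."""
    total = sum(weights)
    n_states, n_actions = len(q_tables[0]), len(q_tables[0][0])
    return [[sum(w * Q[s][a] for w, Q in zip(weights, q_tables)) / total
             for a in range(n_actions)]
            for s in range(n_states)]

# Hypothetical 2-state, 2-action network model: P[a][s][s2] and R[s][a].
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.6, 0.4]]]
R = [[1.0, 0.0], [0.0, 2.0]]

# Synthetic environment n uses the n-step matrices P[a]**n (hypothetical construction).
envs = [[matpow(P[a], n) for a in range(len(P))] for n in (1, 2, 3)]
# The runs are independent and could execute in parallel; here they run sequentially.
q_tables = [q_learning(env, R, seed=i) for i, env in enumerate(envs)]
Q_fused = fuse(q_tables, weights=[0.5, 0.3, 0.2])
policy = [max(range(len(row)), key=row.__getitem__) for row in Q_fused]
print(policy)
```

A fixed weighted average is only the simplest possible fusion rule; the paper's fusion step may weight the constituent learners differently.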

Paper Structure (27 sections, 7 theorems, 27 equations, 9 figures, 2 tables, 1 algorithm)

This paper contains 27 sections, 7 theorems, 27 equations, 9 figures, 2 tables, and 1 algorithm.

Key Result

Proposition 1

Let $\Delta_{t}^{it}(s,a)$ denote the update of the $Q$-function output by Algorithm ensemble_link_learning for $(s,a)$ from time $t$ to $t+1$, i.e., $\Delta_{t}^{it}(s,a) = \mathbf{Q}_{t+1}^{it}(s,a) - \mathbf{Q}_{t}^{it}(s,a)$. Then, for all $(s,a)$, … (see Appendix proposition_1).
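For a single tabular learner, the increment $\Delta_{t}^{it}(s,a)$ is simply the Q-learning temporal-difference step. If the fused output is assumed to be a fixed-weight average of the constituent Q-tables (an assumption; the paper's fusion rule may differ, e.g., through time-varying weights), then by linearity its increment equals the same weighted average of the constituent increments. The sketch below checks this numerically; `td_step`, `fused_value`, the weights, and the random tables are hypothetical.

```python
import random

def td_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return Q_{t+1}(s,a) - Q_t(s,a)."""
    old = Q[s][a]
    Q[s][a] = old + alpha * (r + gamma * max(Q[s_next]) - old)
    return Q[s][a] - old

def fused_value(q_tables, weights, s, a):
    """Fixed-weight average of the constituent Q-values at (s, a) (assumed fusion rule)."""
    return sum(w * Q[s][a] for w, Q in zip(weights, q_tables)) / sum(weights)

rng = random.Random(1)
weights = [0.5, 0.3, 0.2]  # hypothetical fusion weights
# Three random 3-state, 2-action Q-tables standing in for the constituent learners.
tables = [[[rng.random() for _ in range(2)] for _ in range(3)] for _ in weights]

s, a = 0, 1
before = fused_value(tables, weights, s, a)
# Each constituent observes its own (random, hypothetical) reward and next state.
deltas = [td_step(Q, s, a, r=rng.random(), s_next=rng.randrange(3)) for Q in tables]
after = fused_value(tables, weights, s, a)

delta_fused = after - before  # increment of the fused output at (s, a)
delta_mix = sum(w * d for w, d in zip(weights, deltas)) / sum(weights)
print(abs(delta_fused - delta_mix) < 1e-12)  # True: increments combine linearly
```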

Figures (9)

  • Figure 1: The relationship between $\mathbf{L}^{(1)}$ (= $\mathbf{\hat{P}}$), $\mathbf{L}^{(n)}$, $\mathcal{M}^{(1)}$ and $\mathcal{M}^{(n)}$
  • Figure 2: Comparison of Q-Learning (QL) algorithms based on their implementation strategies
  • Figure 3: Examples of wireless network models.
  • Figure 4: APE performances across different environments
  • Figure 5: APE of different algorithms across different models
  • ...and 4 more figures
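Figures 4 and 5 report average policy error (APE), which this listing does not define. One plausible reading, assumed here purely for illustration, is the fraction of states whose greedy action under a learned Q-table differs from the optimal action, averaged over independent runs. The helper names below are hypothetical, and the paper's exact definition may differ.

```python
def greedy_policy(Q):
    """Return the greedy action for each state of a tabular Q-function."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]

def policy_error(Q, optimal_policy):
    """Fraction of states where the greedy action disagrees with the optimal one."""
    pi = greedy_policy(Q)
    return sum(p != o for p, o in zip(pi, optimal_policy)) / len(optimal_policy)

def average_policy_error(q_runs, optimal_policy):
    """Mean policy error over independent runs (hypothetical APE definition)."""
    return sum(policy_error(Q, optimal_policy) for Q in q_runs) / len(q_runs)

# Two hypothetical runs on a 3-state, 2-action problem whose optimal policy is [0, 1, 1].
runs = [
    [[1.0, 0.5], [0.2, 0.9], [0.3, 0.4]],  # matches the optimal policy in every state
    [[0.1, 0.5], [0.2, 0.9], [0.8, 0.4]],  # wrong in states 0 and 2
]
print(average_policy_error(runs, [0, 1, 1]))
```

Under this reading, the second run errs in 2 of 3 states and the first in none, so the APE is one third.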

Theorems & Definitions (7)

  • Proposition 1
  • Proposition 2
  • Corollary 1
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Corollary 2