Optimisation of Resource Allocation in Heterogeneous Wireless Networks Using Deep Reinforcement Learning
Oluwaseyi Giwa, Jonathan Shock, Jaco Du Toit, Tobi Awodumila
TL;DR
The paper tackles dynamic resource allocation in heterogeneous wireless networks (HetNets) under varying loads and channel conditions. It proposes a deep reinforcement learning (DRL) framework that jointly optimises transmit power, bandwidth, and scheduling. The problem is formulated as a Markov decision process (MDP) whose multi-objective reward $r_t$ balances throughput, power, and fairness. Two DRL methods, TD3 and PPO, are compared against three heuristics in a realistic, satellite-derived topology. The DRL approaches outperform the heuristic baselines: TD3 converges faster initially, while PPO reaches a higher long-term reward, which points to a trade-off between sample efficiency and final performance. The work demonstrates the viability of DRL for next-generation HetNets and identifies multi-agent coordination and mobility modelling as future directions for improving practicality.
Abstract
Dynamic resource allocation in heterogeneous wireless networks (HetNets) is challenging for traditional methods under varying user loads and channel conditions. We propose a deep reinforcement learning (DRL) framework that jointly optimises transmit power, bandwidth, and scheduling via a multi-objective reward balancing throughput, energy efficiency, and fairness. Using real base station coordinates, we compare Proximal Policy Optimisation (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3) against three heuristic algorithms in multiple network scenarios. Our results show that DRL frameworks outperform heuristic algorithms in optimising resource allocation in dynamic networks. These findings highlight key trade-offs in DRL design for future HetNets.
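To make the idea of a multi-objective reward concrete, the following is a minimal sketch of one possible form, not the paper's actual definition. It assumes a weighted sum of total throughput, a total transmit power penalty, and Jain's fairness index; the function names, the weights `w_tp`, `w_pw`, `w_fair`, and the choice of Jain's index are all illustrative assumptions.

```python
import numpy as np


def jain_fairness(rates):
    """Jain's fairness index: 1/n (one user gets everything) up to 1 (equal rates)."""
    rates = np.asarray(rates, dtype=float)
    total_sq = np.sum(rates ** 2)
    if total_sq == 0.0:
        # No user is served; treat as maximally unfair (a convention assumed here).
        return 0.0
    return float(np.sum(rates) ** 2 / (rates.size * total_sq))


def multi_objective_reward(rates, powers, w_tp=1.0, w_pw=0.1, w_fair=1.0):
    """Hypothetical per-step reward r_t as a weighted sum of three terms.

    rates  : per-user achieved rates for the current step
    powers : per-base-station transmit powers for the current step
    The weights are assumed values, not taken from the paper.
    """
    throughput = float(np.sum(rates))   # reward high total throughput
    power = float(np.sum(powers))       # penalise total transmit power
    fairness = jain_fairness(rates)     # reward an even rate distribution
    return w_tp * throughput - w_pw * power + w_fair * fairness


if __name__ == "__main__":
    # Equal rates give the maximum fairness term.
    print(multi_objective_reward(rates=[1.0, 1.0], powers=[2.0, 3.0]))
    # Same total throughput and power, but one user is starved.
    print(multi_objective_reward(rates=[2.0, 0.0], powers=[2.0, 3.0]))
```

With identical total throughput and power, the second allocation scores lower only because of the fairness term, which is the kind of trade-off a multi-objective reward is meant to expose to the agent.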
