Learning Non-myopic Power Allocation in Constrained Scenarios
Arindam Chowdhury, Santiago Paternain, Gunjan Verma, Ananthram Swami, Santiago Segarra
TL;DR
This work tackles non-myopic power allocation in ad hoc interference networks under episodic battery constraints by formulating the problem as a constrained sequential decision-making task. It introduces a two-level hierarchy: a fast lower-level solver produces an instantaneous power plan, and an upper-level policy based on a graph convolutional neural network (GCNN) adjusts this plan using the current battery state so that the episode-wide constraint is satisfied. The upper-level policy is trained via constrained TD3. The lower level relies on UWMMSE for strong instantaneous performance, while the upper level learns to allocate power non-greedily to maximize the episodic sum-rate, trading short-term gains against long-term feasibility. Experimental results show a 15–20% improvement in episodic utility over myopic strategies, with minimal constraint violations and fast inference, and the learned policy generalizes across varying episode lengths. Because the GCNN exploits the graph structure of the network, the approach lends itself to scalable, distributed deployment, and it sets the stage for extensions to multiple time-coupled constraints and mobile networks.
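The two-level idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: the lower-level solver is replaced by a hypothetical full-power plan (standing in for UWMMSE), the learned GCNN policy is replaced by a hypothetical hand-written rule `even_spread`, and each time slot is assumed to have unit duration so that power and energy coincide. All function names and parameters are assumptions made for illustration.

```python
import numpy as np

def sum_rate(H, p, noise_var=1.0):
    """Sum-rate in bits/s/Hz, treating interference as noise.

    H[i, j] is the channel from transmitter j to receiver i (assumed layout).
    """
    G = np.abs(H) ** 2
    signal = np.diag(G) * p
    interference = G @ p - signal  # received power from all other transmitters
    return float(np.sum(np.log2(1.0 + signal / (noise_var + interference))))

def run_episode(channels, p_max, battery, adjust):
    """Roll out one episode without drawing more energy than each battery holds."""
    b = np.array(battery, dtype=float)
    total = 0.0
    for t, H in enumerate(channels):
        # Placeholder lower-level plan: full power (stand-in for UWMMSE).
        plan = np.full(len(b), p_max)
        # Upper level adjusts the plan using the battery state and steps remaining.
        p = adjust(plan, b, len(channels) - t)
        # Enforce the per-step power limit and the remaining battery.
        p = np.clip(p, 0.0, np.minimum(p_max, b))
        b -= p  # unit-length slots assumed, so energy spent equals power
        total += sum_rate(H, p)
    return total, b

def myopic(plan, b, steps_left):
    """Greedy baseline: use the instantaneous plan unchanged."""
    return plan

def even_spread(plan, b, steps_left):
    """Hypothetical non-myopic rule: ration the battery over the remaining steps."""
    return np.minimum(plan, b / steps_left)

# Toy episode: 2 interference-free links, 4 steps, battery for only 2 full-power steps.
channels = [np.eye(2)] * 4
myopic_total, myopic_b = run_episode(channels, 1.0, [2.0, 2.0], myopic)
spread_total, spread_b = run_episode(channels, 1.0, [2.0, 2.0], even_spread)
```

In this toy setting the greedy baseline exhausts both batteries after two steps and accumulates an episodic sum-rate of 4.0, while the rationing rule reaches roughly 4.68 because the rate is concave in power. A learned policy would replace `even_spread` and would additionally react to the channel realizations, which this heuristic ignores.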
Abstract
We propose a learning-based framework for efficient power allocation in ad hoc interference networks under episodic constraints. The problem of optimal power allocation under instantaneous constraints, with the goal of maximizing a given network utility metric, has recently attracted significant attention. Several learnable algorithms have been proposed to obtain fast, effective, and near-optimal performance. However, a more realistic scenario arises when the utility metric has to be optimized for an entire episode under time-coupled constraints. In this case, the instantaneous power needs to be regulated so that the given utility can be optimized over an entire sequence of wireless network realizations while satisfying the constraint at all times. Solving each instance independently is myopic, since such a solution takes no account of the long-term constraint. Instead, we frame this as a constrained sequential decision-making problem and employ an actor-critic algorithm to obtain the constraint-aware power allocation at each step. We present experimental analyses to illustrate the effectiveness of our method in terms of superior episodic network-utility performance and its efficiency in terms of runtime and computational complexity.
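One way to write the episodic problem described above is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $H_t$ is the channel realization at step $t$, $p_t$ the vector of transmit powers, $u$ the network utility (for example the sum-rate), $p_{\max}$ the per-step power limit, $b_0$ the vector of initial battery levels, and $T$ the episode length, with unit-length time slots so that power and energy coincide.

```latex
\begin{aligned}
\max_{p_1,\dots,p_T}\quad & \sum_{t=1}^{T} u(H_t, p_t) \\
\text{s.t.}\quad & 0 \le p_t \le p_{\max}, && t = 1,\dots,T, \\
& \sum_{\tau=1}^{t} p_\tau \le b_0, && t = 1,\dots,T .
\end{aligned}
```

The cumulative constraint couples the decisions across time: power spent at step $t$ reduces what is available at every later step. A myopic scheme maximizes each $u(H_t, p_t)$ separately and accounts for the battery only when it runs out, which is why the constraint-aware formulation can achieve a higher episodic utility.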
