Multi-Agent Reinforcement Learning for Task Offloading in Wireless Edge Networks
Andrea Fox, Francesco De Pellegrini, Eitan Altman
TL;DR
This paper tackles scalable, decentralized task offloading in wireless edge networks by formulating each device as an independent constrained MDP (CMDP) and coordinating all agents through infrequently updated shared constraints. The proposed Decentralized Coordination via CMDPs (DCC) framework uses a three-timescale learning scheme: fast local policy optimization under a decomposed, approximate reward; intermediate Lagrange multiplier updates to enforce long-term constraints; and slow, stochastic optimization of the constraint vector to align with global objectives. The authors provide a theoretical bound on the reward approximation, differentiability results, and gradient-simplification techniques, and validate the approach on toy edge-offloading scenarios where DCC-QL outperforms independent Q-learning and competitive centralized-training, decentralized-execution (CTDE) baselines, especially as system size grows. The work demonstrates that lightweight, constraint-driven coordination can yield scalable, communication-efficient performance improvements in congestible wireless edge environments, with clear directions for extending to asynchronous updates and broader empirical validation.
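The three-timescale scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-state agents, the payoff values, the congestion penalty in `global_objective`, and every function name are assumptions made for illustration. The slow step uses plain random search in place of the gradient-based constraint update that the paper analyzes.

```python
import random


def multiplier_step(lam, avg_cost, budget, lr):
    """Intermediate timescale: projected dual ascent on one Lagrange multiplier."""
    return max(0.0, lam + lr * (avg_cost - budget))


def run_agent(q, lam, rng, steps=100, alpha=0.1, eps=0.1):
    """Fast timescale: epsilon-greedy Q-learning on the Lagrangian reward.

    Toy single-state agent (an assumption): action 0 = local, 1 = offload.
    Returns the (average reward, average cost) seen during the rollout.
    """
    total_r = total_c = 0.0
    for _ in range(steps):
        a = rng.randrange(2) if rng.random() < eps else int(q[1] > q[0])
        reward = 1.0 if a == 1 else 0.3  # hypothetical payoffs
        cost = float(a)                  # offloading consumes server capacity
        q[a] += alpha * ((reward - lam * cost) - q[a])
        total_r += reward
        total_c += cost
    return total_r / steps, total_c / steps


def global_objective(rewards, costs, capacity):
    """Hypothetical global objective: total reward minus a congestion penalty."""
    overload = max(0.0, sum(costs) - capacity)
    return sum(rewards) - 2.0 * overload ** 2


def dcc_sketch(n_agents=3, capacity=1.5, slow_iters=10, mid_iters=20, seed=0):
    rng = random.Random(seed)
    budgets = [0.5] * n_agents   # shared constraint vector, one entry per agent
    lams = [0.0] * n_agents      # one Lagrange multiplier per agent
    qs = [[0.0, 0.0] for _ in range(n_agents)]

    def evaluate(cand):
        # Agents learn locally (fast), then adjust multipliers (intermediate).
        # Q-values and multipliers are warm-started across evaluations.
        for _ in range(mid_iters):
            stats = [run_agent(qs[i], lams[i], rng) for i in range(n_agents)]
            for i, (_, c) in enumerate(stats):
                lams[i] = multiplier_step(lams[i], c, cand[i], lr=0.5)
        return global_objective([r for r, _ in stats], [c for _, c in stats], capacity)

    best = evaluate(budgets)
    for _ in range(slow_iters):
        # Slow timescale: random-search perturbation of the constraint vector
        # (a stand-in for the paper's stochastic gradient step).
        cand = [min(1.0, max(0.0, b + rng.gauss(0.0, 0.1))) for b in budgets]
        score = evaluate(cand)
        if score > best:
            budgets, best = cand, score
    return budgets, lams, best
```

Calling `dcc_sketch()` returns the final constraint vector, the multipliers, and the best global objective value found. Agents exchange nothing during the fast and intermediate loops. Only the slow loop touches the shared constraint vector, which is the lightweight coordination idea the summary describes.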
Abstract
In edge computing systems, autonomous agents must make fast local decisions while competing for shared resources. Existing MARL methods often resort to centralized critics or frequent communication, which fail under limited observability and communication constraints. We propose a decentralized framework in which each agent solves a constrained Markov decision process (CMDP), coordinating implicitly through a shared constraint vector. In task offloading, for example, the constraints prevent overloading of shared server resources. The constraints are updated infrequently and act as a lightweight coordination mechanism: they enable agents to align with global resource usage objectives while requiring little direct communication. Using safe reinforcement learning, agents learn policies that meet both local and global goals. We establish theoretical guarantees under mild assumptions and validate our approach experimentally, showing improved performance over centralized and independent baselines, especially in large-scale settings.
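For orientation, one standard way to write the per-agent problem is given below. The notation is assumed for illustration and may differ from the paper's: $r_i$ is agent $i$'s local reward, $k_i$ its resource-usage cost, $c_i$ its entry of the shared constraint vector, $\lambda_i$ its Lagrange multiplier, and a discounted criterion with factor $\gamma$ is assumed.

```latex
\begin{align}
  \max_{\pi_i} \quad & J_i^{r}(\pi_i)
    = \mathbb{E}_{\pi_i}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t^{i}, a_t^{i})\right] \\
  \text{s.t.} \quad & J_i^{k}(\pi_i)
    = \mathbb{E}_{\pi_i}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, k_i(s_t^{i}, a_t^{i})\right] \le c_i, \\
  \mathcal{L}_i(\pi_i, \lambda_i; c_i)
    &= J_i^{r}(\pi_i) - \lambda_i \left( J_i^{k}(\pi_i) - c_i \right),
    \qquad \lambda_i \ge 0 .
\end{align}
```

Under this reading, each agent seeks a saddle point of $\mathcal{L}_i$ (maximizing over $\pi_i$, minimizing over $\lambda_i$) for a fixed $c_i$, and coordination enters only through the choice of the vector $(c_1, \dots, c_N)$.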
