Play to Earn in the Metaverse with Mobile Edge Computing over Wireless Networks: A Deep Reinforcement Learning Approach
Terence Jie Chua, Wenhan Yu, Jun Zhao
TL;DR
The paper tackles a joint optimization problem in play-to-earn mobile augmented reality (MAR) games served by mobile edge computing: reducing downlink (DL) latency, uplink (UL) latency, and worst-case UE battery expenditure while maximizing worst-case earning potential. It introduces Multi-Agent Loss-Sharing (MALS), an asymmetric, asynchronous reinforcement learning framework built on proximal policy optimization (PPO), with discrete DL UE-Metaverse Base Station (UE-MBS) allocation and continuous UL power control, using a two-head critic to guide both agents. MALS is shown to converge and to outperform the Independent Dual Agent and centralized training with decentralized execution (CTDE) baselines, and the paper analyzes in detail how different weightings of the DL and UL objectives affect performance and energy trade-offs. The approach offers practical benefits for edge-assisted AR/MAR gaming by improving game fluidity, profitability, and battery life under user mobility and non-orthogonal multiple access (NOMA) transmissions. The work highlights the viability of a loss-sharing, asymmetric multi-agent reinforcement learning (MARL) architecture for complex joint optimization in MEC-enabled wireless networks.
Abstract
Metaverse play-to-earn games have been gaining popularity because they enable players to earn in-game tokens that can be converted into real-world profits. With advances in augmented reality (AR) technologies, users can play AR games in the Metaverse. However, these high-resolution games are compute-intensive, and in-game graphical scenes need to be offloaded from mobile devices to an edge server for computation. In this work, we consider an optimization problem in which the Metaverse Service Provider (MSP) aims to reduce the downlink transmission latency of in-game graphics, the latency of uplink data transmission, and the worst-case (greatest) battery charge expenditure of user equipment (UEs), while maximizing the worst-case (lowest) UE resolution-influenced in-game earning potential. The MSP does so by optimizing the downlink UE-Metaverse Base Station (UE-MBS) assignment and the uplink transmission power selection. The downlink and uplink transmissions are executed asynchronously. We propose a multi-agent, loss-sharing (MALS) reinforcement learning model to tackle this asynchronous and asymmetric problem. We then compare the MALS model with baseline models and show that it outperforms them. Finally, we conduct multi-variable optimization weighting analyses and show the viability of using our proposed MALS algorithm to tackle joint optimization problems.
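
To make the shape of this objective concrete, the sketch below shows one possible way to scalarize the four quantities named in the abstract into a single reward. It is an illustration under stated assumptions, not the formulation used in the paper: the weighted-sum form, the summation of per-UE latencies, the `Weights` class, the `joint_reward` function, and all default weight values are hypothetical. Only the worst-case treatment of battery expenditure (greatest) and earning potential (lowest) is taken from the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Weights:
    """Hypothetical objective weights; the abstract gives no values."""
    dl_latency: float = 1.0
    ul_latency: float = 1.0
    battery: float = 1.0
    earning: float = 1.0


def joint_reward(
    dl_latency: Sequence[float],
    ul_latency: Sequence[float],
    battery_use: Sequence[float],
    earning: Sequence[float],
    w: Weights = Weights(),
) -> float:
    """Combine per-UE measurements into one scalar reward.

    Assumption: latencies are aggregated by summation over UEs.
    From the abstract: battery expenditure enters through its
    worst case (the greatest value) and earning potential through
    its worst case (the lowest value).
    """
    cost = (
        w.dl_latency * sum(dl_latency)
        + w.ul_latency * sum(ul_latency)
        + w.battery * max(battery_use)
    )
    return w.earning * min(earning) - cost


# Two UEs with illustrative (made-up) measurements.
reward = joint_reward(
    dl_latency=[1.0, 2.0],
    ul_latency=[0.5, 0.5],
    battery_use=[3.0, 4.0],
    earning=[10.0, 6.0],
)
print(reward)
```

Changing a single weight, for example `Weights(battery=2.0)`, shifts the balance between the objectives, which is the kind of trade-off the weighting analyses in the abstract refer to.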
