Play to Earn in the Metaverse with Mobile Edge Computing over Wireless Networks: A Deep Reinforcement Learning Approach

Terence Jie Chua, Wenhan Yu, Jun Zhao

TL;DR

The paper tackles the joint optimization of downlink latency, uplink latency, and worst-case UE battery expenditure, while maximizing worst-case earning potential in play-to-earn MAR games over mobile edge computing. It introduces Multi-Agent Loss-Sharing (MALS), an asymmetric, asynchronous reinforcement learning framework built on PPO, with discrete DL UE-MBS allocation and continuous UL power control, using a two-head critic to guide both agents. MALS is shown to converge and outperform Independent Dual Agent and CTDE baselines, with extensive analyses of how different weighting of DL/UL objectives affects performance and energy trade-offs. The approach yields practical benefits for edge-assisted AR/MAR gaming by improving fluidity, profitability, and battery life under mobility and NOMA-based transmissions. The work highlights the viability of a loss-sharing, asymmetrical MARL architecture for complex joint optimization in MEC-enabled wireless networks.

Abstract

The Metaverse play-to-earn games have been gaining popularity as they enable players to earn in-game tokens which can be translated to real-world profits. With the advancements in augmented reality (AR) technologies, users can play AR games in the Metaverse. However, these high-resolution games are compute-intensive, and in-game graphical scenes need to be offloaded from mobile devices to an edge server for computation. In this work, we consider an optimization problem where the Metaverse Service Provider (MSP)'s objective is to reduce downlink transmission latency of in-game graphics, the latency of uplink data transmission, and the worst-case (greatest) battery charge expenditure of user equipments (UEs), while maximizing the worst-case (lowest) UE resolution-influenced in-game earning potential through optimizing the downlink UE-Metaverse Base Station (UE-MBS) assignment and the uplink transmission power selection. The downlink and uplink transmissions are then executed asynchronously. We propose a multi-agent, loss-sharing (MALS) reinforcement learning model to tackle the asynchronous and asymmetric problem. We then compare the MALS model with other baseline models and show its superiority over other methods. Finally, we conduct multi-variable optimization weighting analyses and show the viability of using our proposed MALS algorithm to tackle joint optimization problems.
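The objective described in the abstract mixes aggregate latency terms with two worst-case terms, which is easy to misread. The sketch below is a hypothetical illustration of how such a scalarized cost could be assembled. The weights, the choice to sum latencies across UEs, and every name in the code are assumptions made for illustration; they are not the paper's exact formulation, which is not shown on this page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UEMetrics:
    """Per-UE quantities named in the abstract (field names are hypothetical)."""
    dl_latency: float    # downlink transmission latency
    ul_latency: float    # uplink transmission latency
    battery_used: float  # battery charge expenditure
    earning: float       # resolution-influenced earning potential


def joint_cost(
    ues: Sequence[UEMetrics],
    w_dl: float = 1.0,
    w_ul: float = 1.0,
    w_batt: float = 1.0,
    w_earn: float = 1.0,
) -> float:
    """Scalarized cost to minimize (assumed form, not taken from the paper)."""
    total_dl = sum(u.dl_latency for u in ues)         # assumption: latencies are summed
    total_ul = sum(u.ul_latency for u in ues)
    worst_battery = max(u.battery_used for u in ues)  # worst case = greatest expenditure
    worst_earning = min(u.earning for u in ues)       # worst case = lowest earning
    # Earning is maximized, so it enters the cost with a negative sign.
    return (w_dl * total_dl + w_ul * total_ul
            + w_batt * worst_battery - w_earn * worst_earning)


if __name__ == "__main__":
    ues = [UEMetrics(0.25, 0.5, 5.0, 10.0), UEMetrics(0.75, 0.25, 7.0, 8.0)]
    print(joint_cost(ues))
```

Changing the weights shifts the trade-off between latency, battery life, and earnings, which is the kind of weighting analysis the abstract refers to.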

Paper Structure

This paper contains 25 sections, 22 equations, 11 figures, 1 algorithm.

Figures (11)

  • Figure 1: System model illustrating the interaction between the UEs and the MBSs, agents' objectives and variables-of-control.
  • Figure 2: Reinforcement Learning Multi-Agent-Loss-Sharing Proximal Policy Optimization (MALS-PPO) structure and model update.
  • Figure 3: Independent Dual Agent (IDA) (left) and Centralized Training Decentralized Execution (CTDE) (right).
  • Figure 4: Key metrics performance obtained by MALS across training steps for the 4 MBS, 8 UE configuration.
  • Figure 5: Uplink and downlink agent rewards obtained for MALS model, Independent Dual Agent model, CTDE model, across configuration settings and seed 0 to 9. Bands within the graph indicate the range of reward values obtained across the seeds.
  • ...and 6 more figures