MetaLore: Learning to Orchestrate Communication and Computation for Metaverse Synchronization
Elif Ebru Ohri, Qi Liao, Anastasios Giovanidis, Francesca Fossati, Nour-El-Houda Yellas
TL;DR
MetaLore tackles real-time synchronization in Metaverse and digital twin environments by jointly allocating communication bandwidth and computational resources with deep reinforcement learning. The framework uses a lightweight state based on queue lengths and introduces two novel Age of Information (AoI) metrics, Age of Request Information (AoRI) and Age of Sensor Information (AoSI), which are integrated into a PPO-based learning objective to balance throughput, end-to-end delay, and data freshness. The system is evaluated in a custom simulator against static baselines, demonstrating Pareto-optimal trade-offs and adaptability to dynamic traffic with a small observation space. The findings suggest practical value for edge-enabled, low-latency immersive applications and point toward future research on multi-base-station (multi-BS) and inter-submetaverse resource sharing.
Abstract
As augmented and virtual reality evolve, achieving seamless synchronization between the physical and digital realms remains a critical challenge, especially for real-time applications where delays degrade the user experience. This paper presents MetaLore, a Deep Reinforcement Learning (DRL) based framework for joint communication and computational resource allocation in Metaverse or digital twin environments. MetaLore dynamically shares communication bandwidth and computational resources among sensors and mobile devices to optimize synchronization while maintaining high throughput. Special attention is given to satisfying end-to-end delay guarantees. A key contribution is the introduction of two novel Age of Information (AoI) metrics, Age of Request Information (AoRI) and Age of Sensor Information (AoSI), which are integrated into the reward function to enhance synchronization quality. An open-source simulator has been extended to incorporate and evaluate the approach. The DRL solution is shown to match the performance of full-enumeration brute-force solutions while relying on a small, task-oriented observation space consisting of two queue lengths at the network side. This gives the DRL approach the flexibility to adapt effectively and autonomously to dynamic traffic conditions.
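To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It uses the standard definition of Age of Information (current time minus the generation time of the freshest delivered update) and assumes a simple weighted reward that favors throughput and penalizes AoRI and AoSI. The weights, the linear form of the reward, and every function and parameter name are hypothetical; the abstract does not specify which two queues are observed, so the queue names are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


def age_of_information(now: float, generated_at: float) -> float:
    """Standard AoI: time elapsed since the freshest delivered update was generated."""
    if generated_at > now:
        raise ValueError("an update cannot be generated in the future")
    return now - generated_at


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights chosen for illustration; not taken from the paper.
    throughput: float = 1.0
    aori: float = 0.5
    aosi: float = 0.5


def reward(throughput: float, aori: float, aosi: float,
           w: RewardWeights = RewardWeights()) -> float:
    """Assumed linear reward: higher throughput is better, older information is worse."""
    return w.throughput * throughput - w.aori * aori - w.aosi * aosi


def observation(queue_a_len: int, queue_b_len: int) -> Tuple[int, int]:
    """Two-element observation built from two network-side queue lengths (placeholder names)."""
    return (queue_a_len, queue_b_len)
```

Under these assumptions, an agent would observe `observation(3, 5)` at each step and receive `reward(...)` computed from the measured throughput and the two ages; the actual reward shaping and delay constraints used by MetaLore are defined in the paper itself.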
