Hierarchical JEPA Meets Predictive Remote Control in Beyond 5G Networks
Abanoub M. Girgis, Ibtissam Labriji, Mehdi Bennis
TL;DR
This work addresses scalable predictive remote control over bandwidth-limited wireless networks. High-dimensional device observations are encoded into low-dimensional embeddings, and future embeddings are predicted at three temporal levels. The Hierarchical Joint-Embedding Predictive Architecture (H-JEPA) has three parts: a context encoder, a target encoder that is periodically updated via an exponential moving average (EMA), and three auto-regressive predictors (high-, medium-, and low-level) that forecast embeddings over a horizon $K_p$. Training uses cosine-similarity losses between the predicted embeddings and alignment targets. A semantic actor maps predicted embeddings directly to control actions, which avoids reconstructing high-dimensional states and enables robust long-horizon control. Simulations on inverted cart-pole tasks show that H-JEPA supports up to $42.83\%$ more devices at an SNR of $20$ dB while maintaining control performance. This points to significant communication-efficiency gains for Beyond-5G networked control systems.
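The training signal described above can be illustrated with a minimal pure-Python sketch. It assumes, as in standard JEPA-style training, that the target encoder's parameters track an EMA of the context encoder's parameters, and that the loss averages $1 - \cos(\hat{z}_k, z_k)$ over the horizon $K_p$. The function names, the decay `tau`, and the flat-list parameter representation are hypothetical choices for illustration, not the authors' implementation. This sketch also does not fix how often the EMA update is applied.

```python
import math
from typing import List

Vector = List[float]

def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)

def alignment_loss(predicted: List[Vector], targets: List[Vector]) -> float:
    """Mean of (1 - cosine similarity) over the prediction horizon K_p.

    Assumption: one predicted embedding per future step, compared with the
    target-encoder embedding of the observation actually seen at that step.
    """
    if not predicted or len(predicted) != len(targets):
        raise ValueError("need one target per predicted step")
    return sum(1.0 - cosine_similarity(p, t) for p, t in zip(predicted, targets)) / len(predicted)

def ema_update(target: Vector, context: Vector, tau: float = 0.99) -> Vector:
    """Assumed EMA rule: target <- tau * target + (1 - tau) * context (per parameter)."""
    return [tau * t + (1.0 - tau) * c for t, c in zip(target, context)]

if __name__ == "__main__":
    # Step 1 is perfectly aligned (loss 0); step 2 is orthogonal (loss 1).
    print(alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))  # 0.5
    print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.5))  # [0.5, 1.0]
```

A decay close to 1 makes the target encoder change slowly. This is the usual motivation for an EMA target in joint-embedding methods: it gives the predictors a stable alignment target.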
Abstract
In wireless networked control systems, timely and reliable state updates from distributed devices to remote controllers are essential for robust control performance. However, when multiple devices transmit high-dimensional states (e.g., images or video frames) over bandwidth-limited wireless networks, a critical trade-off emerges between communication efficiency and control performance. To address this challenge, we propose a Hierarchical Joint-Embedding Predictive Architecture (H-JEPA) for scalable predictive control. Instead of transmitting raw states, device observations are encoded into low-dimensional embeddings that preserve the essential dynamics. The architecture performs three-level hierarchical prediction. The high-level, medium-level, and low-level predictors operate at different temporal resolutions to provide long-term prediction stability, intermediate interpolation, and fine-grained refinement, respectively. Control actions are derived directly in the embedding space, removing the need for state reconstruction. Simulation results on inverted cart-pole systems demonstrate that H-JEPA supports up to 42.83% more devices under limited wireless capacity without compromising control performance.
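The coarse-to-fine division of labour among the three predictors, and the idea of acting directly on embeddings, can be sketched as a toy. The following are all hypothetical stand-ins for the learned networks and are not taken from the paper:

- the strides (4 steps for the high level, 2 for the medium level);
- the interpolating medium predictor;
- the step-wise low-level refiner;
- the linear `semantic_actor` gains.

The 1-D embeddings with constant drift exist only to make the output easy to check.

```python
from typing import Callable, Dict, List

Vector = List[float]

def lerp(a: Vector, b: Vector, w: float) -> Vector:
    """Linear interpolation between two embeddings (w=0 -> a, w=1 -> b)."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]

def hierarchical_rollout(
    z0: Vector,
    high: Callable[[Vector], Vector],
    medium: Callable[[Vector, Vector, float], Vector],
    low: Callable[[Vector, Vector, int], Vector],
    horizon: int,
    s_high: int = 4,
    s_med: int = 2,
) -> List[Vector]:
    """Predict embeddings z_1..z_horizon from z0, coarse to fine (toy version)."""
    assert horizon % s_high == 0 and s_high % s_med == 0
    # High level: auto-regressive long jumps of s_high steps (long-term anchors).
    coarse: Dict[int, Vector] = {0: z0}
    for k in range(s_high, horizon + 1, s_high):
        coarse[k] = high(coarse[k - s_high])
    # Medium level: fill every s_med steps between consecutive anchors.
    mid: Dict[int, Vector] = dict(coarse)
    for k in range(0, horizon, s_high):
        for j in range(k + s_med, k + s_high, s_med):
            mid[j] = medium(coarse[k], coarse[k + s_high], (j - k) / s_high)
    # Low level: per-step refinement toward the next medium-level point.
    fine: List[Vector] = [z0]
    for t in range(1, horizon + 1):
        guide = -(-t // s_med) * s_med  # smallest multiple of s_med that is >= t
        fine.append(low(fine[-1], mid[guide], guide - t + 1))
    return fine[1:]

def semantic_actor(z: Vector, gains: Vector) -> float:
    """Hypothetical linear policy acting directly on an embedding (no reconstruction)."""
    return sum(g * x for g, x in zip(gains, z))

if __name__ == "__main__":
    preds = hierarchical_rollout(
        [0.0],
        high=lambda z: [z[0] + 4.0],  # toy: drift of 1 per step, jump of 4
        medium=lerp,  # toy: interpolate between consecutive anchors
        low=lambda prev, g, n: lerp(prev, g, 1.0 / n),  # toy: close the gap evenly
        horizon=8,
    )
    print([z[0] for z in preds])  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print([semantic_actor(z, [-0.5]) for z in preds[:2]])  # [-0.5, -1.0]
```

Every level in this sketch works in the embedding space, so the actor consumes predicted embeddings directly. Replacing the toy lambdas with learned predictors would leave the control flow unchanged.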
