Transferable Graphical MARL for Real-Time Estimation in Dynamic Wireless Networks
Xingran Chen, Navid NaderiAlizadeh, Alejandro Ribeiro, Shirin Saeedi Bidokhti
TL;DR
The paper tackles real-time sampling and remote estimation in dynamic, multi-hop wireless networks. It proposes a transferable graphical MARL framework with a Graph Recurrent Neural Network (GRNN) actor and a graph-based critic, and uses graphon theory to guarantee transferability across structurally similar graphs. It shows that, for oblivious policies, minimizing the time-average estimation error $L^{\pi}$ is equivalent to minimizing the age of information (AoI), so both goals can be handled by a single optimization objective. The learned policies are scale-invariant and transferable: performance gains grow with network size, and recurrence improves performance further. The theoretical results bound the transferability error in terms of graph sampling, signal sampling, and the Lipschitz properties of the filters. Extensive experiments on synthetic and real networks show better performance than baselines, strong cross-scale transferability, and greater robustness to non-stationarity when recurrence is used.
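The link between estimation error and AoI noted above can be illustrated with a minimal sketch. It assumes a scalar AR(1) source $X_{t+1} = aX_t + W_t$ with noise variance `sigma2`, and a receiver that scales its most recent sample by `a**age`; the model, the parameter values, and the function name `error_variance` are illustrative assumptions, not the paper's exact setup. Under these assumptions the mean-squared error depends only on the age of the sample and grows with it, which is the intuition behind the equivalence when sampling decisions ignore the source values.

```python
import numpy as np

def error_variance(age: int, a: float = 0.9, sigma2: float = 1.0) -> float:
    """MSE of the estimate a**age * X_{t-age} for the assumed AR(1) source.

    The error is the noise accumulated over `age` steps:
    sum_{k=0}^{age-1} a**k * W_{t-1-k}, with variance sigma2 * sum_k a**(2k).
    """
    k = np.arange(age)
    return float(sigma2 * np.sum(a ** (2 * k)))

# Error as a function of age (hypothetical parameter values).
ages = np.arange(0, 6)
errors = np.array([error_variance(int(d)) for d in ages])

for d, e in zip(ages, errors):
    print(f"age={d}  mse={e:.4f}")
```

Because the error is a monotone function of age in this toy model, a policy that lowers the time-average age also lowers the time-average error.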
Abstract
We study real-time sampling and estimation of autoregressive Markovian sources in decentralized and dynamic multi-hop networks that share similar structures. Nodes cache neighboring samples and communicate over wireless collision channels. The objective is to minimize the time-average estimation error and/or the age of information under decentralized policies, which we address by developing a unified graphical multi-agent reinforcement learning framework. A key feature of the framework is its transferability, enabled by the fact that the number of trainable parameters is independent of the number of agents, allowing a learned policy to be directly deployed on dynamic yet structurally similar graphs without re-training. Building on this design, we establish rigorous theoretical guarantees on the transferability of the resulting policies. Numerical experiments demonstrate that (i) our method outperforms state-of-the-art baselines on dynamic graphs; (ii) the trained policies transfer well to larger networks, with performance gains increasing with the number of nodes; and (iii) incorporating recurrence is crucial, enhancing resilience to non-stationarity in both independent learning and centralized training with decentralized execution.
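The claim that the number of trainable parameters is independent of the number of agents can be sketched with a polynomial graph filter, a common building block of graph neural networks. This is a minimal illustration rather than the paper's architecture: the filter form $y = \sum_k h_k S^k x$, the random-graph generator, the normalization of the adjacency matrix by the number of nodes, and all names below are assumptions made for the example.

```python
import numpy as np

def graph_filter(S: np.ndarray, x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Apply y = sum_k h[k] * S^k x; the only parameters are the entries of h."""
    y = np.zeros_like(x, dtype=float)
    z = x.astype(float)          # holds S^k x, starting with k = 0
    for hk in h:
        y += hk * z
        z = S @ z                # advance to the next power of S
    return y

def random_graph(n: int, p: float, rng: np.random.Generator) -> np.ndarray:
    """Symmetric random graph with edge probability p, scaled by 1/n (assumed)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(float)
    return A / n

rng = np.random.default_rng(0)
h = np.array([0.5, 0.3, 0.2])    # three coefficients, whatever the graph size

# The same coefficients are reused on a small and a larger graph.
S_small, S_large = random_graph(20, 0.3, rng), random_graph(200, 0.3, rng)
y_small = graph_filter(S_small, rng.standard_normal(20), h)
y_large = graph_filter(S_large, rng.standard_normal(200), h)

print(h.size, y_small.shape, y_large.shape)
```

The coefficient vector `h` is applied unchanged to both graphs, which is why a filter of this kind can be deployed on a network of a different size without re-training.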
