Transferable Graphical MARL for Real-Time Estimation in Dynamic Wireless Networks

Xingran Chen; Navid NaderiAlizadeh; Alejandro Ribeiro; Shirin Saeedi Bidokhti

TL;DR

The paper tackles real-time sampling and remote estimation in dynamic, multi-hop wireless networks and proposes a transferable graphical multi-agent reinforcement learning (MARL) framework with a Graph Recurrent Neural Network (GRNN) actor and a graph-based critic, underpinned by graphon theory to guarantee transferability across structurally similar graphs. It shows that, for oblivious policies, minimizing the time-average estimation error $L^{\pi}$ is equivalent to minimizing the age of information (AoI), which yields a unified optimization objective. The framework produces scale-invariant, transferable policies whose performance gains grow with network size and benefit from recurrence, and its theoretical results bound the transferability error in terms of graph sampling, signal sampling, and the Lipschitz properties of the filters. Extensive experiments on synthetic and real networks demonstrate superior performance over baselines, strong cross-scale transferability, and enhanced robustness to non-stationarity when recurrence is used.
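Graphons are what make "structurally similar graphs" precise: graphs of different sizes drawn from the same graphon share the same limiting structure. As a minimal sketch, assuming the standard latent-position sampling scheme and a hypothetical distance-decaying graphon $W(u, v) = 0.8\, e^{-3|u - v|}$ (an illustrative choice, not taken from the paper), the code below samples graphs of several sizes and compares their edge densities with the graphon's integral over $[0, 1]^2$. The helper names `graphon`, `sample_graph`, and `edge_density` are hypothetical.

```python
import numpy as np

def graphon(u, v):
    """Hypothetical graphon: connection probability decays with latent distance."""
    return 0.8 * np.exp(-3.0 * np.abs(u - v))

def sample_graph(n, rng):
    """Sample an n-node undirected graph from the graphon.

    Each node i gets a latent position u_i ~ U[0, 1]; edge (i, j) is drawn
    independently with probability W(u_i, u_j). No self-loops.
    """
    u = rng.uniform(0.0, 1.0, size=n)
    probs = graphon(u[:, None], u[None, :])
    coins = rng.uniform(size=(n, n)) < probs
    upper = np.triu(coins, k=1)              # keep one draw per unordered pair
    adj = (upper | upper.T).astype(float)    # symmetrize
    return u, adj

def edge_density(adj):
    """Fraction of node pairs (i != j) that are connected."""
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))

# Closed form of the integral of c * exp(-a|u - v|) over the unit square.
a, c = 3.0, 0.8
expected_density = c * (2.0 / a - 2.0 * (1.0 - np.exp(-a)) / a**2)

rng = np.random.default_rng(0)
for n in (10, 50, 400):
    _, adj = sample_graph(n, rng)
    print(n, round(edge_density(adj), 3), round(expected_density, 3))
```

Small sampled graphs fluctuate around the limiting density while larger ones concentrate near it, which is the intuition behind training on a small graph and deploying on a larger graph from the same family.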

Abstract

We study real-time sampling and estimation of autoregressive Markovian sources in decentralized and dynamic multi-hop networks that share similar structures. Nodes cache neighboring samples and communicate over wireless collision channels. The objective is to minimize the time-average estimation error and/or the age of information under decentralized policies, which we address by developing a unified graphical multi-agent reinforcement learning framework. A key feature of the framework is its transferability, enabled by the fact that the number of trainable parameters is independent of the number of agents, allowing a learned policy to be directly deployed on dynamic yet structurally similar graphs without re-training. Building on this design, we establish rigorous theoretical guarantees on the transferability of the resulting policies. Numerical experiments demonstrate that (i) our method outperforms state-of-the-art baselines on dynamic graphs; (ii) the trained policies transfer well to larger networks, with performance gains increasing with the number of nodes; and (iii) incorporating recurrence is crucial, enhancing resilience to non-stationarity in both independent learning and centralized training with decentralized execution.
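In graph recurrent neural networks the trainable parameters are filter taps that multiply powers of a graph shift operator, so their number does not depend on how many nodes the graph has. The sketch below illustrates that property only; it is not the authors' architecture. The cell form $h_t = \tanh(A(S)x_t + B(S)h_{t-1})$, the tap count, the initialization, the $1/n$ normalization of the shift operator, and the names `graph_filter`, `GRNNCell`, and `random_shift` are all illustrative assumptions, and the paper's actor additionally maps hidden states to per-node action distributions, which is omitted here.

```python
import numpy as np

def graph_filter(S, x, taps):
    """Polynomial graph filter: sum_k S^k x taps[k].

    S: (N, N) graph shift operator, x: (N, F_in) node features,
    taps: list of K weight matrices of shape (F_in, F_out).
    """
    out = np.zeros((x.shape[0], taps[0].shape[1]))
    z = x
    for k, tap in enumerate(taps):
        if k > 0:
            z = S @ z            # aggregate information one more hop away
        out += z @ tap
    return out

class GRNNCell:
    """Hypothetical graph recurrent cell: h_t = tanh(A(S) x_t + B(S) h_{t-1})."""

    def __init__(self, f_in, f_hidden, num_taps, seed=0):
        rng = np.random.default_rng(seed)
        self.A = [rng.normal(scale=0.3, size=(f_in, f_hidden)) for _ in range(num_taps)]
        self.B = [rng.normal(scale=0.3, size=(f_hidden, f_hidden)) for _ in range(num_taps)]
        self.f_hidden = f_hidden

    def num_params(self):
        # Depends only on feature sizes and the number of taps, never on N.
        return sum(w.size for w in self.A + self.B)

    def run(self, S, xs):
        """Process a sequence of (N, F_in) node-feature matrices on graph S."""
        h = np.zeros((S.shape[0], self.f_hidden))
        for x_t in xs:
            h = np.tanh(graph_filter(S, x_t, self.A) + graph_filter(S, h, self.B))
        return h

def random_shift(n, p, rng):
    """Erdos-Renyi adjacency matrix normalized by n (graphon-style scaling)."""
    upper = np.triu(rng.uniform(size=(n, n)) < p, k=1)
    return (upper | upper.T).astype(float) / n

rng = np.random.default_rng(0)
cell = GRNNCell(f_in=2, f_hidden=4, num_taps=3)
for n in (10, 50):
    S = random_shift(n, 0.3, rng)
    xs = [rng.normal(size=(n, 2)) for _ in range(5)]
    h = cell.run(S, xs)
    print(n, h.shape, cell.num_params())
```

The same `cell` object, with the same 72 parameters, processes both graphs; only the shift operator and the number of rows in the node features change.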

Paper Structure (32 sections, 5 theorems, 83 equations, 6 figures, 2 tables)

Introduction
System Model
Optimization Objectives and Policies
Estimation Error and AoI
Preliminaries
Dec-POMDP and Reinforcement Learning
Graph Recurrent Neural Networks
Graphons
Graphon Recurrent Neural Networks
Proposed Graphical MARL Framework
Framework
State and Observations
Nodes' Actions
Rewards
Updating Process
...and 17 more sections

Key Result

Lemma 1

Under oblivious policies, the expected estimation error for process $j$ at node $i$ is proportional to the expected AoI:
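The reason the two objectives coincide can be checked numerically under simplifying assumptions. As a minimal sketch, assume a Gaussian random-walk source $X_{k+1} = X_k + W_k$ with $W_k \sim \mathcal{N}(0, \sigma^2)$, an estimator that holds the freshest delivered sample, and a hypothetical oblivious policy that delivers a fresh sample with probability `p_update` in every slot regardless of the source values. After $h$ slots of staleness the error is a sum of $h$ independent noise terms, so its expectation is $\sigma^2 h$, and the time-average error divided by the time-average AoI should approach $\sigma^2$. The source model, the policy, and the function name `simulate` are assumptions; the exact constant in the paper's Lemma 1 may differ.

```python
import random

def simulate(num_slots, p_update, sigma, seed=0):
    """Return (time-average squared error, time-average AoI) for one source.

    Assumed model (illustrative only): Gaussian random walk
    X_{k+1} = X_k + W_k with W_k ~ N(0, sigma^2); the estimate is the most
    recently delivered sample; a fresh sample is delivered in each slot with
    probability p_update, independently of the source values (oblivious).
    """
    rng = random.Random(seed)
    x = 0.0        # current source value
    x_hat = 0.0    # estimate held by the receiver
    age = 0        # AoI: slots elapsed since x_hat was sampled
    err_sum = 0.0
    age_sum = 0
    for _ in range(num_slots):
        x += rng.gauss(0.0, sigma)      # source moves one step
        age += 1
        if rng.random() < p_update:     # decision never looks at x
            x_hat, age = x, 0
        err_sum += (x - x_hat) ** 2
        age_sum += age
    return err_sum / num_slots, age_sum / num_slots

err, aoi = simulate(200_000, p_update=0.2, sigma=2.0)
print(err / aoi)  # should be close to sigma**2 = 4.0
```

Because the delivery decision never depends on the source values, the AoI process is independent of the noise, which is what lets the expected error factor into $\sigma^2$ times the expected AoI in this setting.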

Figures (6)

Figure 1: Blue squares indicate packets sampled by the node itself, while yellow squares represent packets received from other nodes. At the shown slot, nodes $1$, $3$, $4$, $5$, $6$, and $7$ attempt transmissions. Collisions occur between nodes $4$ and $7$, and between nodes $5$ and $6$.
Figure 2: An example trajectory of $h_{j, k}^{(i)}$: it drops at slots $4$ and $12$ when fresh packets are received from nodes $j_1$ and $j_3$, respectively, and continues to increase at slots $7$ and $17$, where the packets received from nodes $j_2$ and $j_4$ are stale.
Figure 3: The proposed graphical reinforcement learning framework.
Figure 4: Performance comparison between the proposed policies and baselines.
Figure 5: Transferability of proposed policies. The policies are trained on $10$-node networks and tested on networks with $M\in[10, 50]$ nodes.
...and 1 more figure
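Figures 1 and 2 describe one slot of the collision channel and the resulting AoI dynamics. As a minimal sketch, assume a collision model in which a listening node decodes a packet only when exactly one of its neighbors transmits, half-duplex nodes, transmitters that broadcast their whole cache, and nodes that always know their own process. Under these assumptions a receiver's AoI for process $j$ drops only when the arriving copy is fresher than its own and otherwise keeps growing, matching the behavior described for Figure 2. The function names and the 4-node example graph are hypothetical.

```python
def successful_receptions(neighbors, transmitters):
    """Return {receiver: sender} for one slot of an assumed collision channel.

    A non-transmitting node decodes a packet only if exactly one of its
    neighbors transmits; two or more transmitting neighbors collide.
    """
    received = {}
    for node, nbrs in neighbors.items():
        if node in transmitters:
            continue  # assumed half-duplex: a transmitting node cannot listen
        active = [m for m in nbrs if m in transmitters]
        if len(active) == 1:
            received[node] = active[0]
    return received

def update_ages(ages, neighbors, transmitters):
    """Advance every node's AoI table by one slot.

    ages[i][j] is the age at node i of its freshest cached sample of process j.
    Assumptions: node i always knows its own process (age 0) and a
    transmitter broadcasts its whole cache.
    """
    received = successful_receptions(neighbors, transmitters)
    new_ages = {}
    for i, row in ages.items():
        new_row = {}
        for j, age in row.items():
            if i == j:
                new_row[j] = 0
                continue
            candidate = age + 1                  # every cached sample gets one slot older
            if i in received:                    # a fresher copy may have arrived
                candidate = min(candidate, ages[received[i]][j] + 1)
            new_row[j] = candidate
        new_ages[i] = new_row
    return new_ages

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hypothetical 4-node path
ages = {i: {j: (0 if i == j else 5) for j in neighbors} for i in neighbors}
print(successful_receptions(neighbors, {0, 2}))  # {3: 2}; node 1 hears both 0 and 2 and sees a collision
ages = update_ages(ages, neighbors, {0, 2})
print(ages[3][2], ages[3][0], ages[1][0])  # 1 6 6
```

Node 3 receives node 2's cache, so its age for process 2 drops to 1, while its age for process 0 keeps growing because node 2's copy is no fresher (the stale case in Figure 2); node 1 gains nothing because of the collision.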

Theorems & Definitions (15)

Definition 1
Lemma 1
proof
Definition 2
Definition 3: Similar graphs and signals
Definition 4
Definition 5
Theorem 1: Transferability in GRNNs
proof
Theorem 2: Transferability in action distributions
...and 5 more
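The intuition behind the transferability theorems can be illustrated with a deterministic experiment. As a minimal sketch, assume a hypothetical smooth graphon $W(u, v) = e^{-(u-v)^2}$, template graphs whose nodes sit at evenly spaced latent positions $u_i = (i + 0.5)/n$ with shift operator $S_{ij} = W(u_i, u_j)/n$, a graph signal sampled from $X(u) = \sin(\pi u)$, and an arbitrary four-tap polynomial filter; none of these come from the paper. The same taps applied to graphs of 20, 60, and 180 nodes give outputs that agree more closely as the graphs grow. This only illustrates convergence toward a common graphon filter; it does not reproduce the paper's bound, which the TL;DR describes in terms of graph sampling, signal sampling, and filter Lipschitz properties.

```python
import numpy as np

def kernel(u, v):
    """Hypothetical smooth graphon W(u, v)."""
    return np.exp(-(u - v) ** 2)

def template_graph(n):
    """n-node template graph: nodes at u_i = (i + 0.5) / n, S = W(u_i, u_j) / n."""
    u = (np.arange(n) + 0.5) / n
    return u, kernel(u[:, None], u[None, :]) / n

def graph_filter(S, x, taps):
    """y = sum_k taps[k] * S^k x, with scalar taps shared across graph sizes."""
    y = np.zeros_like(x)
    z = x.copy()
    for k, h_k in enumerate(taps):
        if k > 0:
            z = S @ z
        y += h_k * z
    return y

taps = [0.5, 1.0, -0.4, 0.2]  # hypothetical filter taps, fixed across sizes

def filter_on_size(n):
    u, S = template_graph(n)
    x = np.sin(np.pi * u)     # graph signal sampled from X(u) = sin(pi u)
    return graph_filter(S, x, taps)

# The midpoints of the n-grid are also midpoints of the 3n-grid (index 3i + 1),
# so outputs on the two graphs can be compared at the same latent positions.
for n in (20, 60):
    y_small, y_large = filter_on_size(n), filter_on_size(3 * n)
    gap = np.max(np.abs(y_small - y_large[1::3]))
    print(n, 3 * n, gap)
```

Each output entry is a midpoint-rule approximation of the corresponding graphon filter evaluated at $u_i$, so for a smooth graphon the gap between sizes shrinks roughly quadratically in $1/n$.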
