MAPPO for Edge Server Monitoring
Samuel Chamoun, Christian McDowell, Robin Buchanan, Kevin Chan, Eric Graves, Yin Sun
TL;DR
The paper addresses goal-oriented communication for edge server monitoring with $N$ dispatchers and $K$ shared edge servers, where server availability is time-varying and information freshness matters. It formalizes the problem as a cooperative multi-agent partially observable Markov decision process (MA-POMDP) with Age of Information (AoI)-aware observations and two feedback channels (active queries and job acknowledgments), and solves it with a Multi-Agent Proximal Policy Optimization (MAPPO) framework that uses centralized training and decentralized execution. The learned decentralized query-and-dispatch policies maximize long-term throughput while penalizing query overhead, and they outperform several baselines across different query costs, arrival rates, and system scales. The findings indicate that jointly optimizing querying and dispatching under partial observability improves the throughput-cost tradeoff relative to those baselines, with practical implications for real-time edge monitoring in dynamic environments.
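As a rough illustration of "maximize long-term throughput while penalizing query overhead", the shared objective could be written as below. This is a sketch under assumed notation, not the paper's exact formulation: $d_k(t)$ (jobs completed by server $k$ in slot $t$), $q_{i,k}(t) \in \{0,1\}$ (whether dispatcher $i$ queries server $k$ in slot $t$), the per-query cost $c$, and the discount factor $\gamma$ are all hypothetical symbols introduced here.

$$
\max_{\pi_1, \dots, \pi_N} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} \left( \sum_{k=1}^{K} d_k(t) \; - \; c \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k}(t) \right) \right]
$$

Under this reading, all dispatchers share the same reward, which is what makes the problem cooperative, and a larger $c$ pushes the policies toward relying on job acknowledgments rather than active queries.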
Abstract
We consider a goal-oriented communication problem for edge server monitoring, where jobs arrive intermittently at multiple dispatchers and must be assigned to shared edge servers with finite queues and time-varying availability. Accurate knowledge of server status is critical for sustaining high throughput, yet it is difficult to maintain under dynamic workloads and partial observability. To address this challenge, each dispatcher maintains server knowledge through two complementary mechanisms: (i) active status queries that provide instantaneous updates at a communication cost, and (ii) job execution feedback that reveals server conditions when a job completes or fails. We formulate a cooperative multi-agent distributed decision-making problem in which dispatchers jointly optimize query scheduling and job dispatching to balance throughput against communication overhead. To solve this problem, we propose a Multi-Agent Proximal Policy Optimization (MAPPO)-based algorithm that leverages centralized training with decentralized execution (CTDE) to learn distributed query-and-dispatch policies under partial and stale observations. Experiments show that MAPPO achieves better throughput-cost tradeoffs than baseline strategies across varying query costs, job arrival rates, and numbers of dispatchers.
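The two feedback mechanisms described above can be sketched as a per-dispatcher belief that stores the last known status of each server together with its age. This is a minimal illustration under assumptions, not the authors' implementation: the class `DispatcherBelief`, its method names, the binary status encoding, and the choice to treat servers as available at start are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DispatcherBelief:
    """Hypothetical local view one dispatcher keeps of the K servers."""
    num_servers: int
    last_status: List[int] = field(init=False)  # 1 = available, 0 = unavailable
    aoi: List[int] = field(init=False)          # slots since each status was refreshed

    def __post_init__(self) -> None:
        # Assumption: every server is believed available at time 0.
        self.last_status = [1] * self.num_servers
        self.aoi = [0] * self.num_servers

    def tick(self) -> None:
        """Advance one time slot: every stored status gets one slot older."""
        self.aoi = [age + 1 for age in self.aoi]

    def on_query(self, server: int, available: bool) -> None:
        """Active status query: fresh information, obtained at a communication cost."""
        self.last_status[server] = int(available)
        self.aoi[server] = 0

    def on_job_feedback(self, server: int, success: bool) -> None:
        """Job acknowledgment: success or failure also reveals the server's condition."""
        self.last_status[server] = int(success)
        self.aoi[server] = 0

    def observation(self) -> List[int]:
        """Flatten (status, age) pairs into a vector a policy network could consume."""
        return [x for pair in zip(self.last_status, self.aoi) for x in pair]


belief = DispatcherBelief(num_servers=2)
belief.tick()
belief.tick()
belief.on_query(0, available=False)     # paid query reveals server 0 is down
belief.tick()
belief.on_job_feedback(1, success=True) # free feedback from a completed job
print(belief.observation())
```

Pairing each status with its age lets a learned policy discount stale entries and decide when a paid query is worth its cost, which is the tradeoff the abstract describes.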
