Goal-oriented Estimation of Multiple Markov Sources in Resource-constrained Systems
Jiping Luo, Nikolaos Pappas
TL;DR
We address remote estimation of $M$ Markov sources in resource-constrained networks with a one-slot transmission delay, using the cost of actuation error (CAE) as the key performance metric. The problem is formulated as an average-cost CMDP and tackled via two transformations: Lagrangian relaxation, which yields a $\lambda$-optimal policy, and Lyapunov drift, which converts the constraint into a virtual-queue stability problem and leads to a drift-plus-penalty objective. Two policies are developed: a low-complexity DPP policy for known statistics and a model-free LO-DRL policy based on PPO for unknown environments. Both use the one-slot expected CAE to decide sampling actions. Simulations demonstrate substantial CAE reductions and fewer uninformative transmissions in both single-source and multi-source setups, with LO-DRL delivering superior performance in uncertain or poor-channel conditions. The approach offers a practical, semantics-aware framework for resource-limited networked control systems, with broad applicability to remote actuation tasks.
Abstract
This paper investigates goal-oriented communication for remote estimation of multiple Markov sources in resource-constrained networks. An agent decides when to sample each source and transmits the resulting packets to a remote destination over an unreliable channel with delay. The destination reconstructs the sources for actuation. We use the \textit{cost of actuation error} (CAE) metric to capture state-dependent actuation costs. We seek a sampling policy that minimizes the long-term average CAE subject to an average resource constraint. We formulate this problem as an average-cost constrained Markov Decision Process (CMDP) and relax it into an unconstrained problem using \textit{Lyapunov drift} techniques. We then propose a low-complexity \textit{drift-plus-penalty} (DPP) policy for systems with known source/channel statistics and a Lyapunov optimization-based deep reinforcement learning (LO-DRL) policy for unknown environments. Our policies significantly reduce the number of uninformative transmissions by exploiting the timing of important information.
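To make the drift-plus-penalty idea concrete, the following is a minimal sketch of a generic single-source DPP step, not the exact rule derived in the paper. It assumes a textbook virtual queue that accumulates resource usage above the average budget, a binary action (0 = stay idle, 1 = sample and transmit), and a one-slot expected CAE computed from a channel success probability. All function names, parameters (`v`, `budget`, `tx_cost`, `p_success`), and the specific update form are illustrative assumptions.

```python
def expected_cae(p_success, cae_if_delivered, cae_if_lost):
    """One-slot expected CAE of transmitting over an unreliable channel.

    Assumption: the packet is delivered with probability p_success and
    the CAE takes one of two known values depending on the outcome.
    """
    return p_success * cae_if_delivered + (1.0 - p_success) * cae_if_lost


def virtual_queue_update(z, action_cost, budget):
    """Generic Lyapunov virtual queue: grows when usage exceeds the budget."""
    return max(z + action_cost - budget, 0.0)


def dpp_action(z, v, cae_idle, cae_tx, tx_cost=1.0):
    """Pick the action minimizing v * (expected CAE) + z * (resource cost).

    v trades off CAE against constraint violation; z is the current
    virtual queue length. Returns 1 to transmit, 0 to stay idle.
    """
    score_idle = v * cae_idle            # idling consumes no resource
    score_tx = v * cae_tx + z * tx_cost  # transmitting is penalized by z
    return 1 if score_tx < score_idle else 0


if __name__ == "__main__":
    z, v, budget = 0.0, 1.0, 0.25
    # Hypothetical per-slot CAE if the estimate is left stale.
    stale_costs = [2.0, 2.0, 0.0, 2.0]
    for cae_idle in stale_costs:
        cae_tx = expected_cae(0.5, 0.0, cae_idle)
        a = dpp_action(z, v, cae_idle, cae_tx)
        z = virtual_queue_update(z, float(a), budget)
        print(f"idle CAE={cae_idle}, action={a}, queue={z}")
```

In this sketch, a large virtual queue makes transmissions expensive, so the rule transmits only when the expected CAE reduction outweighs the accumulated constraint pressure. This is one way to read the suppression of uninformative transmissions described above.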
