Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints

Jiping Luo, Nikolaos Pappas

TL;DR

Numerical results show that continuous transmission is inefficient and that, remarkably, semantic-aware policies can attain the optimum with fewer transmissions by exploiting the timing of important information.

Abstract

This paper studies the remote estimation of multiple Markov sources over a lossy and rate-constrained channel. Unlike most existing studies that treat all source states equally, we exploit the \emph{semantics of information} and consider that the remote actuator has different tolerances for the estimation errors. We aim to find an optimal scheduling policy that minimizes the long-term \textit{state-dependent} costs of estimation errors under a transmission frequency constraint. The optimal scheduling problem is formulated as a \emph{constrained Markov decision process} (CMDP). We show that the optimal Lagrangian cost is a piecewise linear and concave (PWLC) function of the Lagrangian multiplier, and that the optimal policy is, at most, a randomized mixture of two simple deterministic policies. By exploiting these structural results, we develop a new \textit{intersection search} algorithm that finds the optimal policy in only a few iterations. We further propose a reinforcement learning (RL) algorithm that computes the optimal policy without \textit{a priori} knowledge of the channel and source statistics. To avoid the ``curse of dimensionality'' in MDPs, we propose an online low-complexity \textit{drift-plus-penalty} (DPP) algorithm. Numerical results show that continuous transmission is inefficient and that, remarkably, our semantic-aware policies can attain the optimum with fewer transmissions by exploiting the timing of important information.
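
For intuition, the PWLC property and the two-policy mixture can be sketched in generic CMDP notation. The symbols $C(\pi)$ (average estimation cost), $F(\pi)$ (average transmission frequency), $\Pi_{\mathrm{d}}$ (the set of deterministic stationary policies), $\pi^-$, $\pi^+$, and $\mu$ are assumptions introduced here for illustration and may differ from the paper's own notation; the sketch also assumes finite state and action spaces.

```latex
\begin{align}
\mathcal{L}^\lambda
  &= \min_{\pi \in \Pi_{\mathrm{d}}}
     \big[\, C(\pi) + \lambda \big( F(\pi) - F_{\max} \big) \big], \\
\mu F(\pi^-) + (1-\mu) F(\pi^+) = F_{\max}
  &\quad\Longrightarrow\quad
  \mu = \frac{F_{\max} - F(\pi^+)}{F(\pi^-) - F(\pi^+)} .
\end{align}
```

Each bracketed term is affine in $\lambda$, and a pointwise minimum of finitely many affine functions is piecewise linear and concave. At a corner $\gamma$ where two deterministic policies are both Lagrangian-optimal, one with $F(\pi^-) > F_{\max}$ and one with $F(\pi^+) \le F_{\max}$, choosing $\pi^-$ with probability $\mu$ meets the frequency constraint with equality.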

Paper Structure

This paper contains 30 sections, 10 theorems, 48 equations, 7 figures, 1 table, 4 algorithms.

Key Result

Proposition 1

Suppose $\mathbf{Q}^m$ is irreducible and $0<f_m<F_{\max}$. Then $\mathbf{P}^m(\pi_\textrm{sa})$ forms a recurrent and aperiodic chain.

Figures (7)

  • Figure 1: Remote estimation of multiple Markov sources with feedback.
  • Figure 2: An illustration of the zero- and one-delay cases.
  • Figure 3: Flow diagram for finding the constrained optimal policy of a CMDP. The relative value iteration (RVI) algorithm is applied to solve the Lagrangian MDP for a fixed $\lambda$. We develop a Q-learning algorithm for unknown environments in Section \ref{sec:drl}. The Lagrangian multiplier is updated until the value $\gamma$ is found. Then, the constrained optimal policy is constructed as a mixture of two deterministic policies. The pseudo-code is summarized in Algorithm \ref{alg:Sa-RVI}.
  • Figure 4: An illustration of the Insec-RVI method. Suppose the function $\mathcal{L}^\lambda$ has $4$ segments within the interval $[0, \lambda_{\max}]$ and $\gamma$ is its second corner. The red dots represent the intersection points at each iteration, while the black dots are the corresponding points projected onto the curve $\mathcal{L}^\lambda$.
  • Figure 5: Comparison of different policy search methods when $F_{\max} = 0.4$ and $p_s = 0.4$. The optimal Lagrangian multiplier is found at $\lambda = 10$.
  • ...and 2 more figures
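
The search outlined in the captions of Figures 3 and 4 can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's Algorithm: the RVI solver is replaced by a brute-force minimum over a hypothetical finite list of deterministic policies, each summarized only by an average cost and a transmission frequency, and the names `Policy`, `solve`, and `intersection_search` as well as all numbers are invented for the example. The toy data are chosen so that the Lagrangian curve has four segments and $\gamma$ is its second corner, as in Figure 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    cost: float  # long-term average estimation cost (hypothetical value)
    freq: float  # long-term average transmission frequency


def lagrangian(p, lam, f_max):
    # Lagrangian cost of a fixed policy: affine in lam.
    return p.cost + lam * (p.freq - f_max)


def solve(policies, lam, f_max):
    # Stand-in for solving the Lagrangian MDP at a fixed lam (e.g. by RVI).
    # Ties are broken in favor of the lower transmission frequency.
    return min(policies, key=lambda p: (lagrangian(p, lam, f_max), p.freq))


def intersection_search(policies, f_max, lam_max, tol=1e-9, max_iter=100):
    """Return (gamma, pi_minus, pi_plus, mu) for the toy problem."""
    lo = solve(policies, 0.0, f_max)
    if lo.freq <= f_max:
        # The unconstrained optimum is already feasible; no mixing needed.
        return 0.0, lo, lo, 1.0
    hi = solve(policies, lam_max, f_max)
    if hi.freq > f_max:
        raise ValueError("lam_max is too small to reach a feasible policy")
    for _ in range(max_iter):
        # Intersection of the two lines supported by lo and hi.
        lam = (hi.cost - lo.cost) / (lo.freq - hi.freq)
        bound = lagrangian(lo, lam, f_max)
        # Project the intersection point onto the curve.
        mid = solve(policies, lam, f_max)
        if lagrangian(mid, lam, f_max) >= bound - tol:
            # The intersection lies on the curve, so lam is the corner.
            mu = (f_max - hi.freq) / (lo.freq - hi.freq)
            return lam, lo, hi, mu
        if mid.freq > f_max:
            lo = mid
        else:
            hi = mid
    raise RuntimeError("intersection search did not converge")


# Hypothetical toy data: four deterministic policies, F_max = 0.4.
policies = [Policy(1.0, 1.0), Policy(2.0, 0.6),
            Policy(4.0, 0.3), Policy(10.0, 0.0)]
gamma, pi_minus, pi_plus, mu = intersection_search(policies, 0.4, 30.0)
print(gamma, pi_minus, pi_plus, mu)
```

Because every intersection point either lands on the curve or replaces one of the two bracketing policies with a new one, the loop ends after finitely many oracle calls when the policy set is finite. In the toy run, the returned `mu` is the probability of using `pi_minus`, so the mixed policy transmits at exactly the allowed frequency.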

Theorems & Definitions (27)

  • Remark 1
  • Remark 2
  • Definition 1
  • Definition 2
  • Proposition 1
  • proof
  • Corollary 1
  • Definition 3
  • Theorem 1
  • proof
  • ...and 17 more