
Learning Optimal Scheduling Policy for Remote State Estimation under Uncertain Channel Condition

Shuang Wu, Xiaoqiang Ren, Qing-Shan Jia, Karl Henrik Johansson, Ling Shi

TL;DR

The paper studies optimal sensor scheduling for remote state estimation when the statistics of the packet-dropping channel, summarized by the rate $r_s$, are unknown. It shows that the $Q$-factor is monotone and submodular. This yields threshold-like optimal policies under costly communication and randomized threshold policies under constrained communication. To handle unknown channels, it develops two complementary learning frameworks, with proven convergence for both: (i) stochastic approximation-based Q-learning enhanced with these structural properties, and (ii) parameter learning that estimates $r_s$ and plugs the estimate into analytic policy formulas. Numerical experiments show faster convergence for the structured Q-learning, adaptability to time-varying channels, and favorable trade-offs relative to direct parameter-based control. Overall, the work offers scalable, structure-exploiting methods for remote state estimation over uncertain channels and lays groundwork for extensions to more complex channel models or multi-sensor setups.
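The parameter-learning route in (ii) starts from an estimate of $r_s$ built from observed transmission outcomes. A minimal sketch of that estimation step is a running success frequency. It assumes that $r_s$ is the per-attempt probability that a transmitted packet is delivered, which is the reading under which the condition $\rho^2(A)(1-r_s)<1$ in Lemma 1 is the usual stability condition. It also assumes that the outcome $\eta$ is observed only in slots where the sensor transmits. The class name and the pseudo-count prior are hypothetical illustration choices, not the paper's estimator, and the step that plugs the estimate into the analytic policy formulas is not shown.

```python
import random


class SuccessRateEstimator:
    """Running empirical estimate of r_s (hypothetical helper, not from the paper).

    Assumes r_s is the probability that a transmitted packet is delivered and that
    the outcome eta is observed only in slots where the sensor transmits (a = 1).
    """

    def __init__(self, prior_successes: float = 1.0, prior_attempts: float = 2.0):
        # Pseudo-counts keep the estimate defined (0.5) before any transmission.
        self.successes = prior_successes
        self.attempts = prior_attempts

    def update(self, action: int, eta: int) -> None:
        # Idle slots (action == 0) carry no information about the channel.
        if action == 1:
            self.attempts += 1
            self.successes += eta

    @property
    def estimate(self) -> float:
        return self.successes / self.attempts


rng = random.Random(0)
true_r_s = 0.7  # hypothetical ground truth used only for this simulation
est = SuccessRateEstimator()
for k in range(5000):
    action = k % 2  # transmit every other slot, purely for illustration
    eta = int(rng.random() < true_r_s) if action == 1 else 0
    est.update(action, eta)
print(f"estimated r_s = {est.estimate:.3f}")
```

By the law of large numbers the estimate approaches $r_s$ as long as transmission attempts keep occurring; a policy that never transmits yields no information about $r_s$.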

Abstract

We consider optimal sensor scheduling with unknown communication channel statistics. We formulate two types of scheduling problems, in which the communication rate enters as a soft or a hard constraint, respectively. We first present structural results on the optimal scheduling policy using dynamic programming, assuming the channel statistics are known. We prove that the Q-factor is monotonic and submodular, which leads to threshold-like structures in both types of problems. We then develop stochastic approximation and parameter learning frameworks to address the two scheduling problems when the channel statistics are unknown. We exploit the structural results to design specialized learning algorithms and prove their convergence. Numerical examples demonstrate the performance improvement over the standard Q-learning algorithm.
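The dynamic-programming step with known channel statistics can be illustrated on a truncated, scalar toy version of the costly-communication problem. The sketch below runs average-cost relative value iteration under assumptions that are ours, not the paper's exact model: a scalar system whose error variance evolves as $h(p)=a_{\text{sys}}^2p+w_{\text{var}}$, a stage cost $h^\tau(\bar p)+\lambda a$, $\tau$ capped at $n$, and a transmission that resets $\tau$ to 0 with probability $r_s$. All names (`relative_value_iteration`, `a_sys`, `w_var`, `lam`, `n`) are hypothetical. In this toy model the relative values are nondecreasing in $\tau$, so $Q(\tau,1)-Q(\tau,0)$ is nonincreasing and the greedy policy comes out as a threshold in $\tau$, mirroring the structure the paper proves.

```python
import numpy as np


def error_traces(a_sys: float, w_var: float, p_bar: float, n: int) -> np.ndarray:
    """h^tau(p_bar) for tau = 0..n in a scalar system, where h(p) = a_sys^2 * p + w_var."""
    p = np.empty(n + 1)
    p[0] = p_bar
    for t in range(1, n + 1):
        p[t] = a_sys * a_sys * p[t - 1] + w_var
    return p


def relative_value_iteration(a_sys, w_var, p_bar, r_s, lam, n=30, iters=20000, tol=1e-10):
    """Average-cost relative value iteration on the truncated state space {0, ..., n}.

    Toy model (an assumption, not the paper's exact formulation): stage cost
    p[tau] + lam * action; a transmission is delivered with probability r_s and
    resets tau to 0; otherwise tau moves to min(tau + 1, n).
    Returns (policy, relative values h, gain estimate), with policy 1 = transmit.
    """
    cost = error_traces(a_sys, w_var, p_bar, n)
    nxt = np.minimum(np.arange(n + 1) + 1, n)  # next tau when nothing is delivered

    def q_factors(h):
        q_idle = cost + h[nxt]
        q_send = cost + lam + r_s * h[0] + (1.0 - r_s) * h[nxt]
        return q_idle, q_send

    h = np.zeros(n + 1)
    gain = 0.0
    for _ in range(iters):
        q_idle, q_send = q_factors(h)
        th = np.minimum(q_idle, q_send)  # Bellman operator applied to h
        gain = th[0]                     # h[0] = 0, so this estimates the optimal average cost
        new_h = th - gain                # renormalize at the reference state tau = 0
        done = np.max(np.abs(new_h - h)) <= tol * max(1.0, np.max(np.abs(new_h)))
        h = new_h
        if done:
            break

    q_idle, q_send = q_factors(h)
    policy = (q_send < q_idle).astype(int)  # ties are broken toward staying idle
    return policy, h, float(gain)


policy, h, gain = relative_value_iteration(a_sys=1.2, w_var=1.0, p_bar=1.0, r_s=0.6, lam=50.0)
print("transmit in states tau =", np.flatnonzero(policy).tolist())
print("estimated optimal average cost:", round(gain, 3))
```

With `lam=0.0` transmitting is free, and the sketch transmits in every state.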

Paper Structure

This paper contains 31 sections, 15 theorems, 108 equations, 8 figures.

Key Result

Lemma 1

If $\rho^2(A)(1-r_s)<1$, where $\rho(A)$ denotes the spectral radius of $A$, there exists a stationary policy $f^\star\in\mathbb{F}^S$ such that $a=f^\star(\tau)$ solves the average-cost Bellman optimality equation, in which $\mathcal{J}^\star$ is the optimal value of the trace of the average estimation error.
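The condition in Lemma 1 can be checked numerically by computing the spectral radius $\rho(A)$ (the largest eigenvalue magnitude) and testing $\rho^2(A)(1-r_s)<1$. Reading $r_s$ as the per-attempt delivery probability, $1-r_s$ is the chance that an attempt fails, and the error covariance grows roughly like $\rho^{2\tau}(A)$ over $\tau$ consecutive failures. The product must therefore stay below 1 for the expected error to remain bounded even when the sensor transmits at every step. A minimal NumPy sketch follows; the function name and the example matrix are hypothetical.

```python
import numpy as np


def satisfies_lemma1_condition(A: np.ndarray, r_s: float) -> bool:
    """Return True if rho(A)^2 * (1 - r_s) < 1 (hypothetical helper).

    rho(A) is the spectral radius of A; r_s is read as the per-attempt
    probability that a transmitted packet is delivered.
    """
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    return bool(rho**2 * (1.0 - r_s) < 1.0)


A = np.array([[1.2, 0.1],
              [0.0, 0.8]])  # upper triangular: eigenvalues 1.2 and 0.8, so rho(A) = 1.2
print(satisfies_lemma1_condition(A, r_s=0.5))  # 1.44 * 0.5 = 0.72  -> True
print(satisfies_lemma1_condition(A, r_s=0.2))  # 1.44 * 0.8 = 1.152 -> False
```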

Figures (8)

  • Figure 1: System architecture.
  • Figure 2: Relation among state $\tau(k)$, action $a(k)$ and transmission result $\eta(k)$.
  • Figure 3: $Q$-factor in the learning process for Problem 1.
  • Figure 4: Average estimation error in Problem 1.
  • Figure 5: Communication rate in Problem 2.
  • ...and 3 more figures

Theorems & Definitions (23)

  • Remark 1
  • Lemma 1
  • Lemma 2: Monotonicity of $V(\cdot)$
  • Lemma 3: Monotonicity of $Q(\cdot,a)$
  • Lemma 4: Submodularity of $Q(\cdot,\cdot)$
  • Theorem 1: Costly communication
  • Remark 2
  • Lemma 5
  • Theorem 2: Constrained communication
  • Corollary 1
  • ...and 13 more
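Lemmas 3 and 4 and Theorem 1 appear above only by title. For a binary action $a\in\{0,1\}$ (idle or transmit) and a cost to be minimized, submodularity of $Q$ in $(\tau,a)$ reduces to $Q(\tau,1)-Q(\tau,0)$ being nonincreasing in $\tau$. Once transmitting becomes strictly cheaper than staying idle, it therefore stays cheaper for all larger $\tau$, which is exactly a threshold policy. The sketch below checks monotonicity and submodularity on a tabulated $Q$-factor, such as one produced by a learning algorithm, and reads off the threshold. The helper names and the toy table are hypothetical; this illustrates the structural argument and is not the paper's structured Q-learning algorithm.

```python
import numpy as np


def is_monotone(Q: np.ndarray) -> bool:
    """Q(tau, a) nondecreasing in tau for each action a (rows = tau, columns = a)."""
    return bool(np.all(np.diff(Q, axis=0) >= 0))


def is_submodular(Q: np.ndarray) -> bool:
    """For binary a: Q(tau, 1) - Q(tau, 0) nonincreasing in tau."""
    gap = Q[:, 1] - Q[:, 0]
    return bool(np.all(np.diff(gap) <= 0))


def threshold_of(Q: np.ndarray):
    """Smallest tau at which transmitting is strictly cheaper; None if it never is."""
    better = np.flatnonzero(Q[:, 1] < Q[:, 0])
    return int(better[0]) if better.size else None


Q = np.array([[1.0, 4.0],
              [2.0, 4.5],
              [4.0, 5.0],
              [7.0, 5.5]])  # rows: tau = 0..3; columns: a = 0 (idle), a = 1 (transmit)
print(is_monotone(Q), is_submodular(Q), threshold_of(Q))  # True True 3
```

When `is_submodular` holds, the greedy policy is "transmit if and only if $\tau \ge$ `threshold_of(Q)`", so a learner only needs to locate one switching point rather than a full action table.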