An MDP Model for Censoring in Harvesting Sensors: Optimal and Approximated Solutions

Jesus Fernandez-Bes, Jesus Cid-Sueiro, Antonio G. Marques

TL;DR

This paper studies censoring policies for energy-harvesting wireless sensors by formulating the problem as an infinite-horizon Markov Decision Process and proving that, under reasonable battery dynamics, the optimal policy is a battery-dependent threshold on message importance. It introduces model-based stochastic approximations (SAP) that leverage this threshold structure and show faster convergence and lower complexity than Q-learning, with robustness to non-stationarities. The authors derive analytical characterizations of the optimal policy, discuss steady-state battery distributions, and present practical algorithms including an adaptive balanced transmitter. Numerical experiments in both single-hop and multi-hop networks demonstrate that energy-dependent censoring policies can significantly outperform balanced and non-selective approaches, especially when harvested energy is scarce. The work provides scalable, implementable strategies for energy-aware censoring in dynamic wireless sensor networks, with potential impact on green communications and distributed sensing applications.

Abstract

In this paper, we propose a novel censoring policy for energy-efficient transmissions in energy-harvesting sensors. The problem is formulated as an infinite-horizon Markov Decision Process (MDP). The objective to be optimized is the expected sum of the importance (utility) of all transmitted messages. Assuming that this importance can be evaluated at the transmitting node, we show that, under certain conditions on the battery model, the optimal censoring policy is a threshold function on the importance value. Specifically, a message is transmitted only if its importance exceeds a threshold whose value depends on the battery level. Exploiting this property, we propose a model-based stochastic scheme that approximates the optimal solution with lower computational complexity and faster convergence than a conventional Q-learning algorithm. Numerical experiments in single-hop and multi-hop networks confirm the analytical advantages of the proposed scheme.
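To make the threshold idea concrete, the following is a minimal simulation sketch and not the authors' algorithm. It assumes a toy battery model (one unit harvested per slot with a fixed probability, a fixed transmission cost) and a hypothetical linear threshold that decreases as the battery fills; the names `simulate`, `battery_dependent_threshold`, and `transmit_always`, and all parameter values, are illustrative assumptions rather than quantities from the paper. Only the unit-mean exponential importance distribution and the battery size of 100 echo the settings listed in the figure captions.

```python
import random

CAPACITY = 100  # battery size (assumed; matches B=100 in the figure captions)


def battery_dependent_threshold(battery, tau_max=3.0, capacity=CAPACITY):
    """Hypothetical threshold: demanding when the battery is low, lenient when full."""
    return tau_max * (1.0 - battery / capacity)


def transmit_always(battery):
    """Non-selective baseline: any message passes the importance test."""
    return 0.0


def simulate(threshold, n_slots=20000, capacity=CAPACITY, tx_cost=4,
             harvest_prob=0.1, seed=0):
    """Run a toy censoring simulation and return (total importance sent, final battery).

    Assumed dynamics, for illustration only:
      - each slot, one energy unit is harvested with probability harvest_prob;
      - each slot, one message arrives with importance ~ Exponential(mean 1);
      - transmitting costs tx_cost units and is allowed only if the battery holds them.
    """
    rng = random.Random(seed)
    battery = capacity // 2
    total_importance = 0.0
    for _ in range(n_slots):
        # Energy harvesting step, clipped at the battery capacity.
        if rng.random() < harvest_prob:
            battery = min(capacity, battery + 1)
        # A new message arrives; its importance is known at the node.
        importance = rng.expovariate(1.0)
        # Censoring rule: send only if the importance exceeds the
        # battery-dependent threshold and enough energy is available.
        if battery >= tx_cost and importance > threshold(battery):
            battery -= tx_cost
            total_importance += importance
    return total_importance, battery


if __name__ == "__main__":
    selective, _ = simulate(battery_dependent_threshold)
    non_selective, _ = simulate(transmit_always)
    print(f"battery-dependent threshold: {selective:.1f}")
    print(f"non-selective transmitter:   {non_selective:.1f}")
```

In this toy setting harvested energy is scarce, so the non-selective transmitter spends its energy on average-importance messages, while the threshold rule saves it for the more important ones. The optimal thresholds in the paper come from solving the MDP; the linear rule here is only a stand-in that shows the mechanism.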

Paper Structure

This paper contains 21 sections, 45 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: A graphical model relating the main variables in the MDP.
  • Figure 2: (a) Optimal thresholds for a harvesting node with $B=100$, stochastic energy costs, a unit-mean exponential importance distribution and $\gamma = 0.999$, for different values of $\overline{c}_0$. (b) The value function $\lambda(e)$.
  • Figure 3: Optimal thresholds for a harvesting node with ${\overline{c}_0}=-3.4$, ${\overline{c}_T=4}$, exponential importance distribution and $\gamma = 0.999$, for different values of the battery size. The horizontal dotted line shows the constant threshold value balancing the average energy consumption with the recharging rate.
  • Figure 4: Expected performance for a scenario with stochastic energy costs and high refill probability $p_b =1/3$, as a function of ${\overline{c}_0}$. Battery size $B=100$ and exponential distribution with unit mean and $\gamma = 0.999$.
  • Figure 5: Expected performance for a scenario with stochastic energy costs and low refill probability $p_b=0.04$, as a function of ${\overline{c}_0}$. Battery size $B=100$ and exponential distribution with unit mean and $\gamma = 0.999$.
  • ...and 6 more figures