MDP-based Energy-aware Task Scheduling for Battery-less IoT

Shahab Jahanbazi, Mateen Ashraf, Onel L. A. López

TL;DR

This work addresses task scheduling for battery-less IoT devices under stochastic energy harvesting by formulating the problem as a Markov decision process (MDP). It derives an optimal stationary threshold-based (OSTB) policy that maximizes long-term task completion while minimizing power failures, using a two-layer transition model (micro and macro) and two reward schemes (basic and sigmoid-based). A unichain property is proven and the policy structure is shown to be threshold-based in capacitor voltage within each superstate, enabling efficient computation via linear programming. Simulation demonstrates that OSTB outperforms the ALAP baseline in task completion rate, power-failure reduction, and latency, with larger capacitors reducing the gap, indicating practical benefits for ultra-low-power, energy-harvesting IoT deployments.
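The summary above states that the unichain property enables computation via linear programming. As a hedged sketch, the textbook linear program for an average-reward unichain MDP is given below. The symbols $x(s,a)$ (stationary state-action frequency), $r(s,a)$ (one-step reward), and $p(j\mid s,a)$ (transition probability) are notation assumed here for illustration; the paper's own formulation may differ.

```latex
\begin{align}
\max_{x \ge 0} \quad & \sum_{s}\sum_{a} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(j,a) - \sum_{s}\sum_{a} p(j \mid s,a)\, x(s,a) = 0 \quad \forall j, \\
& \sum_{s}\sum_{a} x(s,a) = 1 .
\end{align}
```

In this standard form, an optimal stationary policy can be read off by choosing, in each state $s$ visited with positive frequency, an action $a$ with $x(s,a) > 0$.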

Abstract

Achieving high long-term task completion rates is a fundamental challenge in battery-less Internet of Things (IoT) devices powered by ambient energy harvesting. This difficulty is primarily due to the stochastic and time-varying nature of the available energy, which significantly complicates the design of optimal task scheduling policies. In this paper, we consider a battery-less IoT device that must periodically report sensing measurements to a monitoring center. We adopt the Markov decision process (MDP) framework to handle energy variability while aiming to maximize the long-term task completion rate. To this end, we first identify the components of the MDP and then define two appropriate reward functions. We establish the inherent properties of the MDP formulation and of the related optimal policy. Subsequently, we solve the resulting optimization problem, leading to the optimal stationary threshold-based (OSTB) scheduling. Simulation results demonstrate that OSTB outperforms the well-known "as late as possible" (ALAP) scheduling strategy. For instance, with a $4.7$ mF capacitor, OSTB achieves an $8.6\%$ increase in the task completion rate, a $65\%$ reduction in power failures, and an $86.29\%$ decrease in execution delays.

Paper Structure

This paper contains 23 sections, 2 theorems, 21 equations, 7 figures, 3 tables.

Key Result

Theorem 1

Let $\pi$ be an arbitrary stationary deterministic policy. Moreover, let the transition matrices $A_{l}$, $A_s$, and $A_t$ be time-invariant and have strictly positive elements. Then, the Markov chain (MC) induced by $\pi$ contains exactly one recurrent class (possibly along with a set of transient states).
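To make the statement concrete, the following minimal Python sketch counts the recurrent classes of a finite Markov chain from its transition matrix. The function names and the toy matrices are hypothetical; they are not the paper's $A_{l}$, $A_s$, or $A_t$, and only illustrate what "exactly one recurrent class plus transient states" means.

```python
from typing import FrozenSet, List, Sequence, Set


def reachable(P: Sequence[Sequence[float]], start: int) -> Set[int]:
    """Return all states reachable from `start` (including itself)."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def recurrent_classes(P: Sequence[Sequence[float]]) -> List[FrozenSet[int]]:
    """List the recurrent classes of a finite chain with row-stochastic P."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes: List[FrozenSet[int]] = []
    for i in range(n):
        # In a finite chain, state i is recurrent iff every state
        # reachable from i can also reach i back.
        if all(i in reach[j] for j in reach[i]):
            cls = frozenset(reach[i])
            if cls not in classes:
                classes.append(cls)
    return classes


# Toy chain (hypothetical): state 0 is transient, states 1 and 2
# form the single recurrent class, so the chain is unichain.
P_toy = [
    [0.5, 0.5, 0.0],
    [0.0, 0.3, 0.7],
    [0.0, 0.6, 0.4],
]
unichain = len(recurrent_classes(P_toy)) == 1
```

A chain with two absorbing states, such as the 2x2 identity matrix, would instead yield two recurrent classes and therefore not be unichain.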

Figures (7)

  • Figure 1: System model: a battery-less IoT device powered by ambient EH reports sensing measurements to a monitoring center. The concept of permissible time windows for tasks and the changes in voltage across the capacitor during the sensing task are illustrated.
  • Figure 2: An illustrative representation of the scheme described in Section \ref{ex:superstates}, assuming $M=8$, $N_v=3$, $d_s=3$, and $n_s=3$. Fig. \ref{fig:superstate} (left) shows the forward progress of the process, including the defined superstates. Fig. \ref{fig:superstate} (right) presents the general form of the transition matrix, where all possible action choices are considered. Each block in this matrix is a sub-matrix associated with a specific action that leads to a transition to a state in a different superstate. In particular, for rows corresponding to superstates with $(\tau\in\{0,1,2,3\},f=1)$, the policy selects the appropriate rows from either $A_{l}$ or $A_s$, depending on the action taken.
  • Figure 3: Reward values from \ref{eq:rewarddefinn2} and \ref{eq:rewarddefin2} with $\beta=15$ and $\theta \in \{0.7,0.9,0.95\}$ as functions of voltage for states in $\mathcal{S}_1$ with action $\texttt{sensing}$ (top) and states in $\mathcal{S}_2$ with action $\texttt{transmitting}$ (bottom), with $C=2.7\,\text{mF}$, $V_{out}=2.4\,\text{V}$, and $\mathbf{i}_h$ i.i.d. uniformly distributed as $\mathcal{U}[0,3]\,\text{mA}$.
  • Figure 4: The average expected reward defined in equation \ref{eq:objectfun} versus the execution time of sensing, with the execution time of the transmitting task set to $n_{t}\Delta t=0.8$ s (top), and versus the execution time of transmitting, with the execution time of the sensing task set to $n_s\Delta t=0.1$ s (bottom). For this analysis, the reward function uses $\beta = 25$, and the parameter $\theta$ takes three values: $0.7$, $0.9$, and $0.95$.
  • Figure 5: Threshold voltage versus the permissible time window to execute sensing (top) and to execute transmitting (bottom), for $d_s=15$, $n_s=5$, $n_{t}=20$, and two distinct uniform distributions for the harvested current, $\mathcal{U}[0,0.3]$ mA and $\mathcal{U}[0,3]$ mA.
  • ...and 2 more figures
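
Figure 5 reports a threshold voltage per permissible time window. A minimal Python sketch of how such a threshold rule could be applied at run time, next to a textbook reading of ALAP, is given below. The function names, action labels, and threshold values are hypothetical and not taken from the paper, and the paper's ALAP baseline may be defined differently.

```python
from typing import Sequence


def threshold_action(voltage: float, slots_left: int,
                     thresholds: Sequence[float]) -> str:
    """Threshold rule: execute once the capacitor voltage reaches the
    threshold stored for the number of slots left in the window."""
    return "execute" if voltage >= thresholds[slots_left] else "wait"


def alap_action(slots_left: int, exec_slots: int) -> str:
    """Textbook ALAP: start only when the remaining window is just long
    enough to finish the task, regardless of the voltage."""
    return "execute" if slots_left <= exec_slots else "wait"


# Placeholder thresholds in volts, indexed by slots left in the window.
# These numbers are illustrative only; a real table would come from
# solving the MDP.
example_thresholds = [2.0, 2.2, 2.5, 2.8]

early = threshold_action(2.3, 1, example_thresholds)  # 2.3 V >= 2.2 V
hold = threshold_action(2.3, 2, example_thresholds)   # 2.3 V <  2.5 V
```

The design point this illustrates is that a threshold policy needs only one stored voltage per window position, which keeps the run-time decision to a single comparison.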

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Remark 1
  • Theorem 2
  • proof
  • Remark 2