Intelligent Communication Planning for Constrained Environmental IoT Sensing with Reinforcement Learning

Yi Hu; Jinhang Zuo; Bob Iannucci; Carlee Joe-Wong

TL;DR

The paper tackles the challenge of tracking environmental phenomena with power- and bandwidth-constrained IoT sensors by jointly optimizing when sensors should report data. It formulates a Markov decision process (MDP) for multi-sensor reporting and introduces EnvSen, a multi-agent reinforcement learning (MARL) framework that assigns sensor-specific rewards based on the value of data for improving beliefs, while incorporating transmission costs and channel limitations. The paper derives theoretically grounded baselines in simplified settings and demonstrates through wildfire-focused experiments (including LoRa simulations) that EnvSen can learn cooperative reporting policies that balance data value against energy use, closely approaching optimal performance under bandwidth constraints. The work advances practical, scalable sensing in resource-limited IoT networks and offers a foundation for applying data-value-guided MARL to broader environmental monitoring tasks.

Abstract

Internet of Things (IoT) technologies have enabled numerous data-driven mobile applications and have the potential to significantly improve environmental monitoring and hazard warnings through the deployment of a network of IoT sensors. However, these IoT devices are often power-constrained and utilize wireless communication schemes with limited bandwidth. Such power constraints limit the amount of information each device can share across the network, while bandwidth limitations hinder sensors' coordination of their transmissions. In this work, we formulate the communication planning problem of IoT sensors that track the state of the environment. We seek to optimize sensors' decisions in collecting environmental data under stringent resource constraints. We propose a multi-agent reinforcement learning (MARL) method to find the optimal communication policies for each sensor that maximize the tracking accuracy subject to the power and bandwidth limitations. MARL learns and exploits the spatial-temporal correlation of the environmental data at each sensor's location to reduce the redundant reports from the sensors. Experiments on wildfire spread with LoRa wireless network simulators show that our MARL method can learn to balance the need to collect enough data to predict wildfire spread with unknown bandwidth limitations.

Paper Structure (26 sections, 5 theorems, 15 equations, 4 figures, 1 algorithm)

Introduction
Motivating Example: Wildfire Tracking
Research Challenges
Our Contributions
Related Work
Problem Formulation
Tracking Environmental Conditions
Energy Cost and Communication Limitation
Communication Planning under Constraints
Defining the Data Value
Communication Planning
MDP Formulation
State
Action
Reward
...and 11 more sections

Key Result

Proposition 1

Algorithm 1 yields the optimal communication policy if all sensors have the same communication cost $wc^i$.

Figures (4)

Figure 1: System design for wildfire tracking. Multiple IoT sensors periodically monitor the environment and send sensed data to a central gateway over LPWAN. The gateway uses its received data to predict future wildfire spread.
Figure 2: EnvSen Framework for sensors' communication decisions.
Figure 3: Experimental results. Columns from left to right: the average reward $v-wc$ with $w=0.2$/convergence of the policies, the sum of the data value, the total communication cost of all sensors, and the error loss of the belief for tracking wildfire, averaged across 30 random episodes.
Figure 4: (a) The error loss for tracking wildfire of the RL algorithms that directly use the tracking accuracy as the reward and those (EnvSen-X) that adopt our data metric. (b) The trade-off curve between data value and communication cost made by different policies.
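
The reward reported in Figure 3, $v - wc$ with $w = 0.2$, trades the value of a sensor's data against its communication cost. The snippet below is a minimal sketch of that trade-off only, not the paper's implementation: the function names, the example values, and the assumption that a silent sensor contributes zero value and zero cost are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def sensor_reward(data_value: float, comm_cost: float, w: float = 0.2) -> float:
    """Reward v - w*c for one sensor; w = 0.2 follows the Figure 3 caption."""
    return data_value - w * comm_cost


def average_reward(values: Sequence[float], costs: Sequence[float], w: float = 0.2) -> float:
    """Mean of the per-sensor rewards over all sensors."""
    if len(values) != len(costs):
        raise ValueError("values and costs must have the same length")
    rewards = [sensor_reward(v, c, w) for v, c in zip(values, costs)]
    return sum(rewards) / len(rewards)


# Hypothetical example: sensor 0 reports (value 1.0, cost 1.0),
# sensor 1 stays silent (assumed value 0.0, cost 0.0).
example_values = [1.0, 0.0]
example_costs = [1.0, 0.0]
example_average = average_reward(example_values, example_costs)
```

With a larger weight `w`, reporting pays off only when the data value is correspondingly higher, which is the balance the trade-off curve in Figure 4(b) describes.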

Theorems & Definitions (5)

Proposition 1
Corollary 1
Proposition 2
Corollary 2
Corollary 3
