Intelligent Communication Planning for Constrained Environmental IoT Sensing with Reinforcement Learning
Yi Hu, Jinhang Zuo, Bob Iannucci, Carlee Joe-Wong
TL;DR
The paper tackles the challenge of tracking environmental phenomena with power- and bandwidth-constrained IoT sensors by jointly optimizing when sensors should report data. It formulates an MDP for multi-sensor reporting and introduces EnvSen, a MARL framework that assigns sensor-specific rewards based on the value of data for improving beliefs, while incorporating transmission costs and channel limitations. The authors establish theoretically grounded baselines in simplified settings and demonstrate through wildfire-focused experiments (including LoRa simulations) that EnvSen can learn cooperative reporting policies that balance data value against energy use, closely approaching optimal performance under bandwidth constraints. The work advances practical, scalable sensing in resource-limited IoT networks and offers a foundation for applying data-value-guided MARL to broader environmental monitoring tasks.
Abstract
Internet of Things (IoT) technologies have enabled numerous data-driven mobile applications and have the potential to significantly improve environmental monitoring and hazard warnings through the deployment of networks of IoT sensors. However, these IoT devices are often power-constrained and use wireless communication schemes with limited bandwidth. Power constraints limit the amount of information each device can share across the network, while bandwidth limitations hinder sensors' coordination of their transmissions. In this work, we formulate the communication planning problem of IoT sensors that track the state of the environment. We seek to optimize sensors' decisions in collecting environmental data under stringent resource constraints. We propose a multi-agent reinforcement learning (MARL) method to find the optimal communication policies for each sensor that maximize the tracking accuracy subject to the power and bandwidth limitations. MARL learns and exploits the spatial-temporal correlation of the environmental data at each sensor's location to reduce redundant reports from the sensors. Experiments on wildfire spread with LoRa wireless network simulators show that our MARL method can learn to balance the need to collect enough data to predict wildfire spread against unknown bandwidth limitations.
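To make the trade-off described above concrete, the following is a minimal toy sketch of one reporting step. It is not the paper's EnvSen method. The function name `step_rewards`, the absolute-error measure of data value, the fixed per-transmission cost `tx_cost`, and the "first `capacity` senders get through" channel model are all assumptions made purely for illustration.

```python
def step_rewards(belief, truth, actions, capacity, tx_cost):
    """One step of a toy multi-sensor reporting game (illustrative only).

    belief   -- the collector's current estimate at each sensor location
    truth    -- the value each sensor actually observes
    actions  -- 1 if sensor i attempts to transmit this step, else 0
    capacity -- assumed max number of reports the channel delivers per step
    tx_cost  -- assumed energy cost charged for every transmission attempt
    """
    senders = [i for i, a in enumerate(actions) if a == 1]
    # Assumed channel model: only the first `capacity` senders are delivered.
    delivered = senders[:capacity]

    rewards = [0.0] * len(actions)
    new_belief = list(belief)
    for i in senders:
        # Every attempt costs energy, whether or not it is delivered.
        rewards[i] -= tx_cost
    for i in delivered:
        # Assumed data value: how much the report corrects the belief.
        rewards[i] += abs(truth[i] - belief[i])
        new_belief[i] = truth[i]
    return new_belief, rewards


if __name__ == "__main__":
    new_belief, rewards = step_rewards(
        belief=[0.0, 0.0, 0.0],
        truth=[1.0, 0.0, 0.5],
        actions=[1, 1, 1],
        capacity=2,
        tx_cost=0.25,
    )
    print(new_belief)  # [1.0, 0.0, 0.0]
    print(rewards)     # [0.75, -0.25, -0.25]
```

In this toy run, the second sensor pays the transmission cost for a redundant report, and the third pays it for a report the channel never delivers. A learned policy would be expected to avoid both kinds of wasted transmission.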
