Distributed Q-learning-based Shortest-Path Tree Construction in IoT Sensor Networks
Van-Vi Vo, Tien-Dung Nguyen, Duc-Tai Le, Hyunseung Choo
TL;DR
This work addresses the construction of shortest-path trees (SPTs) in energy-constrained IoT sensor networks where centralized routing is impractical. It introduces a fully distributed Q-learning framework in which each node learns optimal next-hop decisions using only local neighbor information. The objective is $d_T(v,v_0)=d_G(v,v_0)$, that is, every node $v$ should be as close to the sink $v_0$ in the learned tree $T$ as it is in the network graph $G$, and the reward structure favors reaching the sink. Theoretical analysis guarantees convergence to an optimal, loop-free SPT under standard tabular Q-learning conditions, and overhead stays low because communication is local and each update costs $O(|N^{V_m}(v)|)$. Simulations on random geometric graphs show near-optimal routing accuracy (over 99% for $N\ge300$) and strong generalization across network sizes, indicating that the approach is scalable, energy-efficient, and robust to topology changes. These results point to a practical, distributed alternative to traditional centralized routing protocols for IoT deployments.
Abstract
Efficient routing in IoT sensor networks is critical for minimizing energy consumption and latency. Traditional centralized algorithms, such as Dijkstra's, are computationally intensive and ill-suited for dynamic, distributed IoT environments. We propose a novel distributed Q-learning framework for constructing shortest-path trees (SPTs), enabling sensor nodes to independently learn optimal next-hop decisions using only local information. States are defined based on node positions and routing history, with a reward function that incentivizes progression toward the sink while penalizing inefficient paths. Trained on diverse network topologies, the framework generalizes effectively to unseen networks. Simulations across 100 to 500 nodes demonstrate near-optimal routing accuracy (over 99% for networks with more than 300 nodes), with minor deviations (1-2 extra hops) in smaller networks having negligible impact on performance. Compared to centralized and flooding-based methods, our approach reduces communication overhead, adapts to topology changes, and enhances scalability and energy efficiency. This work underscores the potential of Q-learning for autonomous, robust routing in resource-constrained IoT networks, offering a scalable alternative to traditional protocols.
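The idea of per-node next-hop learning can be illustrated with a minimal tabular sketch. It is not the authors' implementation: it assumes a simplified state (the current node only, without the position and routing-history features mentioned above), a reward of -1 per hop, purely random exploration, and hypothetical names and parameters (`learn_next_hops`, `tree_hops`, `alpha`, `gamma`, the 6-node example graph). Each node keeps one Q-value per neighbor and updates it from that neighbor's best estimate, which is the only information exchanged.

```python
import random
from collections import deque


def bfs_hops(adj, sink):
    """Reference hop distance d_G(v, sink) for every node, via breadth-first search."""
    dist = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def learn_next_hops(adj, sink, episodes=3000, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning in which node v only reads values held by its neighbors.

    Assumptions of this sketch (not taken from the paper): reward is -1 per
    hop, exploration is uniformly random, and the state is the current node.
    """
    rng = random.Random(seed)
    # q[v][u]: node v's estimate of the return for forwarding to neighbor u.
    q = {v: {u: 0.0 for u in adj[v]} for v in adj}
    sources = [v for v in adj if v != sink]
    for _ in range(episodes):
        v = rng.choice(sources)
        for _ in range(len(adj)):  # cap the length of one random walk
            u = rng.choice(adj[v])
            # Neighbor u reports its best estimate; the sink is terminal.
            future = 0.0 if u == sink else max(q[u].values())
            q[v][u] += alpha * (-1.0 + gamma * future - q[v][u])
            if u == sink:
                break
            v = u
    # Each node greedily picks the neighbor with the highest learned value.
    return {v: max(q[v], key=q[v].get) for v in sources}


def tree_hops(next_hop, sink, v):
    """Hop count d_T(v, sink) obtained by following the learned next hops."""
    hops = 0
    while v != sink:
        v = next_hop[v]
        hops += 1
        if hops > len(next_hop):
            raise ValueError("routing loop detected")
    return hops


if __name__ == "__main__":
    # Hypothetical 6-node topology; node 0 plays the role of the sink.
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4],
           3: [1, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
    next_hop = learn_next_hops(adj, sink=0)
    dist = bfs_hops(adj, 0)
    for v in sorted(next_hop):
        print(v, "->", next_hop[v], "tree hops:", tree_hops(next_hop, 0, v),
              "graph hops:", dist[v])
```

Comparing `tree_hops` against `bfs_hops` checks the SPT condition that the learned tree distance matches the graph distance for every node. The per-hop penalty is only one way to reward progress toward the sink; the paper's actual reward function and state definition may differ.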
