HGFF: A Deep Reinforcement Learning Framework for Lifetime Maximization in Wireless Sensor Networks
Xiaoxu Han, Xin Mu, Jinghui Zhong
TL;DR
HGFF tackles the NP-hard problem of maximizing wireless sensor network (WSN) lifetime by learning sink trajectories with a deep reinforcement learning (DRL) framework built on a heterogeneous graph neural network. It models the WSN as a weighted undirected graph in which sensors and candidate sink sites are distinct node types, enriches node representations with learnable type embeddings, and fuses global sensor-site information through multi-head attention, all trained with Double DQN. The key innovations are the heterogeneous graph representation, the learnable type embeddings, the global attention-based feature fusion, and an end-to-end DRL pipeline that avoids heavy handcrafting. Across ten diverse maps, HGFF consistently achieves longer network lifetime than heuristic, hyper-heuristic, RL-based, and mixed-integer linear programming (MILP) baselines while keeping inference efficient, which highlights its practical potential for real-world WSN management.
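Double DQN, named above as the training regime, differs from standard DQN in how it forms the bootstrap target: the online network selects the next action (here, the next site) and the target network evaluates it, y = r + γ·Q_target(s', argmax_a Q_online(s', a)), which reduces Q-value overestimation. The sketch below is a minimal illustration of that target, not HGFF's code. It assumes a scalar per-step reward and a boolean mask of feasible next sites; the summary specifies neither the reward design nor the masking rule, and the function name `double_dqn_target` is hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    valid_next: Sequence[bool],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Bootstrap target y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    Only actions (candidate sites) flagged in `valid_next` are considered;
    this masking rule is an illustrative assumption.
    """
    valid = [i for i, ok in enumerate(valid_next) if ok]
    # Terminal transition or no feasible next site: no bootstrap term.
    if done or not valid:
        return reward
    # 1) Selection: the online network picks the best feasible next site.
    best = max(valid, key=lambda i: q_online_next[i])
    # 2) Evaluation: the target network scores that site (decoupling reduces overestimation).
    return reward + gamma * q_target_next[best]


# The online net prefers site 1 among the feasible sites {0, 1}, so the target
# net's value for site 1 (0.4) is used, not the larger value of infeasible site 2.
y = double_dqn_target(1.0, [0.5, 2.0, 1.0], [3.0, 0.4, 5.0], [True, True, False], gamma=0.5)
print(y)  # prints 1.2
```

Plain DQN would take the max of `q_target_next` directly; splitting selection and evaluation between the two networks is the only change Double DQN makes to the target.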
Abstract
Planning the movement of the sink to maximize the lifetime of wireless sensor networks is a challenging research problem of considerable practical value. Many existing mobile sink techniques based on mathematical programming or heuristics have demonstrated the feasibility of the task. Nevertheless, the heavy computational cost of the former and the over-reliance on human knowledge of the latter can lead to relatively low performance. To balance the need for high-quality solutions with the goal of minimizing inference time, we propose a new framework that combines a heterogeneous graph neural network with deep reinforcement learning to automatically construct the movement path of the sink. Modeling wireless sensor networks as heterogeneous graphs, we use the graph neural network to learn representations of sites and sensors by aggregating the features of neighboring nodes and extracting hierarchical graph features. Meanwhile, a multi-head attention mechanism lets the sites attend to information from the sensor nodes, which substantially improves the expressive capacity of the model. Based on the node representations, a greedy policy is learned that incrementally appends the next best site to the solution. We design ten types of static and dynamic maps to simulate different real-world wireless sensor networks and conduct extensive experiments to evaluate and analyze our approach. The empirical results show that our approach consistently outperforms existing methods on all types of maps.
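The inference path described in the abstract (type-aware node features, sites attending to sensors through multi-head attention, then greedy selection of the next site) can be sketched as follows. This is a minimal illustration under stated assumptions, not HGFF's implementation: the learnable type embeddings are passed in as fixed vectors, the query/key/value projections are identity maps with each head reading a contiguous slice of the features, the scoring vector `w` stands in for a hypothetical linear Q-head, the `feasible` mask stands in for whatever site-feasibility rule the framework applies, and the GNN neighbor-aggregation layers are omitted. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def add_type_embedding(features: Sequence[Vector], type_emb: Vector) -> List[Vector]:
    """Add one type vector, shared by all nodes of the same type, to each node's features."""
    return [[x + t for x, t in zip(f, type_emb)] for f in features]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def site_to_sensor_attention(
    sites: Sequence[Vector], sensors: Sequence[Vector], num_heads: int
) -> List[Vector]:
    """Each site (query) attends over all sensors (keys and values), one softmax per head.

    Simplification (assumption): identity Q/K/V projections; head h reads the
    h-th contiguous slice of the feature vector.
    """
    d = len(sites[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    dk = d // num_heads
    fused: List[Vector] = []
    for q in sites:
        context: Vector = []
        for h in range(num_heads):
            lo, hi = h * dk, (h + 1) * dk
            # Scaled dot-product scores between this site and every sensor.
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(dk)
                for k in sensors
            ]
            weights = softmax(scores)
            # Attention-weighted sum of sensor features within this head's slice.
            context.extend(
                sum(w * v[j] for w, v in zip(weights, sensors)) for j in range(lo, hi)
            )
        # Residual connection: keep the site's own features plus the sensor context.
        fused.append([a + c for a, c in zip(q, context)])
    return fused


def greedy_next_site(
    fused_sites: Sequence[Vector], w: Vector, feasible: Sequence[bool]
) -> int:
    """Score sites with a hypothetical linear Q-head and return the best feasible index."""
    candidates = [i for i, ok in enumerate(feasible) if ok]
    if not candidates:
        raise ValueError("no feasible site")
    return max(candidates, key=lambda i: sum(a * b for a, b in zip(fused_sites[i], w)))


# Usage: two sites, three sensors, 4-dim features split across 2 heads.
sites = add_type_embedding([[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]], [0.1] * 4)
sensors = add_type_embedding(
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [1.0, 0.0, 1.0, 0.0]], [-0.1, 0.0, 0.1, 0.0]
)
fused = site_to_sensor_attention(sites, sensors, num_heads=2)
print(greedy_next_site(fused, w=[1.0, -1.0, 0.5, 0.5], feasible=[True, True]))
```

In a trained model the projections, type embeddings, and `w` would all be learned parameters; calling `greedy_next_site` repeatedly, updating the graph state after each chosen site, is what incrementally builds the sink's path.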
