Improved Q-learning based Multi-hop Routing for UAV-Assisted Communication
N P Sharvari, Dibakar Das, Jyotsna Bapat, Debabrata Das
TL;DR
The paper tackles robust UAV-assisted routing under dynamic topologies and energy constraints by introducing IQMR, a novel $Q(\lambda)$-learning based multi-hop routing algorithm that operates without predefined UAV trajectories. It combines four modules (neighbour discovery, energy/reliability estimation, UAV mode configuration, and a $Q(\lambda)$-learning routing decision framework with adaptive learning rate and discount factor) to jointly optimize energy efficiency, connectivity, and collision avoidance. Key contributions include detailed estimation models for residual energy, packet reception, coverage, and collision; an adaptive UAV operational mode; a state–action–reward formulation with prioritized rewards; and a flexible $Q(\lambda)$-learning implementation with dynamic hyperparameters. Simulation results show IQMR achieving approximately 32–36% improvements in energy efficiency and 25–32% gains in data throughput over baseline Q-learning UAV routing protocols, with convergence in under 500 episodes, highlighting its practical impact for dynamic aerial networks.
Abstract
Designing effective Unmanned Aerial Vehicle (UAV)-assisted routing protocols is challenging due to changing topology, limited battery capacity, and the dynamic nature of communication environments. Current protocols prioritize optimizing individual network parameters, overlooking the need for a nuanced approach in scenarios with intermittent connectivity, fluctuating signal strength, and varying network densities, and therefore fail to address aerial network requirements comprehensively. This paper proposes a novel Improved Q-learning-based Multi-hop Routing (IQMR) algorithm for optimal UAV-assisted communication systems. Using Q(λ) learning for routing decisions, IQMR substantially enhances energy efficiency and network data throughput. IQMR improves system resilience by prioritizing reliable connectivity and inter-UAV collision avoidance while integrating real-time network status information, all in the absence of predefined UAV path planning, thus ensuring dynamic adaptability to evolving network conditions. The results validate IQMR's adaptability to changing system conditions and its superiority over current techniques. IQMR achieves improvements of 36.35% in energy efficiency and 32.05% in data throughput, respectively, over the existing methods.
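For readers unfamiliar with Q(λ) learning, the following is a minimal sketch of a generic tabular Watkins's Q(λ) update with accumulating eligibility traces, applied to a toy next-hop selection. It is not the IQMR implementation: the node names (`u1`, `u2`, `gw`), the reward values, and the fixed `alpha`, `gamma`, and `lam` are illustrative assumptions, and IQMR's adaptive learning rate, adaptive discount factor, and prioritized rewards are not reproduced here.

```python
from collections import defaultdict


def q_lambda_update(Q, E, state, action, reward, next_state,
                    next_actions, next_action,
                    alpha=0.1, gamma=0.9, lam=0.8):
    """One step of tabular Watkins's Q(lambda) with accumulating traces.

    Q and E map (state, action) pairs to a Q-value and an eligibility trace.
    In this sketch a state is the node holding the packet and an action is
    the chosen next hop (an assumption made for illustration only).
    """
    # Greedy value of the next state; 0.0 if it has no outgoing actions.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    delta = reward + gamma * best_next - Q[(state, action)]

    # Mark the pair just visited as eligible for credit.
    E[(state, action)] += 1.0

    # Traces survive only while the behaviour policy stays greedy.
    greedy = (not next_actions) or Q[(next_state, next_action)] == best_next

    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] = gamma * lam * E[key] if greedy else 0.0
    return delta


# Toy two-hop route u1 -> u2 -> gw with hypothetical rewards.
Q = defaultdict(float)
E = defaultdict(float)
q_lambda_update(Q, E, "u1", "u2", -1.0, "u2", ["gw"], "gw")
q_lambda_update(Q, E, "u2", "gw", 10.0, "gw", [], None)
print(round(Q[("u1", "u2")], 2), round(Q[("u2", "gw")], 2))  # 0.62 1.0
```

The eligibility trace is what lets the delivery reward at the final hop propagate back to the earlier hop in the same episode, which is the general motivation for using Q(λ) rather than one-step Q-learning in multi-hop settings.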
