UAV Trajectory Optimization via Improved Noisy Deep Q-Network

Zhang Hengyu; Maryam Cheraghy; Liu Wei; Armin Farhadi; Meysam Soltanpour; Zhong Zhuoqing

TL;DR

This paper addresses UAV trajectory optimization under communication constraints by formulating a discrete, grid-based navigation task and solving it with an Improved Noisy DQN. The method blends residual NoisyLinear layers with adaptive noise scheduling, Double DQN target estimation, and soft target updates to enhance exploration, training stability, and sample efficiency. Key contributions include a learnable-noise network architecture, a performance-aware noise schedule with periodic resampling, and a comprehensive evaluation showing faster convergence, higher rewards, and fewer steps than standard DQN variants. The work advances practical, robust RL-based UAV navigation in cluttered, signal-limited environments with implications for real-time planning and reliability in communication-constrained scenarios.
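Two of the stabilizing components named above, Double DQN target estimation and soft target updates, can be sketched in a few lines. This is a minimal, framework-free illustration rather than the authors' implementation: the function names, the flat weight lists, and the example values of `gamma` and `tau` are assumptions chosen for readability.

```python
from typing import List, Sequence


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def soft_update(target: List[float], online: Sequence[float], tau: float) -> List[float]:
    """Soft (Polyak) update: target <- tau * online + (1 - tau) * target."""
    return [tau * w_o + (1.0 - tau) * w_t for w_o, w_t in zip(online, target)]


# Hypothetical values for illustration only (gamma and tau are assumed).
y = double_dqn_target(reward=1.0, done=False, gamma=0.9,
                      q_online_next=[0.2, 0.8], q_target_next=[5.0, 1.0])
new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=0.1)
```

In the example the online network prefers action 1, so the target uses the target network's value for action 1 (1.0) even though its own maximum is action 0 (5.0). Decoupling selection from evaluation in this way is what reduces overestimation relative to a plain DQN target.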

Abstract

This paper proposes an Improved Noisy Deep Q-Network (Noisy DQN) to enhance the exploration and stability of deep reinforcement learning for Unmanned Aerial Vehicle (UAV) navigation in simulated environments. The method strengthens exploration by combining residual NoisyLinear layers with an adaptive noise scheduling mechanism, and improves training stability through a smooth loss and soft target network updates. Experiments show that the proposed model converges faster, achieves rewards up to $+40$ higher than standard DQN, and quickly reaches the minimum number of steps required for the task, 28, in the $15 \times 15$ grid navigation environment. The results show that the combined improvements to the NoisyNet network structure, exploration control, and training stability enhance the efficiency and reliability of deep Q-learning.
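The 28-step minimum is consistent with a corner-to-corner route on a $15 \times 15$ grid under 4-connected moves: $14 + 14 = 28$. The breadth-first search below checks this arithmetic. The start cell `(0, 0)`, the goal cell `(14, 14)`, the move set, and the obstacle set are assumptions for illustration, since the abstract does not state them.

```python
from collections import deque


def shortest_steps(size, start, goal, obstacles=frozenset()):
    """Breadth-first search for the fewest 4-connected moves on a size x size grid.

    Returns the step count, or None if the goal is unreachable.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (row, col), dist = queue.popleft()
        if (row, col) == goal:
            return dist
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in obstacles and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Assumed layout: empty grid, opposite corners as start and goal.
steps = shortest_steps(15, (0, 0), (14, 14))
```

Obstacles do not necessarily raise this bound: as long as some path that only moves toward the goal stays clear, the shortest route remains 28 steps.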

Paper Structure (16 sections, 24 equations, 4 figures, 1 table, 1 algorithm)

Introduction
System Model
Problem Formulation
Methodology: Improved Noisy Deep Q-Network
Solution via Reinforcement Learning
DQN Framework
Network Architecture with Learnable Noise
Adaptive Noise Scheduling Mechanism
Target Value Estimation with Double DQN
Soft Target Network Updates
Learning Rate Warm-Up and Cosine Annealing
Training Algorithm
Performance Metrics
Simulation Results
Conclusions
...and 1 more section

Figures (4)

Figure 1: Obstacle distribution map in the $15 \times 15$ UAV navigation environment.
Figure 2: Improved Noisy DQN Network Architecture.
Figure 3: Learning Curve for Reward Comparison Between Noisy DQN and Other Variations.
Figure 4: Number of Steps Taken Per Episode.
