Perturbation-mitigated USV Navigation with Distributionally Robust Reinforcement Learning
Zhaofan Zhang, Minghao Yang, Sihong Xie, Hui Xiong
TL;DR
This paper tackles robust USV navigation under heteroscedastic observational noise by introducing DRIQN, a framework that unifies Distributionally Robust Optimization with Implicit Quantile Networks and a gradient-substitution mechanism. It leverages a replay buffer partitioned into noise-pattern subgroups to address multiple environmental noise sources, formulating a tractable dual quadratic program over subgroup gradients. Extensive simulations show DRIQN surpasses state-of-the-art baselines in success rate, collision avoidance, and efficiency across varying noise conditions, with last-layer gradient substitution providing additional gains over full-network substitution. The work advances risk-sensitive RL for autonomous maritime navigation and lays groundwork for real-world deployment under complex perceptual disturbances.
Abstract
The robustness of Unmanned Surface Vehicles (USVs) is crucial in unknown and complex marine environments, especially when heteroscedastic observational noise poses significant challenges to sensor-based navigation. Recently, Distributional Reinforcement Learning (DistRL) has shown promising results in challenging autonomous navigation tasks without prior environmental information. However, these methods overlook situations where noise patterns vary across environmental conditions, which hinders safe navigation and disrupts the learning of value functions. To address this problem, we propose DRIQN, which integrates Distributionally Robust Optimization (DRO) with implicit quantile networks to optimize worst-case performance under natural environmental conditions. Leveraging explicit subgroup modeling in the replay buffer, DRIQN incorporates heterogeneous noise sources and targets robustness-critical scenarios. Experimental results in a risk-sensitive environment demonstrate that DRIQN significantly outperforms state-of-the-art methods, achieving +13.51% success rate, -12.28% collision rate, +35.46% time savings, and +27.99% energy savings compared with the runner-up.
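
The two ingredients named in the abstract, a quantile-based critic loss and a worst-case objective over noise subgroups, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard quantile Huber loss from implicit quantile networks and a plain "pick the worst subgroup" rule as one common DRO form, and it does not reproduce the paper's dual quadratic program over subgroup gradients. The function names, the subgroup labels ("calm", "fog"), and all numeric values are hypothetical.

```python
import numpy as np


def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Standard quantile Huber loss used by IQN-style critics.

    pred:   (N,) predicted quantile values at fractions `taus`
    target: (M,) target quantile samples
    taus:   (N,) quantile fractions in (0, 1)
    """
    # Pairwise TD errors: u[i, j] = target[j] - pred[i]
    u = target[None, :] - pred[:, None]
    abs_u = np.abs(u)
    # Huber term: quadratic near zero, linear beyond kappa
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric quantile weight |tau - 1{u < 0}|
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Average over target samples, sum over predicted quantiles
    return float((weight * huber / kappa).mean(axis=1).sum())


def worst_case_subgroup(losses_by_group):
    """Return (name, loss) of the subgroup with the largest loss.

    Assumption: a simple max over subgroup losses stands in for the
    worst-case objective; the paper's own formulation may differ.
    """
    name = max(losses_by_group, key=losses_by_group.get)
    return name, losses_by_group[name]


# Hypothetical replay-buffer subgroups, one per noise pattern.
taus = np.array([0.25, 0.5, 0.75])
pred = np.array([0.0, 0.5, 1.0])
targets_by_group = {
    "calm": np.array([0.1, 0.5, 0.9]),   # targets close to predictions
    "fog": np.array([-1.0, 0.5, 2.5]),   # targets spread by heavier noise
}
losses = {g: quantile_huber_loss(pred, t, taus) for g, t in targets_by_group.items()}
worst_name, worst_loss = worst_case_subgroup(losses)
```

Under this sketch, a training step would backpropagate `worst_loss` rather than the average loss over the buffer, so the update is driven by the noise condition where the critic currently fits worst.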
