When Motion Learns to Listen: Diffusion-Prior Lyapunov Actor-Critic Framework with LLM Guidance for Stable and Robust AUV Control in Underwater Tasks
Jingzehua Xu, Weiyi Liu, Weihang Zhang, Zhuofan Xi, Guanwen Xie, Shuai Zhang, Yi Li
TL;DR
The paper tackles stable, robust AUV control under nonlinear hydrodynamics, disturbances, and localization uncertainty. It proposes a diffusion-prior Lyapunov actor-critic framework that unifies exploration, stability guarantees, and task-driven adaptability via an LLM-guided outer loop. The approach integrates a diffusion model for long-horizon action proposals, a Lyapunov critic enforcing stability with dual constraints, and an LLM that semantically refines Lyapunov functions to suit mission objectives, forming a generation-filtering-optimization pipeline. In high-fidelity 6-DoF underwater simulations, the framework demonstrates superior trajectory tracking, higher task completion, improved energy efficiency, faster convergence, and enhanced robustness compared with traditional RL and diffusion-augmented baselines, indicating strong potential for real-world AUV autonomy.
Abstract
Autonomous Underwater Vehicles (AUVs) are indispensable for marine exploration, yet their control is hindered by nonlinear hydrodynamics, time-varying disturbances, and localization uncertainty. Traditional controllers provide only limited adaptability, while Reinforcement Learning (RL), though promising, suffers from sample inefficiency, weak long-term planning, and a lack of stability guarantees, leading to unreliable behavior. To address these challenges, we propose a diffusion-prior Lyapunov actor-critic framework that unifies exploration, stability, and semantic adaptability. Specifically, a diffusion model generates smooth, multimodal, and disturbance-resilient candidate actions; a Lyapunov critic imposes dual constraints that ensure stability; and a Large Language Model (LLM)-driven outer loop adaptively selects and refines Lyapunov functions based on task semantics and training feedback. This "generation-filtering-optimization" mechanism not only enhances sample efficiency and planning capability but also aligns stability guarantees with diverse mission requirements in multi-objective optimization tasks. Extensive simulations under complex ocean dynamics demonstrate that the proposed framework achieves more accurate trajectory tracking, higher task completion rates, better energy efficiency, faster convergence, and greater robustness than conventional RL and diffusion-augmented baselines.
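
To make the "generation-filtering-optimization" idea concrete, the following is a minimal illustrative sketch and not the paper's implementation. Everything in it is an assumption made for illustration: Gaussian sampling stands in for the diffusion prior, the dynamics are a toy linear model `x' = A x + B u`, the Lyapunov function is the quadratic `V(x) = x^T P x`, a single decrease condition `V(x') - V(x) <= -alpha * V(x)` stands in for the dual constraints (which the abstract does not specify), and negative control effort stands in for the learned critic score. The function names are hypothetical.

```python
import numpy as np


def sample_candidates(rng, n, action_dim, scale=1.0):
    """Generation step. ASSUMPTION: plain Gaussian samples replace the diffusion prior."""
    return rng.normal(0.0, scale, size=(n, action_dim))


def lyapunov(x, P):
    """Quadratic Lyapunov candidate V(x) = x^T P x (assumed form)."""
    return float(x @ P @ x)


def step(x, u, A, B):
    """Toy linear dynamics x' = A x + B u (assumed; real AUV dynamics are nonlinear)."""
    return A @ x + B @ u


def select_action(x, candidates, A, B, P, alpha=0.1):
    """Filter candidates by a Lyapunov decrease condition, then pick the best score."""
    v_now = lyapunov(x, P)
    v_next = np.array([lyapunov(step(x, u, A, B), P) for u in candidates])

    # Filtering step: keep actions with V(x') - V(x) <= -alpha * V(x).
    feasible = (v_next - v_now) <= -alpha * v_now

    # Optimization step. ASSUMPTION: negative control effort replaces a learned critic.
    scores = -np.sum(candidates ** 2, axis=1)

    if feasible.any():
        idx = np.flatnonzero(feasible)
        best = int(idx[np.argmax(scores[idx])])
    else:
        # Fallback when no candidate passes: take the smallest next-step V.
        best = int(np.argmin(v_next))
    return candidates[best], bool(feasible[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B, P = np.eye(2), np.eye(2), np.eye(2)
    x = np.array([1.0, 0.0])
    action, is_stable = select_action(x, sample_candidates(rng, 64, 2), A, B, P)
    print(action, is_stable)
```

In this sketch the filter runs before the score is maximized, so a high-scoring action that violates the decrease condition is never chosen while a feasible one exists. The fallback branch is a design choice of the sketch, not something stated in the abstract.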
