When Motion Learns to Listen: Diffusion-Prior Lyapunov Actor-Critic Framework with LLM Guidance for Stable and Robust AUV Control in Underwater Tasks

Jingzehua Xu, Weiyi Liu, Weihang Zhang, Zhuofan Xi, Guanwen Xie, Shuai Zhang, Yi Li

TL;DR

The paper tackles stable, robust AUV control under nonlinear hydrodynamics, disturbances, and localization uncertainty. It proposes a diffusion-prior Lyapunov actor-critic framework that unifies exploration, stability guarantees, and task-driven adaptability via an LLM-guided outer loop. The approach integrates a diffusion model for long-horizon action proposals, a Lyapunov critic enforcing stability with dual constraints, and an LLM that semantically refines Lyapunov functions to suit mission objectives, forming a generation-filtering-optimization pipeline. In high-fidelity 6-DoF underwater simulations, the framework demonstrates superior trajectory tracking, higher task completion, improved energy efficiency, faster convergence, and enhanced robustness compared with traditional RL and diffusion-augmented baselines, indicating strong potential for real-world AUV autonomy.

Abstract

Autonomous Underwater Vehicles (AUVs) are indispensable for marine exploration, yet their control is hindered by nonlinear hydrodynamics, time-varying disturbances, and localization uncertainty. Traditional controllers provide only limited adaptability, while Reinforcement Learning (RL), though promising, suffers from sample inefficiency, weak long-term planning, and a lack of stability guarantees, leading to unreliable behavior. To address these challenges, we propose a diffusion-prior Lyapunov actor-critic framework that unifies exploration, stability, and semantic adaptability. Specifically, a diffusion model generates smooth, multimodal, and disturbance-resilient candidate actions; a Lyapunov critic imposes dual constraints that ensure stability; and a Large Language Model (LLM)-driven outer loop adaptively selects and refines Lyapunov functions based on task semantics and training feedback. This "generation-filtering-optimization" mechanism not only enhances sample efficiency and planning capability but also aligns stability guarantees with diverse mission requirements in the multi-objective optimization task. Extensive simulations under complex ocean dynamics demonstrate that the proposed framework achieves more accurate trajectory tracking, higher task completion rates, improved energy efficiency, faster convergence, and greater robustness compared with conventional RL and diffusion-augmented baselines.
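
The "generation-filtering-optimization" idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the Gaussian sampler `propose_actions` merely stands in for the diffusion prior, `lyapunov` is a hypothetical quadratic candidate, `step` is a toy single-integrator model rather than 6-DoF AUV dynamics, and the filter applies a single decrease condition with a hypothetical rate `alpha` rather than the paper's dual constraints. The LLM outer loop is omitted entirely.

```python
import random


def lyapunov(state):
    # Hypothetical quadratic candidate V(s) = ||s||^2 on the tracking error.
    return sum(x * x for x in state)


def step(state, action, dt=0.1):
    # Toy single-integrator dynamics, standing in for the 6-DoF AUV model.
    return [x + dt * u for x, u in zip(state, action)]


def propose_actions(state, n_candidates, rng, noise=1.0):
    # Generation: Gaussian samples around a nominal proportional action -s.
    # This is only a stand-in; a diffusion prior would denoise samples instead.
    return [[-x + rng.gauss(0.0, noise) for x in state]
            for _ in range(n_candidates)]


def filter_stable(state, candidates, alpha=0.05):
    # Filtering: keep actions whose one-step successor satisfies
    # V(s') - V(s) <= -alpha * V(s).
    v_now = lyapunov(state)
    return [a for a in candidates
            if lyapunov(step(state, a)) - v_now <= -alpha * v_now]


def select_action(state, candidates):
    # Optimization: among stable candidates, pick the lowest-energy action.
    stable = filter_stable(state, candidates)
    if stable:
        return min(stable, key=lambda a: sum(u * u for u in a))
    # Fallback when nothing passes the filter: minimize the next-step V.
    return min(candidates, key=lambda a: lyapunov(step(state, a)))


def rollout(state, steps=50, n_candidates=16, seed=0):
    # Run the generate-filter-optimize loop and record V along the trajectory.
    rng = random.Random(seed)
    history = [lyapunov(state)]
    for _ in range(steps):
        action = select_action(state, propose_actions(state, n_candidates, rng))
        state = step(state, action)
        history.append(lyapunov(state))
    return history
```

Calling `rollout([1.0, -2.0, 0.5])` returns the Lyapunov value at each step, which makes it easy to check whether the filter is actually driving the tracking error down. The energy-based selection rule is one plausible choice for the multi-objective setting the abstract mentions; a learned critic could replace it.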

Paper Structure

This paper contains 18 sections, 29 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of the diffusion model principle. Through the forward process $q(\mathbf{a}_t \mid \mathbf{a}_{t-1})$ and the reverse process $p_\theta(\mathbf{a}_{t-1} \mid \mathbf{a}_t, \mathbf{s}_t)$, the model progressively refines noisy actions into a coherent control sequence, ultimately generating a smooth AUV trajectory from $\mathbf{s}_0$ to the current state $\mathbf{s}_t$.
  • Figure 2: Architecture of the proposed framework for robust AUV control. The diffusion-prior Lyapunov actor-critic framework consists of three components: (A) a diffusion model for feasible action proposals; (B) a hybrid diffusion-RL policy; and (C) a Lyapunov actor-critic with LLM-guided stability optimization.
  • Figure 3: Visualizations of the scenario adopted in this study, where the ASV serves as a mobile communication relay and positioning anchor for the underwater AUVs, and multiple AUVs equipped with the proposed framework collaboratively navigate the environment to perform the data collection task.
  • Figure 4: Training loss curve of the diffusion model, showing rapid initial decay and smooth long-term convergence, with an inset highlighting the final stabilization phase around $3\times10^{-2}$.
  • Figure 5: Visualization of five candidate actions across three denoising stages under the proposed framework. Trajectories evolve from scattered exploration to smooth, task-aligned paths, demonstrating the ability of the diffusion model to generate diverse yet optimized plans.
  • ...and 6 more figures
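
For readers unfamiliar with the two processes named in the Figure 1 caption, one common way to write them is the Gaussian parameterization used in standard denoising diffusion models. This is an assumption for illustration: the caption does not state the paper's exact form, and the noise schedule $\beta_t$, mean $\boldsymbol{\mu}_\theta$, and covariance $\boldsymbol{\Sigma}_\theta$ are symbols introduced here. Following the caption's notation, the subscript on $\mathbf{a}$ indexes the denoising step.

$$
q(\mathbf{a}_t \mid \mathbf{a}_{t-1}) = \mathcal{N}\!\left(\mathbf{a}_t;\ \sqrt{1-\beta_t}\,\mathbf{a}_{t-1},\ \beta_t \mathbf{I}\right)
$$

$$
p_\theta(\mathbf{a}_{t-1} \mid \mathbf{a}_t, \mathbf{s}_t) = \mathcal{N}\!\left(\mathbf{a}_{t-1};\ \boldsymbol{\mu}_\theta(\mathbf{a}_t, t, \mathbf{s}_t),\ \boldsymbol{\Sigma}_\theta(\mathbf{a}_t, t, \mathbf{s}_t)\right)
$$

Under this form, the forward process gradually adds noise to an action, and the learned reverse process removes it step by step while conditioning on the state.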