A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics

Markus Buchholz, Ignacio Carlucho, Yvan R. Petillot

TL;DR

AURA tackles safe autonomy in underwater robotics by integrating a high-fidelity digital twin with two local LLM agents in a human-in-the-loop workflow for anomaly diagnosis. The perception agent translates raw telemetry into structured problem descriptions, while the reasoning agent grounds hypotheses in external knowledge and operator input. A retrieval-augmented memory, backed by a vector database (VDB), distills expert diagnoses into reusable cases. Two architectural safeguards, prompt-level guardrails and human validation, keep diagnoses verifiable, while Stage 4 enables proactive pre-mission knowledge injection. Experimental validation in a controlled BlueROV2 setup shows significant improvements in diagnostic specificity (CSS) and a reduction in dialog effort, demonstrating a scalable pathway toward continually improving resilient autonomy.

Abstract

The safe deployment of autonomous systems in safety-critical settings requires a paradigm that combines human expertise with AI-driven analysis, especially when anomalies are unforeseen. We introduce AURA (Autonomous Resilience Agent), a collaborative framework for anomaly and fault diagnostics in robotics. AURA integrates large language models (LLMs), a high-fidelity digital twin (DT), and human-in-the-loop interaction to detect and respond to anomalous behavior in real time. The architecture uses two agents with clear roles: (i) a low-level State Anomaly Characterization Agent that monitors telemetry and converts signals into a structured natural-language problem description, and (ii) a high-level Diagnostic Reasoning Agent that conducts a knowledge-grounded dialogue with an operator to identify root causes, drawing on external sources. Human-validated diagnoses are then converted into new training examples that refine the low-level perceptual model. This feedback loop progressively distills expert knowledge into the AI, transforming it from a static tool into an adaptive partner. We describe the framework's operating principles and provide a concrete implementation, establishing a pattern for trustworthy, continually improving human-robot teams.
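The first stage described above, comparing the physical vehicle against the digital twin and turning the mismatch into a structured description, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the signal names, the threshold values, and the helpers `detect_anomalies` and `describe` are all assumptions made for the example, and a fixed per-signal threshold stands in for whatever detection rule AURA actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Residual:
    """Mismatch between the physical vehicle and its digital twin for one signal."""
    signal: str
    real: float
    twin: float
    threshold: float  # hypothetical per-signal tolerance

    @property
    def error(self) -> float:
        return abs(self.real - self.twin)


def detect_anomalies(real: Dict[str, float],
                     twin: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[Residual]:
    """Return every signal whose real-vs-twin error exceeds its tolerance."""
    flagged = []
    for name, limit in thresholds.items():
        residual = Residual(name, real[name], twin[name], limit)
        if residual.error > limit:
            flagged.append(residual)
    return flagged


def describe(flagged: List[Residual]) -> str:
    """Turn flagged residuals into a short natural-language problem description."""
    if not flagged:
        return "No anomaly: physical vehicle matches the digital twin."
    parts = [
        f"{r.signal} deviates by {r.error:.2f} (limit {r.threshold:.2f})"
        for r in flagged
    ]
    return "Anomaly detected: " + "; ".join(parts) + "."


# Illustrative telemetry snapshot (values invented for the example).
real = {"surge_velocity": 0.12, "yaw_rate": 0.02, "depth": 1.50}
twin = {"surge_velocity": 0.40, "yaw_rate": 0.01, "depth": 1.52}
thresholds = {"surge_velocity": 0.15, "yaw_rate": 0.05, "depth": 0.10}

flagged = detect_anomalies(real, twin, thresholds)
report = describe(flagged)
print(report)
```

In the framework itself, a description of this kind would be produced by the State Anomaly Characterization Agent and handed to the Diagnostic Reasoning Agent; here a plain string template stands in for the LLM.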

Paper Structure

This paper contains 21 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: High-level concept of AURA. We use a DT (top view) as a real-time baseline for the physical AUV (bottom view). When the two diverge, anomaly detection is triggered, which initiates a collaborative dialogue to diagnose the problem. These experiences are later used to enhance the AI agents.
  • Figure 2: The AURA Collaborative Reasoning Architecture. Anomaly detection triggers the State Anomaly Characterization Agent (A), which translates raw data into a structured Problem Characterization. This is passed to the Diagnostic Reasoning Agent (B), which uses external knowledge and an interactive dialogue with the operator to find a solution. The outcome is stored in the VDB to refine Agent A.
  • Figure 3: An illustrative example of the Human-in-the-Loop Distillation process, designed to produce a quantifiable improvement in diagnostic performance. The figure contrasts the workflow in two phases: First Encounter (Left) and Post-Distillation (Right). In the First Encounter, with No context from the VDB, Agent A produces a generic characterization. This necessitates an extended, multi-turn diagnostic dialog between Agent B and the operator to identify the root cause. The outcome of this session is processed into a distilled lesson, which is then passed to an Embedding Model. The resulting embedding vector, a numerical representation of the experience, is stored in the VDB. In the Post-Distillation phase, when a similar anomaly occurs, Agent A receives retrieved context. This allows it to generate a highly specific and insightful characterization, leading to a much shorter, confirmatory dialog and demonstrating the system's enhanced diagnostic efficiency.
  • Figure 4: The experimental validation platform for the AURA framework. The system simultaneously processes state telemetry from two sources: a physical BlueROV2 AUV operating in a water tank (the real system) and a high-fidelity digital twin running in the Stonefish simulator (the normative model). Both data streams are fed into the AURA framework, which is managed on a local host station. This dual-reality setup allows for the precise generation and detection of anomalies, providing a controlled environment to evaluate the performance of the human-AI collaborative diagnostic loop.
  • Figure 5: Qualitative comparison of AURA's performance, contrasting its response to a Training Anomaly (Top) with its response to a subsequent Validation Anomaly (Bottom). (Top) In the First Encounter, the system is inefficient and its initial output is a lengthy Descriptive Hypothesis (CSS=2), which details symptoms but fails to identify the root cause. (Bottom) After the lesson from the first event is distilled, in the Post-Distillation phase, the system is confident and efficient. For a similar validation anomaly, it produces a concise and accurate Causal Identification (CSS=5), correctly pinpointing the tether as the root cause. This figure provides a concrete example of the qualitative leap in performance that supports the quantitative results presented in Table 1.
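
The distillation loop described in the Figure 3 and Figure 5 captions (store a validated lesson as an embedding, then retrieve it when a similar anomaly recurs) can be sketched as below. This is a hedged illustration only: `embed` is a toy bag-of-words stand-in for a real embedding model, `LessonStore` is a hypothetical in-memory substitute for the VDB, and the symptom strings, lesson text, and `min_score` cutoff are invented for the example.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


class LessonStore:
    """Hypothetical in-memory substitute for the vector database (VDB)."""

    def __init__(self) -> None:
        self._items: List[Tuple[Counter, str]] = []

    def add(self, symptoms: str, lesson: str) -> None:
        """Store a human-validated lesson keyed by the embedded symptoms."""
        self._items.append((embed(symptoms), lesson))

    def retrieve(self, symptoms: str, min_score: float = 0.5) -> Optional[str]:
        """Return the most similar lesson, or None if nothing is close enough."""
        query = embed(symptoms)
        best_score, best_lesson = 0.0, None
        for vector, lesson in self._items:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_lesson = score, lesson
        return best_lesson if best_score >= min_score else None


store = LessonStore()

# First encounter: the store is empty, so no context is available.
first = store.retrieve("surge velocity lower than twin under forward thrust")

# After the operator validates the diagnosis, the distilled lesson is stored.
store.add("surge velocity lower than twin under forward thrust",
          "Operator-validated root cause: tether.")

# Post-distillation: a similar anomaly retrieves the earlier lesson.
second = store.retrieve("surge velocity lower than twin during forward thrust")
print(first, second, sep="\n")
```

The similarity cutoff matters in a design like this: it keeps unrelated anomalies from inheriting a misleading lesson, at the cost of occasionally missing a relevant one.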