DREAM: Domain-aware Reasoning for Efficient Autonomous Underwater Monitoring

Zhenqi Wu, Abhinav Modi, Angelos Mavrogiannis, Kaustubh Joshi, Nikhil Chopra, Yiannis Aloimonos, Nare Karapetyan, Ioannis Rekleitis, Xiaomin Lin

TL;DR

DREAM addresses persistent underwater monitoring by combining domain knowledge with a Vision Language Model in a three‑layer architecture: perception, cognitive‑aware planning with Chain‑of‑Thought, and control. It builds a persistent occupancy map from multimodal sensing and uses a VLM‑guided planner to generate high‑level actions that efficiently cover target objects like oysters and shipwrecks while enforcing safety. Across simulations and a real‑world tank test, DREAM outperforms a state‑of‑the‑art baseline and a vanilla VLM approach in time efficiency and coverage, demonstrating practical feasibility for long‑term marine monitoring. This work advances autonomous underwater surveillance by enabling environment‑aware decision making, persistent memory, and robust low‑level control, with open‑source releases planned for datasets, environments, and code.

Abstract

The ocean is warming and acidifying, increasing the risk of mass mortality events for temperature-sensitive shellfish such as oysters. This motivates the development of long-term monitoring systems. However, human labor is costly and long-duration underwater work is highly hazardous, which favors robotic solutions as a safer and more efficient option. To enable underwater robots to make real-time, environment-aware decisions without human intervention, we must equip them with an intelligent "brain." This highlights the need for persistent, wide-area, and low-cost benthic monitoring. To this end, we present DREAM, a Vision Language Model (VLM)-guided autonomy framework for long-term underwater exploration and habitat monitoring. The results show that our framework is highly efficient in finding and exploring target objects (e.g., oysters, shipwrecks) without prior location information. In the oyster-monitoring task, our framework takes 31.5% less time than the previous baseline with the same number of oysters. Compared to the vanilla VLM, it uses 23% fewer steps while covering 8.88% more oysters. In shipwreck scenes, our framework successfully explores and maps the wreck without collisions, requiring 27.5% fewer steps than the vanilla model and achieving 100% coverage, while the vanilla model achieves 60.23% average coverage in our shipwreck environments.

Paper Structure

This paper contains 15 sections, 7 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Example of DREAM deployed in the real world on a BlueROV surveying an oyster reef in a pool. The top left image shows a sample observation from the robot's camera.
  • Figure 2: An overview of the DREAM framework. The environment provides multimodal inputs (RGB, depth, and segmentation) captured by the ROV. These observations are fused in the Perception module to build and update occupancy maps. Cognitive-Aware Planning leverages a vision–language model with chain-of-thought reasoning to guide frontier selection, mission planning, and persistent monitoring of underwater objects of interest. Robotic control then executes movement commands, closing the loop for continual exploration and monitoring.
  • Figure 3: Field-of-view (FOV) demonstration on the occupancy map.
  • Figure 4: Testing environments and algorithmic comparisons. Top row: simulated oyster reef (left) and shipwreck (right) environments. Bottom row: comparison of our framework against baseline methods, showing improved efficiency and coverage in both oyster and shipwreck monitoring tasks.
  • Figure 5: Pool setup with oyster shells arranged in a circular arc and two pipes positioned to emulate a shipwreck.
  • ...and 1 more figure
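
The Figure 2 caption describes a loop in which occupancy maps feed frontier selection. As a rough illustration of that one step, the sketch below finds frontier cells on a small 2D occupancy grid and picks one. It is not the authors' implementation: the cell encoding (`UNKNOWN`, `FREE`, `OCCUPIED`), the function names, and the `score` callback are all assumptions made for this example. In DREAM the choice is guided by a VLM with chain-of-thought reasoning; here a plain nearest-frontier rule stands in as the default, and `score` marks where a model-derived preference could be plugged in.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical cell encoding for this sketch (not taken from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1
Cell = Tuple[int, int]


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return free cells that touch at least one unknown cell (4-connected)."""
    rows, cols = len(grid), len(grid[0])
    frontiers: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbor is enough
    return frontiers


def select_frontier(
    frontiers: List[Cell],
    robot: Cell,
    score: Optional[Callable[[Cell], float]] = None,
) -> Optional[Cell]:
    """Pick the highest-scoring frontier, or None if there is nothing to explore.

    By default the score is the negative Manhattan distance to the robot,
    so the nearest frontier wins. A caller may pass any other scoring
    function, for example one derived from a planner's preferences.
    """
    if not frontiers:
        return None
    if score is None:
        def score(cell: Cell) -> float:
            return -(abs(cell[0] - robot[0]) + abs(cell[1] - robot[1]))
    return max(frontiers, key=score)
```

For instance, on a 3x3 grid whose right-hand column is partly unexplored, `find_frontiers` returns the free cells bordering the unknown region, and `select_frontier` returns whichever of those is closest to the robot's cell unless a custom `score` is supplied.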