OceanGym: A Benchmark Environment for Underwater Embodied Agents
Yida Xue, Mingjun Mao, Xiangyuan Ru, Yuqi Zhu, Baochang Ren, Shuofei Qiao, Mengru Wang, Shumin Deng, Xinyu An, Ningyu Zhang, Ying Chen, Huajun Chen
TL;DR
OceanGym introduces a high-fidelity underwater embodied AI benchmark built on Unreal Engine 5.3, spanning $800\ \mathrm{m}\times800\ \mathrm{m}$ with adjustable depth to simulate lighting variations and eight task domains that cover perception and decision-making. It deploys a unified memory-augmented MLLM-based agent framework that fuses RGB and sonar data to control Autonomous Underwater Vehicles (AUVs) in partially observable, continuous 3D environments. Experimental results show current state-of-the-art MLLMs lag behind human experts, especially under low visibility and when interpreting sonar data, though memory and cross-task transfer offer performance improvements; a plateau in scaling indicates deeper changes in perception, memory, and planning are needed. The work positions OceanGym as a practical sim-to-real bridge for developing robust autonomous underwater systems, enabling synthetic data generation and reinforcement-learning feedback to expedite real-world deployment of underwater agents.
Abstract
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, which make effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
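To make the perception, memory, and decision-making loop described above more concrete, the following is a minimal sketch and not the OceanGym implementation. All names here (`Observation`, `build_prompt`, `run_episode`, `toy_policy`, and the action strings) are hypothetical, text captions stand in for real RGB and sonar images, and a hand-written rule stands in for the MLLM call.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Observation:
    rgb_caption: str    # assumption: text stand-in for an RGB camera frame
    sonar_caption: str  # assumption: text stand-in for a sonar image


# A policy maps a text prompt to an action name.
# In a real agent this would wrap a call to an MLLM.
Policy = Callable[[str], str]


def build_prompt(goal: str, obs: Observation, memory: Deque[str]) -> str:
    """Fuse the goal, the bounded memory, and both sensor modalities into one prompt."""
    history = "\n".join(memory) if memory else "(empty)"
    return (
        f"Goal: {goal}\nMemory:\n{history}\n"
        f"RGB: {obs.rgb_caption}\nSonar: {obs.sonar_caption}\nNext action?"
    )


def run_episode(goal: str, observations: List[Observation],
                policy: Policy, memory_size: int = 3) -> List[str]:
    """Run one perceive -> recall -> decide -> remember loop per observation."""
    memory: Deque[str] = deque(maxlen=memory_size)  # oldest entries are dropped
    actions: List[str] = []
    for step, obs in enumerate(observations):
        action = policy(build_prompt(goal, obs, memory))
        actions.append(action)
        memory.append(f"step {step}: saw '{obs.rgb_caption}', did '{action}'")
    return actions


def toy_policy(prompt: str) -> str:
    """Hypothetical rule standing in for the MLLM: stop once the target is sensed."""
    current = prompt.rsplit("RGB: ", 1)[1]  # current RGB and sonar text only
    return "stop" if "pipeline" in current else "move_forward"


if __name__ == "__main__":
    frames = [
        Observation("murky water", "flat seabed"),
        Observation("pipeline ahead", "linear echo"),
    ]
    print(run_episode("find the pipeline", frames, toy_policy))
```

The bounded `deque` is one simple way to keep the prompt length fixed over a long-horizon episode; the memory mechanism used in the paper may differ.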
