Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers
Takeru Tsunoori, Masato Kobayashi, Yuki Uranishi
TL;DR
Bi-AQUA addresses underwater visuomotor manipulation under dynamic lighting with a bilateral control-based imitation learning framework that models lighting explicitly at three levels. It combines a label-free Lighting Encoder, FiLM-based modulation of visual features, and a dedicated lighting token within a transformer-based policy to enable robust, force-aware control in visually degraded conditions. Real-world experiments show strong lighting robustness and generalization to unseen objects and disturbances, and ablations confirm that the lighting-aware components are complementary. By adapting both perception and control to lighting, the approach could make subsea operations more reliable without extensive relighting or domain randomization. The evaluation is limited to a single task and environment; future work includes richer skills, field deployments, and multi-modal sensing.
Abstract
Underwater robotic manipulation is fundamentally challenged by extreme lighting variations, color distortion, and reduced visibility. We introduce Bi-AQUA, the first underwater bilateral control-based imitation learning framework that integrates lighting-aware visual processing for underwater robot arms. Bi-AQUA employs a hierarchical three-level lighting adaptation mechanism: (1) a Lighting Encoder that extracts lighting representations from RGB images without manual annotation and is implicitly supervised by the imitation objective; (2) FiLM modulation of visual backbone features for adaptive, lighting-aware feature extraction; and (3) an explicit lighting token added to the transformer encoder input for task-aware conditioning. Experiments on a real-world underwater pick-and-place task under diverse static and dynamic lighting conditions show that Bi-AQUA achieves robust performance and substantially outperforms a bilateral baseline without lighting modeling. Ablation studies further confirm that all three lighting-aware components are critical. This work bridges terrestrial bilateral control-based imitation learning and underwater manipulation, enabling force-sensitive autonomous operation in challenging marine environments. For additional material, see https://mertcookimg.github.io/bi-aqua
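The three-level lighting mechanism described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation. The mean-color `lighting_embedding` (a toy stand-in for the learned Lighting Encoder), the linear maps that produce `gamma` and `beta`, the choice to prepend the lighting token, and all shapes and weights are assumptions made for the example.

```python
# Illustrative sketch only: every name, shape, and weight below is hypothetical.

def lighting_embedding(image):
    """Toy stand-in for a learned Lighting Encoder: per-channel mean of an RGB image.

    image: list of rows, each row a list of (r, g, b) tuples with values in [0, 1].
    """
    n_pixels = sum(len(row) for row in image)
    return [sum(px[c] for row in image for px in row) / n_pixels for c in range(3)]


def linear(x, weight, bias):
    """Compute y = W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(features, gamma, beta):
    """FiLM: scale and shift each channel of a C x H x W feature map."""
    return [
        [[g * v + b for v in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


def add_lighting_token(tokens, lighting_token):
    """Place the lighting token in the encoder input (prepending is an assumption)."""
    return [lighting_token] + tokens


# Level 1: lighting representation from a bluish 2 x 2 image, no labels needed.
image = [[(0.1, 0.3, 0.6), (0.1, 0.3, 0.6)],
         [(0.3, 0.5, 0.8), (0.3, 0.5, 0.8)]]
z = lighting_embedding(image)

# Level 2: FiLM parameters predicted from z modulate a 2-channel feature map.
gamma = linear(z, [[1, 0, 0], [0, 1, 0]], [1.0, 1.0])
beta = linear(z, [[0, 0, 1], [0, 0, -1]], [0.0, 0.0])
features = [[[1.0, 2.0]], [[0.0, -1.0]]]  # C=2, H=1, W=2
modulated = film(features, gamma, beta)

# Level 3: the lighting embedding also enters the transformer as its own token.
visual_tokens = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
encoder_input = add_lighting_token(visual_tokens, z)
```

In a learned model, the encoder and the maps producing `gamma` and `beta` would be trained jointly with the policy, which is consistent with the abstract's statement that the Lighting Encoder is supervised only implicitly by the imitation objective.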
