
Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers

Takeru Tsunoori, Masato Kobayashi, Yuki Uranishi

TL;DR

Bi-AQUA tackles the challenge of underwater visuomotor manipulation under dynamic lighting by introducing a bilateral imitation learning framework that explicitly models lighting at multiple levels. It combines a label-free Lighting Encoder, FiLM-based feature modulation, and a dedicated lighting token within a transformer-based policy to enable robust, force-aware control in visually degraded conditions. Real-world experiments show strong lighting robustness and generalization to unseen objects and disturbances, with ablations confirming the complementary value of each lighting-aware component. The approach advances practical autonomous underwater manipulation by integrating perception and control adaptation to lighting, potentially enabling more reliable subsea operations without extensive relighting or domain randomization. Limitations include evaluation on a single task and environment, suggesting future work on richer skills, field deployments, and multi-modal sensing integration.

Abstract

Underwater robotic manipulation is fundamentally challenged by extreme lighting variations, color distortion, and reduced visibility. We introduce Bi-AQUA, the first underwater bilateral control-based imitation learning framework that integrates lighting-aware visual processing for underwater robot arms. Bi-AQUA employs a hierarchical three-level lighting adaptation mechanism: a Lighting Encoder that extracts lighting representations from RGB images without manual annotation and is implicitly supervised by the imitation objective, FiLM modulation of visual backbone features for adaptive, lighting-aware feature extraction, and an explicit lighting token added to the transformer encoder input for task-aware conditioning. Experiments on a real-world underwater pick-and-place task under diverse static and dynamic lighting conditions show that Bi-AQUA achieves robust performance and substantially outperforms a bilateral baseline without lighting modeling. Ablation studies further confirm that all three lighting-aware components are critical. This work bridges terrestrial bilateral control-based imitation learning and underwater manipulation, enabling force-sensitive autonomous operation in challenging marine environments. For additional material, please check: https://mertcookimg.github.io/bi-aqua
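The three lighting-aware levels named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the pooling-plus-linear encoder, the embedding size, the random weights, and every function name below (`lighting_encoder`, `film`, `add_lighting_token`) are assumptions chosen only to show how a lighting embedding might (1) be extracted from an RGB image, (2) scale and shift backbone features via FiLM, and (3) enter the transformer input as an extra token. In the actual system these weights would be learned end to end from the imitation objective.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(weights, x):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [dot(row, x) for row in weights]

def random_matrix(rows, cols, rng, scale=0.1):
    """Stand-in for learned weights (hypothetical; a real model would train these)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def lighting_encoder(image, w_enc):
    """Level 1 (assumed design): per-channel mean of an H x W x 3 image,
    then a linear layer and tanh, giving a label-free lighting embedding."""
    pixels = [px for row in image for px in row]
    mean_rgb = [sum(px[c] for px in pixels) / len(pixels) for c in range(3)]
    return [math.tanh(v) for v in linear(w_enc, mean_rgb)]

def film(tokens, z, w_gamma, w_beta):
    """Level 2: FiLM. Each feature channel is scaled by gamma and shifted by beta,
    both predicted from the lighting embedding z. Gamma is offset by 1 so that
    zero weights leave the features unchanged."""
    gamma = [1.0 + g for g in linear(w_gamma, z)]
    beta = linear(w_beta, z)
    return [[g * x + b for x, g, b in zip(tok, gamma, beta)] for tok in tokens]

def add_lighting_token(tokens, z, w_tok):
    """Level 3 (one reading of 'added to the encoder input'): project z to the
    feature width and prepend it as an extra token."""
    return [linear(w_tok, z)] + tokens

rng = random.Random(0)
EMB, CH = 4, 8  # hypothetical embedding size and feature width

# Toy 5 x 6 RGB image and 10 stand-in backbone feature tokens.
image = [[[rng.random() for _ in range(3)] for _ in range(6)] for _ in range(5)]
tokens = [[rng.random() for _ in range(CH)] for _ in range(10)]

w_enc = random_matrix(EMB, 3, rng)
w_gamma = random_matrix(CH, EMB, rng)
w_beta = random_matrix(CH, EMB, rng)
w_tok = random_matrix(CH, EMB, rng)

z = lighting_encoder(image, w_enc)
modulated = film(tokens, z, w_gamma, w_beta)
encoder_input = add_lighting_token(modulated, z, w_tok)
print(len(z), len(encoder_input), len(encoder_input[0]))  # prints: 4 11 8
```

Offsetting gamma by 1 is a common FiLM convention rather than something stated in the source; it makes the modulation start near the identity, so the lighting branch only perturbs the visual features as far as training finds useful.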

Paper Structure

This paper contains 26 sections, 7 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Concept of Bi-AQUA.
  • Figure 2: Unilateral Control-based Imitation Learning.
  • Figure 3: Bilateral Control-based Imitation Learning.
  • Figure 4: Data collection of Bi-AQUA.
  • Figure 5: Overview of Bi-AQUA. Given multi-view underwater observations and follower joint states, Bi-AQUA extracts lighting-aware visual features, fuses them with proprioception, and predicts leader-side action chunks within a bilateral control loop.
  • ...and 8 more figures