Adapt as You Say: Online Interactive Bimanual Skill Adaptation via Human Language Feedback

Zhuo Li, Dianxi Li, Tao Teng, Quentin Rouxel, Zhipeng Dong, Dennis Hong, Darwin Caldwell, Fei Chen

Abstract

Developing general-purpose robots capable of autonomously operating in human living environments requires the ability to adapt to continuously evolving task conditions. However, adapting high-dimensional, coordinated bimanual skills to novel task variations at deployment remains a fundamental challenge. In this work, we present BiSAIL (Bimanual Skill Adaptation via Interactive Language), a novel framework that enables zero-shot online adaptation of offline-learned bimanual skills through interactive language feedback. The key idea of BiSAIL is a hierarchical reason-then-modulate paradigm, which first infers generalized adaptation objectives from multimodal task variations, and then adapts bimanual motions via diffusion modulation to achieve the inferred objectives. Extensive real-robot experiments across six bimanual tasks and two dual-arm platforms demonstrate that BiSAIL significantly outperforms existing methods in human-in-the-loop adaptability, task generalization, and cross-embodiment scalability. This work enables the development of adaptive bimanual assistants that can be flexibly customized by non-expert users via intuitive verbal corrections. Experimental videos and code are available at https://rip4kobe.github.io/BiSAIL/.

Paper Structure

This paper contains 18 sections, 28 equations, 15 figures, 5 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Illustration of Online Interactive Bimanual Skill Adaptation. (a) Offline-learned bimanual skills often encounter diverse task variations when deployed in human-centric environments. (b) BiSAIL enables online interactive adaptation of learned bimanual skills through human language feedback, facilitating zero-shot generalization to unseen task variations.
  • Figure 2: An Overview of BiSAIL. (1) High-level Adaptation Objective Reasoning: ESA-CoT infers a structured bimanual adaptation objective from multimodal task variations; (2) Mid-level Bimanual Motion Modulation: Initial motion proposals sampled from BMP are first iteratively optimized toward the adaptation objective, and then modulated through compositional sampling to enforce dual-arm coordination and task compatibility; (3) Low-level Skill Adaptation Refinement: A closed-loop reflection mechanism evaluates post-adaptation outcomes to refine both the adaptation objective and the resulting bimanual motion.
  • Figure 3: Problem Formulation. We formulate the online interactive bimanual skill adaptation as a constrained probabilistic optimization problem.
  • Figure 4: Bimanual Skill Adaptation Objective Reasoning. ESA-CoT enables the objective reasoning through a structured chain-of-thought, augmented with robot embodied information and bimanual domain knowledge.
  • Figure 5: Bimanual Motion Prior Learning. BMP is implemented as an unconditional diffusion model with Transformer encoder-only architecture.
  • ...and 10 more figures
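
The bimanual motion prior (BMP) described in the Figure 5 caption is an unconditional diffusion model. A minimal sketch of the reverse sampling loop such a prior performs is below; note that the `denoise` placeholder, the linear beta schedule, and the 14-dimensional dual-arm state (7 DoF per arm) are illustrative assumptions, not details taken from the paper, whose actual denoiser is a Transformer encoder.

```python
import math
import random

# Sketch of DDPM-style reverse sampling for an unconditional motion prior.
# Hypothetical schedule: linear betas over T steps (not from the paper).
T = 50
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

def denoise(x, t):
    # Placeholder noise predictor; the paper's BMP uses a Transformer
    # encoder here. Predicting zero noise keeps the sketch self-contained.
    return [0.0 for _ in x]

def sample(dim=14, seed=0):
    """Reverse diffusion: start from Gaussian noise, iteratively denoise.
    dim=14 is an assumed dual-arm joint state (7 DoF per arm)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        eps = denoise(x, t)
        a, ab = alphas[t], alpha_bars[t]
        coef = (1.0 - a) / math.sqrt(1.0 - ab)
        # Standard DDPM posterior-mean step, plus noise except at t == 0.
        x = [(xi - coef * ei) / math.sqrt(a) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x

motion = sample()
print(len(motion))  # 14
```

In BiSAIL's mid-level stage, samples from such a prior would then be iteratively optimized toward the inferred adaptation objective and composed to enforce dual-arm coordination, per the Figure 2 caption.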