EnergyAction: Unimanual to Bimanual Composition with Energy-Based Models

Mingchen Song, Xiang Deng, Jie Wei, Dongmei Jiang, Liqiang Nie, Weili Guan

Abstract

Recent advances in unimanual manipulation policies have achieved remarkable success across diverse robotic tasks through abundant training data and well-established model architectures. However, extending these capabilities to bimanual manipulation remains challenging due to the lack of bimanual demonstration data and the complexity of coordinating dual-arm actions. Existing approaches either rely on extensive bimanual datasets or fail to effectively leverage pre-trained unimanual policies. To address this limitation, we propose EnergyAction, a novel framework that compositionally transfers unimanual manipulation policies to bimanual tasks through Energy-Based Models (EBMs). Specifically, our method incorporates three key innovations. First, we model individual unimanual policies as EBMs and leverage their compositional properties to compose left and right arm actions, enabling the fusion of unimanual policies into a bimanual policy. Second, we introduce an energy-based temporal-spatial coordination mechanism through energy constraints, ensuring that the generated bimanual actions are both temporally coherent and spatially feasible. Third, we propose two different energy-aware denoising strategies that dynamically adapt the number of denoising steps based on action quality assessment. These strategies ensure the generation of high-quality actions while maintaining superior computational efficiency compared to fixed-step denoising approaches. Experimental results demonstrate that EnergyAction effectively transfers unimanual knowledge to bimanual tasks, achieving superior performance on both simulated and real-world tasks with minimal bimanual data.

Paper Structure (23 sections, 22 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Compositional Action Generation. Inspired by the theory of EBMs (Hinton, 2002), we first model unimanual policies as energy functions and then leverage their compositional properties to obtain bimanual policies.
  • Figure 2: Overview of EnergyAction. (a) We model unimanual policies as energy functions and compose them to form a bimanual policy. (b) To ensure coordinated bimanual actions, we introduce energy-based constraints from both temporal and spatial perspectives. (c) We propose two different energy-aware denoising strategies that adaptively adjust denoising steps based on energy values.
  • Figure 3: Distribution of denoising steps for two different energy-aware inference strategies. Both achieve competitive success rates with fast inference speed.
  • Figure 4: Single-arm task scaling in EnergyAction.
  • Figure 5: Visualization of bimanual manipulation in real-world scenarios. Green and blue arrows indicate left and right arm motion trajectories. EnergyAction generates coordinated bimanual actions with temporal coherence and spatial feasibility.
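
The three ideas summarized in the abstract and in Figure 2 (composing per-arm energies, adding a coordination energy, and stopping refinement once the energy is low enough) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the quadratic energies, the 2-D per-arm actions, the "equal hand height" coordination term, the threshold, and all function names are hypothetical, and plain gradient descent stands in for the learned denoising process.

```python
import numpy as np

# Hypothetical layout: a bimanual action is [left_x, left_y, right_x, right_y].
LEFT, RIGHT = slice(0, 2), slice(2, 4)

def arm_energy(target, part):
    """Toy unimanual 'policy': E(a) = 0.5 * ||a[part] - target||^2 with its gradient."""
    target = np.asarray(target, dtype=float)
    def energy(a):
        return 0.5 * float(np.sum((a[part] - target) ** 2))
    def grad(a):
        g = np.zeros_like(a)
        g[part] = a[part] - target
        return g
    return energy, grad

def height_coordination(weight=1.0):
    """Assumed spatial constraint: penalize a height difference between the hands."""
    def energy(a):
        return 0.5 * weight * float((a[1] - a[3]) ** 2)
    def grad(a):
        g = np.zeros_like(a)
        diff = weight * (a[1] - a[3])
        g[1], g[3] = diff, -diff
        return g
    return energy, grad

def compose(*terms):
    """Sum energies; this multiplies the unnormalized densities exp(-E)."""
    def energy(a):
        return sum(e(a) for e, _ in terms)
    def grad(a):
        return sum(g(a) for _, g in terms)
    return energy, grad

def energy_aware_refine(energy, grad, a0, step=0.2, threshold=1e-3, max_steps=100):
    """Refine an action, stopping as soon as its energy drops below the threshold."""
    a = np.asarray(a0, dtype=float)
    for k in range(max_steps):
        if energy(a) < threshold:
            return a, k  # early exit: fewer steps than the fixed budget
        a = a - step * grad(a)
    return a, max_steps

total_energy, total_grad = compose(
    arm_energy([-0.3, 0.5], LEFT),
    arm_energy([0.3, 0.5], RIGHT),
    height_coordination(),
)
action, steps_used = energy_aware_refine(total_energy, total_grad, np.zeros(4))
print(action, steps_used, total_energy(action))
```

In this sketch the step count is decided by the energy value rather than fixed in advance, which is the intuition behind the adaptive strategies described above; a real system would evaluate learned energy functions and use a proper denoising sampler instead of deterministic gradient descent.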