Bi-ACT: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer

Thanpimon Buamanee, Masato Kobayashi, Yuki Uranishi, Haruo Takemura

TL;DR

Bi-ACT targets robust autonomous manipulation by combining bilateral control-based imitation learning with Action Chunking with Transformer (ACT). It collects multimodal data (two RGB images plus joint states, including forces) and predicts the leader robot's actions over the next $k$ steps, which drive the follower through a bilateral control loop. Predicting actions in chunks shortens the effective decision horizon, which limits compounding errors and helps with temporally correlated variations, while force feedback lets the policy adapt to differences in object hardness and weight. Real-world experiments on pick-and-place and put-in-drawer tasks show generalization to unseen objects and the practical benefit of force data for manipulation.

Abstract

Autonomous manipulation in robot arms is a complex and evolving field of study in robotics. This paper proposes work that stands at the intersection of two innovative approaches in robotics and machine learning. Inspired by the Action Chunking with Transformer (ACT) model, which employs joint location and image data to predict future movements, our work integrates principles of Bilateral Control-Based Imitation Learning to enhance robotic control. Our objective is to synergize these techniques, thereby creating a more robust and efficient control mechanism. In our approach, the data collected from the environment are images from the gripper and overhead cameras, along with the joint angles, angular velocities, and forces of the follower robot using bilateral control. The model is designed to predict the subsequent steps for the joint angles, angular velocities, and forces of the leader robot. This predictive capability is crucial for implementing effective bilateral control in the follower robot, allowing for more nuanced and responsive maneuvering.
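
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `JointState`, `run_chunked`, and `hold_policy`, the three-joint arm, and the open-loop execution of each chunk are all assumptions made for illustration, and the placeholder policy simply repeats the follower state where the paper uses a trained Transformer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class JointState:
    """Angles, angular velocities, and forces, one entry per joint (hypothetical layout)."""
    angles: List[float]
    velocities: List[float]
    forces: List[float]


# A policy maps (follower state, camera images) to a chunk of predicted leader states.
Policy = Callable[[JointState, Sequence[object]], List[JointState]]


def run_chunked(policy: Policy,
                read_follower: Callable[[], JointState],
                read_images: Callable[[], Sequence[object]],
                send_leader_command: Callable[[JointState], None],
                total_steps: int) -> int:
    """Query the policy, execute the returned chunk step by step, then query again."""
    executed = 0
    while executed < total_steps:
        chunk = policy(read_follower(), read_images())
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        for leader_state in chunk:
            if executed >= total_steps:
                break
            # In bilateral control, the predicted leader state is the reference
            # that the follower's controller tracks.
            send_leader_command(leader_state)
            executed += 1
    return executed


def hold_policy(k: int) -> Policy:
    """Placeholder policy: predicts k copies of the current follower state."""
    def policy(follower: JointState, images: Sequence[object]) -> List[JointState]:
        return [follower] * k
    return policy


# Usage with stand-in sensors: a 3-joint arm at rest and two camera placeholders
# (gripper and overhead), running 10 control steps with chunks of k = 4.
rest = JointState([0.0] * 3, [0.0] * 3, [0.0] * 3)
commands: List[JointState] = []
steps_run = run_chunked(hold_policy(4), lambda: rest, lambda: [None, None],
                        commands.append, total_steps=10)
```

With a chunk size of 4 and 10 control steps, the loop queries the policy three times and truncates the last chunk. The chunk size, the joint count, and how the follower controller consumes each predicted leader state are left open here because this excerpt does not specify them.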

Paper Structure (18 sections, 2 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: Overview of Bilateral Control-Based Imitation Learning via Action Chunking with Transformer (Bi-ACT)
  • Figure 2: Block Diagram of Four-channel Bilateral Control and Four-channel Bilateral Control-Based Imitation Learning
  • Figure 3: Block Diagram of Control System
  • Figure 4: Model Architecture: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer
  • Figure 5: Definition of Robot and Camera View
  • ...and 6 more figures