Bi-ACT: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer

Thanpimon Buamanee; Masato Kobayashi; Yuki Uranishi; Haruo Takemura

TL;DR

Bi-ACT targets robust autonomous manipulation by combining bilateral control-based imitation learning with Action Chunking with Transformer (ACT). It collects multimodal data (two RGB images and joint states, including forces) and predicts leader actions over $k$ steps, which drive the follower through a bilateral control loop. Chunking actions shortens the effective decision horizon, which limits compounding errors and improves handling of temporally correlated variations, while force feedback helps the policy cope with differences in object hardness and weight. Real-world experiments on pick-and-place and put-in-drawer tasks show strong generalization to unseen objects and the practical benefit of force data for manipulation.

Abstract

Autonomous manipulation in robot arms is a complex and evolving field of study in robotics. This work stands at the intersection of two approaches in robotics and machine learning. Inspired by the Action Chunking with Transformer (ACT) model, which uses joint position and image data to predict future movements, our work integrates the principles of Bilateral Control-Based Imitation Learning to enhance robotic control. Our objective is to combine these techniques into a more robust and efficient control mechanism. In our approach, the data collected from the environment are images from the gripper and overhead cameras, along with the joint angles, angular velocities, and forces of the follower robot, recorded under bilateral control. The model is designed to predict the subsequent steps of the joint angles, angular velocities, and forces of the leader robot. This predictive capability is crucial for implementing effective bilateral control in the follower robot, allowing for more nuanced and responsive maneuvering.
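
The input/output layout described above can be illustrated with a minimal sketch. It is not the authors' implementation: the joint count, chunk length, image size, and the `dummy_policy` stand-in are all hypothetical, and it assumes each predicted chunk is executed open-loop before the policy is queried again, which the abstract does not specify.

```python
import numpy as np

N_JOINTS = 5                # hypothetical number of joints
CHUNK_K = 4                 # hypothetical chunk length k
STATE_DIM = 3 * N_JOINTS    # angle, angular velocity, and force per joint


def make_observation(rng):
    """Build one placeholder observation: two RGB frames plus follower joint state."""
    return {
        "gripper_image": rng.random((48, 64, 3), dtype=np.float32),
        "overhead_image": rng.random((48, 64, 3), dtype=np.float32),
        "follower_state": rng.standard_normal(STATE_DIM).astype(np.float32),
    }


def dummy_policy(obs, k=CHUNK_K):
    """Stand-in for the learned model (assumption, not the real network).

    Returns a (k, STATE_DIM) chunk of leader targets; here it simply
    decays the current follower state towards zero.
    """
    decay = 0.9 ** np.arange(1, k + 1)
    return decay[:, None] * obs["follower_state"][None, :]


def split_leader_target(target):
    """Split one predicted target into (angles, angular velocities, forces)."""
    angles, velocities, forces = np.split(target, 3)
    return angles, velocities, forces


def run_chunked(policy, n_steps, k, seed=0):
    """Query the policy once every k control steps and replay its chunk."""
    rng = np.random.default_rng(seed)
    executed, n_queries, chunk = [], 0, None
    for t in range(n_steps):
        if t % k == 0:
            # New observation and a fresh k-step chunk of leader targets.
            chunk = policy(make_observation(rng), k)
            n_queries += 1
        # In a real system this target would be handed to the
        # bilateral controller of the follower robot.
        executed.append(chunk[t % k])
    return np.stack(executed), n_queries


if __name__ == "__main__":
    targets, n_queries = run_chunked(dummy_policy, n_steps=10, k=CHUNK_K)
    print(targets.shape, n_queries)
```

With a chunk length of 4, ten control steps need only three policy queries, which is the sense in which chunking shortens the decision horizon.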

Paper Structure (18 sections, 2 equations, 11 figures, 3 tables)

Introduction
Related Works
Bilateral Control-Based Imitation Learning
Action Chunking with Transformer
Control System
Controller
Bilateral Control
Bi-ACT: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer
Overview
Data Collection
Learning Architecture
Execution to Robot Arm
Experiments
Hardware
Environment Setting
...and 3 more sections

Figures (11)

Figure 1: Overview of Bilateral Control-Based Imitation Learning via Action Chunking with Transformer (Bi-ACT)
Figure 2: Block Diagram of Four-channel Bilateral Control and Four-channel Bilateral Control-Based Imitation Learning
Figure 3: Block Diagram of Control System
Figure 4: Model Architecture: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer
Figure 5: Definition of Robot and Camera View
...and 6 more figures
