ILBiT: Imitation Learning for Robot Using Position and Torque Information based on Bilateral Control with Transformer
Masato Kobayashi, Thanpimon Buamanee, Yuki Uranishi, Haruo Takemura
TL;DR
This work tackles autonomous robotic manipulation by combining imitation learning with bilateral control and Transformer-based sequence modeling. Demonstrations that include both position and torque information are collected with a four-channel bilateral control setup and used to train a Transformer encoder. ILBiT then predicts the leader robot's actions, which enables fast, force-aware execution at 100 Hz. Experiments on a two-robot OpenMANIPULATOR-X setup show that ILBiT generalizes better to untrained objects and tasks than LSTM-based baselines, with higher success rates across pick, move, and place actions. The approach offers improved adaptability and speed for real-world manipulation and could potentially extend to other robotic platforms and dynamic environments.
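In four-channel bilateral control, a leader and a follower robot exchange both position and force (torque) information in both directions, which is what allows demonstrations to record torque as well as position. The sketch below computes per-joint torque references for one common formulation from the bilateral-control literature, with a position goal θ_l − θ_f → 0 and a force goal τ_l + τ_f → 0. It is an illustration under stated assumptions, not the controller used in ILBiT: the exact control law, the nominal inertia `J`, and the gains `Kp`, `Kd`, `Kf` are hypothetical, and a real system would typically estimate `tau` with a disturbance or reaction-force observer rather than receive it directly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JointState:
    theta: float  # joint angle [rad]
    omega: float  # joint angular velocity [rad/s]
    tau: float    # estimated external (reaction) torque [Nm]


@dataclass
class Gains:
    # Hypothetical values for illustration only, not the paper's tuning.
    J: float = 0.01    # nominal joint inertia [kg m^2]
    Kp: float = 100.0  # position gain
    Kd: float = 20.0   # velocity gain
    Kf: float = 1.0    # force gain


def four_channel_refs(leader: JointState, follower: JointState,
                      g: Optional[Gains] = None) -> Tuple[float, float]:
    """Torque references (leader, follower) for one joint pair.

    Position goal: theta_leader - theta_follower -> 0
    Force goal:    tau_leader + tau_follower -> 0 (action-reaction)
    """
    g = g if g is not None else Gains()
    pos_err = leader.theta - follower.theta
    vel_err = leader.omega - follower.omega
    force_sum = leader.tau + follower.tau

    # Position/velocity channel: pulls the two joints toward each other,
    # with opposite signs on the leader and follower sides.
    pos_term = 0.5 * g.J * (g.Kp * pos_err + g.Kd * vel_err)
    # Force channel: applied with the same sign on both sides so that
    # the two reaction torques are driven to cancel.
    force_term = 0.5 * g.Kf * force_sum

    tau_ref_leader = -pos_term - force_term
    tau_ref_follower = pos_term - force_term
    return tau_ref_leader, tau_ref_follower
```

For example, `four_channel_refs(JointState(0.1, 0.0, 0.0), JointState(0.0, 0.0, 0.0))` yields a negative reference for the leader and an equal positive one for the follower, pulling the joints together. In this sketch's terms, autonomous execution would substitute the model's predicted leader state for the measured `leader` argument and use only `tau_ref_follower`.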
Abstract
Autonomous manipulation with robot arms is a complex and evolving area of robotics research. This paper addresses this challenge through imitation learning (IL). Unlike conventional imitation methods, our approach uses IL based on bilateral control, which enables more precise and adaptable robot motions. Previous IL methods based on bilateral control have relied on Long Short-Term Memory (LSTM) networks. In this paper, we present Imitation Learning for robots using position and torque information based on Bilateral control with Transformer (ILBiT). The method employs a Transformer, a model known for robust performance on diverse datasets and for overcoming the limitations of LSTMs, particularly in tasks that require fine force adjustment. A key feature of ILBiT is its high-frequency operation at 100 Hz, which substantially improves the system's adaptability and responsiveness to varying environments and to objects of different hardness. We demonstrate the effectiveness of the Transformer-based ILBiT through comprehensive real-world experiments.
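One commonly cited difference between a Transformer and an LSTM is that self-attention compares every time step in an input window directly with every other, instead of passing information step by step through a recurrent state. The plain-Python sketch below shows single-head scaled dot-product self-attention over a window of follower-state vectors. It is a hedged illustration, not ILBiT's architecture: the window length, the per-step features (for example joint angle, velocity, and torque), and the weight matrices `Wq`, `Wk`, `Wv` are hypothetical, and a full encoder layer would add multiple heads, positional encoding, a feed-forward sublayer, residual connections, and layer normalization.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(window: List[Vector], Wq: Matrix, Wk: Matrix,
                   Wv: Matrix) -> List[Vector]:
    """Single-head scaled dot-product self-attention over a time window.

    window: T state vectors, one per control step.
    Returns T context vectors; each is a softmax-weighted average of the
    value vectors from all T steps, so every step can draw on the whole window.
    """
    Q = [matvec(Wq, x) for x in window]
    K = [matvec(Wk, x) for x in window]
    V = [matvec(Wv, x) for x in window]
    d_k = len(K[0])
    d_v = len(V[0])
    out = []
    for q in Q:
        # Similarity of this step's query to every step's key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out
```

Because the attention weights for each step sum to one, every output is a convex combination of the value vectors; with identity weight matrices and a window of identical states, each output simply reproduces that state.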
