Bi-HIL: Bilateral Control-Based Multimodal Hierarchical Imitation Learning via Subtask-Level Progress Rate and Keyframe Memory for Long-Horizon Contact-Rich Robotic Manipulation

Thanpimon Buamanee; Masato Kobayashi; Yuki Uranishi

Abstract

Long-horizon contact-rich robotic manipulation remains challenging due to partial observability and unstable subtask transitions under contact uncertainty. While hierarchical architectures improve temporal reasoning and bilateral imitation learning enables force-aware control, existing approaches often rely on flat policies that struggle with long-horizon coordination. We propose Bi-HIL, a bilateral control-based multimodal hierarchical imitation learning framework for long-horizon manipulation. Bi-HIL stabilizes hierarchical coordination by integrating keyframe memory with a subtask-level progress rate, which models phase progression within the active subtask and conditions both the high- and low-level policies. We evaluate Bi-HIL on unimanual and bimanual real-robot tasks, demonstrating consistent improvements over flat and ablated variants. The results highlight the importance of explicitly modeling subtask progression together with force-aware control for robust long-horizon manipulation. For additional material, see https://mertcookimg.github.io/bi-hil

Paper Structure (28 sections, 5 equations, 11 figures, 5 tables)

Introduction
Related Work
Framework for Long-Horizon Robotic Manipulation
Bilateral Control-based Imitation Learning (Bi-IL)
Bi-HIL: Bilateral Control-Based Multimodal Hierarchical Imitation Learning via Subtask-Level Progress and Keyframe Memory
Overview
Data Collection
High-Level Policy
Inputs.
Outputs.
Keyframe memory update.
Training objective.
Low-Level Policy
Unimanual Experiments
Hardware
...and 13 more sections

Figures (11)

Figure 1: Concept of Bi-HIL
Figure 2: Overview of Bi-HIL: Bilateral Control-Based Multimodal Hierarchical Imitation Learning via Subtask-Level Progress Rate and Keyframe Memory
Figure 3: High-Policy Architecture
Figure 4: Definition of Representative Keyframes and Subtask-Level Progress Rate
Figure 5: Low-Policy Architecture
...and 6 more figures
