
Flow Matching Imitation Learning for Multi-Support Manipulation

Quentin Rouxel, Andrea Ferrari, Serena Ivaldi, Jean-Baptiste Mouret

TL;DR

This paper proposes a unified approach that combines an optimization-based multi-contact whole-body controller with Flow Matching, a recently introduced method capable of generating multi-modal trajectory distributions for imitation learning. It also introduces a shared autonomy mode for assisted teleoperation.

Abstract

Humanoid robots could benefit from using their upper bodies for support contacts, enhancing their workspace, stability, and ability to perform contact-rich and pushing tasks. In this paper, we propose a unified approach that combines an optimization-based multi-contact whole-body controller with Flow Matching, a recently introduced method capable of generating multi-modal trajectory distributions for imitation learning. In simulation, we show that Flow Matching is more appropriate for robotics than Diffusion and traditional behavior cloning. On a real full-size humanoid robot (Talos), we demonstrate that our approach can learn a whole-body non-prehensile box-pushing task and that the robot can close dishwasher drawers by adding contacts with its free hand when needed for balance. We also introduce a shared autonomy mode for assisted teleoperation, providing automatic contact placement for tasks not covered in the demonstrations. Full experimental videos are available at: https://hucebot.github.io/flow_multisupport_website/

Paper Structure (16 sections, 5 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: To perform multi-support tasks, the Talos humanoid robot uses its right hand as an additional support to extend its reach and maintain balance. Imitation learning allows the robot to autonomously solve these tasks or assist a human operator with automatic contact placement (see videos).
  • Figure 2: The system architecture uses three different operational modes (right) as high-level controllers outputting effector commands. These commands are realized on the robot (left) by SEIKO Retargeting [seiko, seiko2] and SEIKO Controller [rouxel2024multicontact].
  • Figure 3: Input and output signals used to command contact switches. The policy outputs the continuous command signal $\gamma^{\text{eff}}$ converted to a discrete contact switch command $c^{\text{eff}}$ using a hysteresis threshold. The time information $\tau^{\text{eff}}$ disambiguates the states and produces the waiting behavior required when removing or adding a contact.
  • Figure 4: Inference of the policy through integration of the learned flow in 4 steps.
  • Figure 5: The policy outputs new trajectories (yellow, orange, red) of fixed length at regular intervals. Because inference takes time, each new trajectory is stitched online with the previous one using linear interpolation, which prevents discontinuities. The dashed green trajectory depicts the resulting commands sent to the low-level retargeting.
  • ...and 5 more figures
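
The captions of Figures 3, 4 and 5 describe three small mechanisms: a hysteresis threshold that turns the continuous signal $\gamma^{\text{eff}}$ into a discrete contact command $c^{\text{eff}}$, inference by integrating the learned flow in 4 steps, and stitching of successive trajectories by linear interpolation. The sketch below illustrates each one in plain Python. It is not the authors' implementation. The function names, the threshold values 0.3 and 0.7, the choice of explicit Euler integration, the toy velocity field standing in for a trained network, and the assumption that the old and new trajectories share the same time grid are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def hysteresis_switch(gamma: float, previous: bool,
                      low: float = 0.3, high: float = 0.7) -> bool:
    """Turn a continuous command into a discrete contact command.

    The thresholds are hypothetical. Between `low` and `high` the previous
    state is kept, which avoids chattering around a single threshold.
    """
    if gamma >= high:
        return True
    if gamma <= low:
        return False
    return previous


def integrate_flow(velocity: Callable[[Vector, float], Vector],
                   x0: Vector, steps: int = 4) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with explicit Euler.

    Euler is an assumed integrator; `velocity` stands in for a learned flow.
    """
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def stitch(previous: Sequence[float], new: Sequence[float],
           blend_len: int) -> List[float]:
    """Blend linearly from `previous` into `new` over `blend_len` samples.

    Both sequences are assumed to be sampled at the same time instants.
    """
    out = []
    for i, (p, n) in enumerate(zip(previous, new)):
        w = min(1.0, i / blend_len) if blend_len > 0 else 1.0
        out.append((1.0 - w) * p + w * n)
    return out


# Toy velocity field of a straight-line flow towards a fixed target.
TARGET = [1.0, -2.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


# Hysteresis on a short command sequence, starting with the contact disabled.
state = False
states = []
for gamma in [0.1, 0.5, 0.8, 0.5, 0.2]:
    state = hysteresis_switch(gamma, state)
    states.append(state)

sample = integrate_flow(toy_velocity, [0.0, 0.0], steps=4)
blended = stitch([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], blend_len=2)
```

In this toy setting the mid-range value 0.5 keeps whatever state preceded it, the four Euler steps land on `TARGET` because the straight-line flow has constant velocity along its path, and the blended trajectory starts exactly on the previous one before converging to the new one.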