ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation

Wenhai Liu, Junbo Wang, Yiming Wang, Weiming Wang, Cewu Lu

TL;DR

ForceMimic addresses the underutilization of force cues in imitation learning for contact-rich manipulation by integrating ForceCapture, a robot-free data-collection system, with HybridIL, a diffusion-policy-based learner that predicts wrench and pose and uses a hybrid force-position controller. The approach yields a substantial improvement in zucchini peeling performance and data-collection efficiency versus traditional vision-based imitation and teleoperation. Key contributions include ForceCapture hardware design and a force-centric imitation learning algorithm that switches between IK-based and force-position primitives based on contact forces. The results suggest force-centric IL is feasible and beneficial for robust manipulation, and point to future work on multimodal representations and expanded force-control primitives.

Abstract

In most contact-rich manipulation tasks, humans apply time-varying forces to the target object, compensating for inaccuracies in the vision-guided hand trajectory. However, current robot learning algorithms primarily focus on trajectory-based policies, with limited attention given to learning force-related skills. To address this limitation, we introduce ForceMimic, a force-centric robot learning system, providing a natural, force-aware and robot-free robotic demonstration collection system, along with a hybrid force-motion imitation learning algorithm for robust contact-rich manipulation. Using the proposed ForceCapture system, an operator can peel a zucchini in 5 minutes, while force-feedback teleoperation takes over 13 minutes and struggles with task completion. With the collected data, we propose HybridIL to train a force-centric imitation learning model, equipped with a hybrid force-position control primitive to fit the predicted wrench-position parameters during robot execution. Experiments demonstrate that our approach enables the model to learn a more robust policy for the contact-rich task of vegetable peeling, achieving a 54.5% relative increase in success rate over state-of-the-art pure-vision-based imitation learning. Hardware, code, data and more results can be found on the project website at https://forcemimic.github.io.

Paper Structure

This paper contains 17 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Overview of the pipeline. (a) We first transfer the collected robot-free data to (pseudo-)robot data, bridging the domain gap. The captured wrench is compensated to account for self-gravity effects. The pose recorded by the SLAM camera is transformed into the robot TCP pose, and the RGB-D observation images are back-projected into a point cloud from which unrelated points are filtered out. (b) Leveraging this data, a diffusion-based policy is learned that predicts both pose and wrench, conditioned on the encoded point cloud features, pose history and diffusion timestep embeddings. (c) According to the predicted force value, either the IK joint position primitive or the hybrid force-position primitive is selected to fit the output force-position parameters and execute the resulting actions.
  • Figure 2: Structure of ForceCapture. It consists of (a) a fixed-tool end-effector version, and (b) a movable gripper version, which provides (c) a unique self-lock function.
  • Figure 3: Illustration of the interface between the policy and the control primitive. When the hybrid force-position control primitive is active, the motion direction $\hat{\mathbf{d}}$ is calculated from the pose trajectory $\mathbf{P}_{t:t+10}$ output by the policy, and the predicted forces $\mathbf{F}_{t:t+10}$ are orthogonalized to $\mathbf{F}^{\perp}_{t:t+10}$. The hybrid force-position control primitive then takes $\hat{\mathbf{d}}$ and $\mathbf{F}^{\perp}_t$ as parameters and controls the robot to track both pose and force.
  • Figure 4: Experimental setup for data collection efficiency comparison and the time required to fully peel a zucchini by different methods.
  • Figure 5: Visualization of the peeled skins by different methods. Failure cases are numbered with circles.
  • ...and 1 more figure
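
The policy-to-primitive interface in the Figure 3 caption, together with the primitive selection in Figure 1(c), can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the captions do not say how $\hat{\mathbf{d}}$ is computed or how the forces are orthogonalized, so the sketch assumes that $\hat{\mathbf{d}}$ is the normalized net displacement of the predicted positions and that orthogonalization removes the force component along $\hat{\mathbf{d}}$. The function names, the primitive labels and the 2.0 N threshold are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def norm(v: Vec3) -> float:
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def motion_direction(positions: Sequence[Vec3]) -> Vec3:
    """Unit vector from the first to the last predicted position.

    Assumption: the net displacement over the horizon defines d_hat.
    """
    delta = tuple(b - a for a, b in zip(positions[0], positions[-1]))
    length = norm(delta)
    if length < 1e-9:
        raise ValueError("trajectory has no net displacement")
    return tuple(c / length for c in delta)


def orthogonalize(force: Vec3, d_hat: Vec3) -> Vec3:
    """Remove the component of `force` along the unit vector `d_hat`.

    Assumption: this projection is what the caption means by
    orthogonalizing F to F_perp.
    """
    along = sum(f * d for f, d in zip(force, d_hat))
    return tuple(f - along * d for f, d in zip(force, d_hat))


def select_primitive(force: Vec3, threshold: float) -> str:
    """Pick a control primitive from the predicted force magnitude.

    The threshold value and the returned labels are hypothetical.
    """
    if norm(force) > threshold:
        return "hybrid_force_position"
    return "ik_joint_position"


# Toy horizon: the tool slides along +x while pressing down along -z.
positions = [(0.00, 0.0, 0.1), (0.01, 0.0, 0.1), (0.02, 0.0, 0.1)]
force_t = (1.0, 0.0, -5.0)  # predicted force at time t, in newtons

d_hat = motion_direction(positions)
f_perp = orthogonalize(force_t, d_hat)
primitive = select_primitive(force_t, threshold=2.0)
```

In this toy case the tangential part of the predicted force is discarded, so the primitive would track position along the motion direction and regulate only the pressing force perpendicular to it.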