Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation

Malek Aburub, Cristian C. Beltran-Hernandez, Tatsuya Kamijo, Masashi Hamaya

TL;DR

This paper introduces Diffusion Policies For Compliant Manipulation (DIPCOM), a novel diffusion-based framework for compliant control tasks. It enhances force control through multimodal distribution modeling, improves the integration of diffusion policies in compliance control, and extends the authors' previous work by demonstrating its effectiveness in real-world tasks.

Abstract

Robots hold great promise for performing repetitive or hazardous tasks, but achieving human-like dexterity, especially in contact-rich and dynamic environments, remains challenging. Rigid robots, which rely on position or velocity control, often struggle with maintaining stable contact and applying consistent force in force-intensive tasks. Learning from Demonstration has emerged as a solution, but tasks requiring intricate maneuvers, such as powder grinding, present unique difficulties. This paper introduces Diffusion Policies For Compliant Manipulation (DIPCOM), a novel diffusion-based framework designed for compliant control tasks. By leveraging generative diffusion models, we develop a policy that predicts Cartesian end-effector poses and adjusts arm stiffness to maintain the necessary force. Our approach enhances force control through multimodal distribution modeling, improves the integration of diffusion policies in compliance control, and extends our previous work by demonstrating its effectiveness in real-world tasks. We present a detailed comparison between our framework and existing methods, highlighting the advantages and best practices for deploying diffusion-based compliance control.

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: DIPCOM Denoising Process: illustration of how DIPCOM predicts end-effector positions and adjusts arm stiffness over multiple timesteps, enabling consistent force during contact and smooth transitions between movements.
  • Figure 2: Policy Framework. Left: dataset collection framework. Middle: observations $O$ include images $i_{t-1,t}$, robot Cartesian pose $s_{t-1,t}$, and measured force/torque $f_{t-1,t}$, all encoded using a self-attention transformer. Right: during training, actions $a_{t}$ (comprising the end-effector pose $p$, gripper pose $g$, and stiffness $K$) are processed through a noise scheduler that adds Gaussian noise $\epsilon$ over time steps $n$. These noisy actions are then input into the transformer decoder block. During inference, sampled Gaussian noise replaces the noised training actions, and the transformer decoder block predicts the actions $\hat{a}_{t}$.
  • Figure 3: Contact-rich manipulation tasks used for evaluation. A - Powder grinding. B - Pencil eraser. C - Bimanual round peg insertion. D - Bimanual cuboid peg insertion.
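
The training-time noising step described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a DDPM-style linear beta schedule, which this summary does not confirm. The step count, schedule endpoints, action layout (7-D pose, 1-D gripper, 6-D stiffness), and the helper name `add_noise` are all hypothetical choices made for illustration, not values taken from the paper.

```python
import numpy as np

# Assumption: a DDPM-style linear beta schedule. The paper's actual scheduler
# and its hyperparameters are not given in this summary.
NUM_STEPS = 100
betas = np.linspace(1e-4, 2e-2, NUM_STEPS)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(action, step, rng):
    """Return the noised action at diffusion step `step` and the noise used."""
    eps = rng.standard_normal(action.shape)
    a_bar = alphas_cumprod[step]
    noisy = np.sqrt(a_bar) * action + np.sqrt(1.0 - a_bar) * eps
    return noisy, eps


# Hypothetical action layout: pose p (xyz + quaternion), gripper g, stiffness K.
p = np.array([0.4, 0.0, 0.3, 0.0, 0.0, 0.0, 1.0])
g = np.array([0.02])
K = np.array([500.0, 500.0, 300.0, 30.0, 30.0, 30.0])
action = np.concatenate([p, g, K])

rng = np.random.default_rng(0)
noisy, eps = add_noise(action, step=50, rng=rng)
# One common training target (an assumption here): predict `eps` from `noisy`.
```

In practice, each action component would likely be normalized to a common range before noising, since unit-variance noise is negligible next to stiffness values in the hundreds but large next to a gripper opening of a few centimeters.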