Robotic Compliant Object Prying Using Diffusion Policy Guided by Vision and Force Observations

Jeon Ho Kang, Sagar Joshi, Ruopeng Huang, Satyandra K. Gupta

TL;DR

The paper tackles robotic disassembly of battery-containing products with a diffusion policy that fuses vision and force for compliant object prying. It introduces a cross-attention mechanism that learns a joint image-force embedding, which conditions the diffusion network through FiLM, and combines this with force augmentation and synchronized multi-modal observations. Empirically, the approach achieves a 96% average success rate across seen and unseen objects and battery types, a 57% improvement over the vision-only baseline, and it also outperforms other force-incorporating variants while generalizing well to different colors and serial battery configurations. The results show robust force-guided state transitions and force trends that align with human demonstrations, indicating practical potential for scalable, automated battery disassembly and other contact-rich manipulation tasks.

Abstract

The growing adoption of batteries in the electric vehicle industry and various consumer products has created an urgent need for effective recycling solutions. These products often contain a mix of compliant and rigid components, making robotic disassembly a critical step toward achieving scalable recycling processes. Diffusion policy has emerged as a promising approach for learning low-level skills in robotics. To effectively apply diffusion policy to contact-rich tasks, incorporating force as feedback is essential. In this paper, we apply diffusion policy with vision and force to a compliant object prying task. However, when low-dimensional contact force is combined with high-dimensional images, the force information may be diluted. To address this issue, we propose a method that effectively integrates force with image data for diffusion policy observations. We validate our approach on a battery prying task that demands high precision and multi-step execution. Our model achieves a 96% success rate in diverse scenarios, marking a 57% improvement over the vision-only baseline. Our method also demonstrates zero-shot transfer to unseen objects and battery types. Supplementary videos and implementation code are available on our project website: https://rros-lab.github.io/diffusion-with-force.github.io/

Paper Structure

This paper contains 17 sections, 4 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Humans use both arms to perform compliant object prying. In this figure, a human demonstrates battery prying for four different battery types, highlighting the tight tolerances and high precision required. A spring on one end makes the object compliant and makes the prying motion direction-dependent.
  • Figure 2: The battery-recycling system consists of three steps. First, the robot responsible for prying moves to the battery. Next, diffusion policy is applied to perform the prying motion. Finally, the robot holding the battery-powered product moves to the recycling bin and deposits the battery.
  • Figure 3: Framework overview: For image data (Input 1), ResNet (He et al., 2016) is used to extract features. Force data (Input 2) is linearly projected to match the size of the image features and is used as the query. The image is cropped to 98 × 98 (or any suitable dimensions) before being passed into ResNet. The cross-attention mechanism combines these inputs to output a joint embedding vector, which is then concatenated with the robot pose. This combined vector is used for Feature-wise Linear Modulation (FiLM) conditioning (Perez et al., 2018) of noise prediction within the U-Net architecture (Ronneberger et al., 2015) in the diffusion framework (Chi et al., 2023). The output is an action sequence, $a_t$.
  • Figure 4: Products and Batteries Used in Experiments: Note that obj0 was excluded from testing to present results across three objects per battery type. Some product casings feature slanted designs with variable angles, and the depth of the casing from the top of the battery varies by approximately $\pm 4$ mm. The bottom row shows the products used in the edge-case experiments.
  • Figure 5: Steps for Prying: The robot begins from a random initial position and moves toward the battery. It then approaches the gap for insertion. Next, the robot aligns the prying tool tip with the gap at the correct angle and moves downward for insertion. Once the tool is inside the gap, the robot tilts to pry the battery. After applying an adequate amount of force, the robot lifts the battery. Finally, it retracts, completing the task.
  • ...and 4 more figures
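
The data flow described in the Figure 3 caption (force projected to a query, cross-attention over image features, concatenation with the robot pose, FiLM conditioning) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All dimensions, the random weights, the single attention head, and the treatment of image features as a set of tokens are assumptions made for illustration; the ResNet encoder and the U-Net are replaced by random arrays.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_attention(force, image_tokens, W_q, W_k, W_v):
    """Single-head cross-attention with the force reading as the query.

    force:        (force_dim,) raw force reading
    image_tokens: (n_tokens, d) image features (stand-in for ResNet output)
    """
    q = force @ W_q                         # linear projection of force to size d
    k = image_tokens @ W_k                  # keys from image features
    v = image_tokens @ W_v                  # values from image features
    scores = k @ q / np.sqrt(q.shape[0])    # scaled dot-product, shape (n_tokens,)
    weights = softmax(scores)               # attention over image tokens
    return weights @ v, weights             # joint embedding (d,), weights (n_tokens,)

def film(features, cond, W_gamma, W_beta):
    """FiLM: per-channel scale and shift predicted from the conditioning vector."""
    gamma = cond @ W_gamma                  # (channels,)
    beta = cond @ W_beta                    # (channels,)
    return gamma[:, None] * features + beta[:, None]

# Hypothetical sizes, chosen only for this sketch.
force_dim, d, n_tokens, pose_dim = 3, 64, 9, 7
channels, horizon = 32, 16

rng = np.random.default_rng(0)
W_q = rng.normal(size=(force_dim, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))
W_gamma = rng.normal(size=(d + pose_dim, channels))
W_beta = rng.normal(size=(d + pose_dim, channels))

force = rng.normal(size=force_dim)             # stand-in force observation
image_tokens = rng.normal(size=(n_tokens, d))  # stand-in image features
pose = rng.normal(size=pose_dim)               # stand-in robot pose

joint, weights = cross_attention(force, image_tokens, W_q, W_k, W_v)
cond = np.concatenate([joint, pose])           # joint embedding + robot pose
unet_features = rng.normal(size=(channels, horizon))  # stand-in U-Net activations
modulated = film(unet_features, cond, W_gamma, W_beta)
```

Because the force reading forms the query, it decides how the image features are weighted instead of being appended as a few extra dimensions next to a much larger image vector, which is one way to read the abstract's concern about force information being diluted.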