Robotic Compliant Object Prying Using Diffusion Policy Guided by Vision and Force Observations
Jeon Ho Kang, Sagar Joshi, Ruopeng Huang, Satyandra K. Gupta
TL;DR
The paper tackles robotic disassembly of battery-containing products by using a diffusion policy that fuses vision and force for compliant object prying. It introduces a cross-attention mechanism that learns a joint image-force embedding, which conditions the diffusion network through FiLM, and pairs it with force augmentation and synchronized multi-modal observations. Empirically, the approach achieves a 96% average success rate across seen and unseen objects and battery types, outperforming the vision-only baseline by 57% and also surpassing other force-incorporating variants, while generalizing well to different colors and serial battery configurations. The results show robust force-guided state transitions and force trends that align with human demonstrations, indicating practical potential for scalable, automated battery disassembly and other contact-rich manipulation tasks.
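The fusion idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the choice of force tokens as queries and image tokens as keys and values, all dimensions, the random weights, the mean pooling step, and the function names `cross_attention` and `film` are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(force_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Assumption: force tokens act as queries over image tokens.
    """
    q = force_tokens @ w_q                      # (T_force, d)
    k = image_tokens @ w_k                      # (T_image, d)
    v = image_tokens @ w_v                      # (T_image, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (T_force, T_image)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ v, weights


def film(features, cond, w_gamma, w_beta):
    """FiLM: per-channel scale and shift predicted from a conditioning vector."""
    gamma = cond @ w_gamma                      # (C,)
    beta = cond @ w_beta                        # (C,)
    return gamma[:, None] * features + beta[:, None]


rng = np.random.default_rng(0)
d_img, d_force, d, channels, horizon = 32, 3, 16, 8, 10   # hypothetical sizes

image_tokens = rng.normal(size=(49, d_img))     # e.g. a flattened 7x7 feature map
force_tokens = rng.normal(size=(4, d_force))    # e.g. 4 steps of 3-axis force

w_q = rng.normal(size=(d_force, d))
w_k = rng.normal(size=(d_img, d))
w_v = rng.normal(size=(d_img, d))

fused, weights = cross_attention(force_tokens, image_tokens, w_q, w_k, w_v)
cond = fused.mean(axis=0)                       # pooled joint embedding, shape (d,)

features = rng.normal(size=(channels, horizon)) # stand-in for denoiser activations
w_gamma = rng.normal(size=(d, channels))
w_beta = rng.normal(size=(d, channels))
modulated = film(features, cond, w_gamma, w_beta)

print(fused.shape, weights.shape, modulated.shape)
```

In this sketch the low-dimensional force signal decides which image regions contribute to the embedding, so it is not simply concatenated with a much larger image feature vector, and the pooled embedding then rescales and shifts the denoiser's intermediate features.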
Abstract
The growing adoption of batteries in the electric vehicle industry and various consumer products has created an urgent need for effective recycling solutions. These products often contain a mix of compliant and rigid components, making robotic disassembly a critical step toward achieving scalable recycling processes. Diffusion policy has emerged as a promising approach for learning low-level skills in robotics. To apply diffusion policy effectively to contact-rich tasks, incorporating force as feedback is essential. In this paper, we apply diffusion policy with vision and force to a compliant object prying task. However, when low-dimensional contact force is combined with high-dimensional images, the force information may be diluted. To address this issue, we propose a method that effectively integrates force with image data for diffusion policy observations. We validate our approach on a battery prying task that demands high precision and multi-step execution. Our model achieves a 96% success rate in diverse scenarios, marking a 57% improvement over the vision-only baseline. Our method also demonstrates zero-shot transfer to unseen objects and battery types. Supplementary videos and implementation code are available on our project website: https://rros-lab.github.io/diffusion-with-force.github.io/
