Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning
Kei Ikemura, Yifei Dong, David Blanco-Mulero, Alberta Longhini, Li Chen, Florian T. Pokorny
TL;DR
The paper tackles deformable and fragile object manipulation (DFOM) by minimizing the object's internal stress while still achieving the task goal, using a vision-based reinforcement learning framework trained in simulation. It combines a stress-penalized reward with offline demonstrations and curriculum learning to bootstrap policy learning and stabilize sim-to-real transfer. Key contributions include what the authors present as the first visuomotor DFOM framework that explicitly accounts for fragility, a quadratic stress penalty combining mean and top-stress statistics, and zero-shot transfer to real tofu picking and pushing. The learned policies reduce the stress applied to fragile objects by 36.5% compared to vanilla RL policies. The results suggest that this stress-aware, model-free approach can generalize to soft objects without specialized tactile sensing, which makes it practical for delicate manipulation tasks.
Abstract
Robotic manipulation of deformable and fragile objects presents significant challenges, as excessive stress can cause irreversible damage to the object. Existing solutions rely on accurate object models or on specialized sensors and grippers, which adds complexity and often limits generalization. To address this problem, we present a vision-based reinforcement learning approach that incorporates a stress-penalized reward to explicitly discourage damage to the object. In addition, to bootstrap learning, we incorporate offline demonstrations and a curriculum that progresses from rigid proxies to deformable objects. We evaluate the proposed method in both simulated and real-world scenarios, showing that the policy learned in simulation can be transferred to the real world in a zero-shot manner to perform tasks such as picking up and pushing tofu. Our results show that the learned policies exhibit damage-aware, gentle manipulation behavior, decreasing the stress applied to fragile objects by 36.5% compared to vanilla RL policies while still achieving the task goals.
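
The stress-penalized reward described above can be illustrated with a minimal sketch. The text here does not give the exact formula, so everything below is an assumption made for illustration: the function names, the equal weighting of mean and top stress, the choice of the top 10% of elements, and the normalization scale are all hypothetical. The sketch only shows the general shape of a quadratic penalty built from mean and top-stress statistics and subtracted from a task reward.

```python
import numpy as np


def stress_penalty(stress, top_frac=0.1, w_mean=0.5, w_top=0.5, scale=1.0):
    """Quadratic penalty on a blend of mean stress and top-stress mean.

    All parameters are illustrative assumptions, not values from the paper.
    `stress` holds per-element stress magnitudes reported by a simulator.
    """
    s = np.asarray(stress, dtype=float).ravel()
    if s.size == 0:
        return 0.0
    # Number of elements treated as the "top-stress" region (at least one).
    k = max(1, int(np.ceil(top_frac * s.size)))
    top_mean = np.sort(s)[-k:].mean()
    # Blend the global average with the peak region, then square the result
    # so that large stresses are penalized disproportionately.
    blended = w_mean * s.mean() + w_top * top_mean
    return float((blended / scale) ** 2)


def shaped_reward(task_reward, stress, penalty_weight=1.0, **penalty_kwargs):
    """Task reward minus the weighted stress penalty (hypothetical shaping)."""
    return float(task_reward) - penalty_weight * stress_penalty(stress, **penalty_kwargs)
```

Under these assumptions, doubling every stress value quadruples the penalty, which is the property that makes a quadratic term discourage high-stress contacts more strongly than a linear one. Including a top-stress statistic alongside the mean keeps a small, highly loaded region from being averaged away by the rest of the object.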
