Gentle Object Retraction in Dense Clutter Using Multimodal Force Sensing and Imitation Learning
Dane Brouwer, Joshua Citron, Heather Nolte, Jeannette Bohg, Mark Cutkosky
TL;DR
This paper tackles the challenge of safely retracting objects from densely cluttered environments by embracing contact rather than avoiding it. It introduces a sensorized end-effector and a diffusion-based imitation-learning framework that fuses eye-in-hand vision, proprioception, non-prehensile tactile sensing, wrench estimates, and suction-pressure signals. Through a force ablation study on 100 demonstrations across highly variable clutter scenes, the authors show that incorporating force modalities reduces excessive-contact events and improves both success rate and speed, with the combination of wrench and tactile sensing delivering the strongest gains (up to 80% over a no-force baseline). The work highlights the value of multimodal sensing for contact-rich manipulation in constrained spaces and suggests future directions in kinesthetic feedback, adaptive gentleness, and broader generalization to unseen objects and environments.
Abstract
Dense collections of movable objects are common in everyday spaces, from cabinets in a home to shelves in a warehouse. Safely retracting objects from such collections is difficult for robots, yet people do it frequently, leveraging learned experience in tandem with vision and non-prehensile tactile sensing on the sides and backs of their hands and arms. We investigate the role of contact force sensing for training robots to gently reach into constrained clutter and extract objects. The available sensing modalities are (1) "eye-in-hand" vision, (2) proprioception, (3) non-prehensile triaxial tactile sensing, (4) contact wrenches estimated from joint torques, and (5) a measure of object acquisition obtained by monitoring the vacuum line of a suction cup. We use imitation learning to train policies from a set of demonstrations on randomly generated scenes, then conduct an ablation study of wrench and tactile information. We evaluate each policy's performance across 40 unseen environment configurations. Policies employing any force sensing show fewer excessive-force failures, an increased overall success rate, and faster completion times. The best performance is achieved using both tactile and wrench information, producing an 80% improvement over the baseline without force information.
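The ablation described above amounts to training one policy per combination of the two force modalities while the remaining inputs stay fixed. The sketch below is illustrative only and is not the authors' code: the modality keys, the `ablation_configs` and `select_observation` helpers, and the assumption that vision, proprioception, and suction pressure are present in every variant are all hypothetical.

```python
from itertools import product

# Hypothetical keys for inputs assumed to be present in every policy variant.
BASE_MODALITIES = ("vision", "proprioception", "suction_pressure")
# The two force modalities that the ablation study switches on and off.
FORCE_MODALITIES = ("wrench", "tactile")


def ablation_configs():
    """Return every on/off combination of the force modalities by name."""
    configs = {}
    for flags in product((False, True), repeat=len(FORCE_MODALITIES)):
        enabled = tuple(m for m, on in zip(FORCE_MODALITIES, flags) if on)
        name = "+".join(enabled) if enabled else "no_force"
        configs[name] = BASE_MODALITIES + enabled
    return configs


def select_observation(observation, modalities):
    """Keep only the observation entries a given policy variant may see."""
    return {key: observation[key] for key in modalities}


if __name__ == "__main__":
    # Placeholder values stand in for real sensor readings.
    full_observation = {
        "vision": "image",
        "proprioception": "joint_state",
        "suction_pressure": "pressure",
        "wrench": "wrench_estimate",
        "tactile": "tactile_reading",
    }
    for name, modalities in ablation_configs().items():
        print(name, sorted(select_observation(full_observation, modalities)))
```

Under these assumptions the four variants are the no-force baseline, wrench only, tactile only, and wrench plus tactile, which matches the comparison the abstract reports.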
