DEFT: Dexterous Fine-Tuning for Real-World Hand Policies
Aditya Kannan, Kenneth Shaw, Shikhar Bahl, Pragna Mannam, Deepak Pathak
TL;DR
DEFT addresses the data inefficiency of real-world dexterous manipulation by combining affordance priors derived from human videos with online, CEM-based fine-tuning on a soft robotic hand. An affordance module predicts contact location, wrist pose, and post-contact hand configuration from internet videos and language cues; these predictions are then refined by a residual policy and a conditional VAE so that they generalize across objects. Experiments on nine diverse tabletop tasks show that DEFT adapts rapidly in the real world (often in under an hour per task) and outperforms zero-shot baselines, and ablations support the importance of both the priors and residual learning. Limitations include grasp diversity constrained by perception noise, the need for human resets, and hardware limits on finger curl; addressing these could further broaden the range of dexterous skills. Overall, DEFT offers a practical route to data-efficient, real-world dexterous manipulation using video-informed priors and online fine-tuning.
Abstract
Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate the challenges of learning such skills on a robot, especially in the case of soft, deformable objects and complex, relatively long-horizon tasks. Learning these behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. To improve upon these priors, DEFT uses an efficient online optimization procedure. By integrating human-based learning and online fine-tuning with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.
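
To make the online optimization idea concrete, the following is a minimal sketch of a generic cross-entropy method (CEM) loop, the family of optimizer named in the TL;DR. It is not the authors' implementation: the function name `cem_optimize`, the hyperparameters, and the quadratic toy reward are all assumptions for illustration. In a real system, the reward would presumably come from executing each sampled parameter vector on the robot, and the initial mean would come from the human-video prior.

```python
import numpy as np

def cem_optimize(reward_fn, mean, std, n_iters=10, pop_size=32, n_elite=8, seed=0):
    """Generic CEM: sample candidates, score them, refit a Gaussian to the elites.

    All names and default values here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        # Sample a population of candidate parameters around the current mean.
        samples = rng.normal(mean, std, size=(pop_size, mean.size))
        # Score each candidate (a real robot would run one rollout per sample).
        rewards = np.array([reward_fn(s) for s in samples])
        # Keep the highest-reward candidates and refit the sampling distribution.
        elite = samples[np.argsort(rewards)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor to avoid zero variance
    return mean

# Toy stand-in: the "prior" is a zero vector and the reward peaks at `target`.
prior = np.zeros(3)
target = np.array([0.3, -0.2, 0.1])
best = cem_optimize(lambda a: -np.sum((a - target) ** 2), prior, np.full(3, 0.5))
```

The sketch starts sampling from a prior and narrows the distribution toward high-reward regions, which is the general sense in which a good prior can reduce the number of trials needed.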
