Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges
Tao Zhong, Jonah Buchanan, Christine Allen-Blanchette
TL;DR
This work addresses transferring grasp intent across dexterous hands with different morphologies from visual observations, without paired demonstrations. It frames grasp translation as Schrödinger Bridge–based probabilistic transport between source and target grasp distributions, conditioned on object observations and optimized via latent SF$^2$M with an entropic OT plan $\pi_{\varepsilon}^*$. In the two-stage latent pipeline, a VAE first encodes source observations into a latent $z$; a latent Schrödinger Bridge then translates $z$ to $z'$ under the ground costs, and a decoder maps $z'$ to the target grasp. The translation is guided by four physics-informed OT costs: $d_{\mathrm{pose}}$, $d_{\mathrm{contact}}$, $d_{\mathrm{wrench}}$, and $d_{\mathrm{jac}}$. Experiments on the MultiGripperGrasp dataset show improved grasp success and functional alignment across hand–object pairs. The method enables semantically meaningful grasp transfer without hand-specific simulation and highlights the potential of distributional transport for generalizable manipulation across heterogeneous hardware.
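To make the entropic OT plan $\pi_{\varepsilon}^*$ concrete, the following is a minimal NumPy sketch of Sinkhorn iterations on a toy ground-cost matrix. It is not the authors' implementation: the random "pose" features, the squared-distance cost standing in for a pose-style cost, the uniform marginals, and the names `pairwise_sq_dist` and `sinkhorn_plan` are all assumptions made for illustration.

```python
import numpy as np


def pairwise_sq_dist(x, y):
    """Squared Euclidean distance between every row of x and every row of y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sinkhorn_plan(cost, eps, n_iters=200):
    """Entropic OT plan between two uniform empirical measures (Sinkhorn)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform source marginal (assumption)
    b = np.full(m, 1.0 / m)      # uniform target marginal (assumption)
    K = np.exp(-cost / eps)      # Gibbs kernel of the ground cost
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows toward the source marginal
        v = b / (K.T @ u)        # rescale columns toward the target marginal
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
n, m = 8, 8

# Hypothetical stand-ins for source and target grasp features (random 3-D
# "base positions"); the physics-informed costs would use real grasp quantities.
src_pose = rng.normal(size=(n, 3))
tgt_pose = rng.normal(size=(m, 3))

cost = pairwise_sq_dist(src_pose, tgt_pose)
cost = cost / cost.max()         # normalize so eps is on a comparable scale

plan = sinkhorn_plan(cost, eps=0.5)

# Each source sample's most likely target match under the plan.
best_match = plan.argmax(axis=1)
```

Each entry `plan[i, j]` is the mass coupling source sample `i` to target sample `j`. A smaller `eps` concentrates the plan on low-cost pairs, while a larger `eps` spreads it toward the independent coupling.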
Abstract
We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. Given a visual observation of a source hand grasping an object, our goal is to synthesize a functionally equivalent grasp for a target hand without requiring paired demonstrations or hand-specific simulations. We frame this problem as stochastic transport between grasp distributions using the Schrödinger Bridge formalism. Our method learns to map between source and target latent grasp spaces via score and flow matching, conditioned on visual observations. To guide this translation, we introduce physics-informed cost functions that encode alignment in base pose, contact maps, wrench space, and manipulability. Experiments across diverse hand-object pairs demonstrate that our approach generates stable, physically grounded grasps with strong generalization. This work enables semantic grasp transfer for heterogeneous manipulators and bridges vision-based grasping with probabilistic generative modeling. Additional details are available at https://grasp2grasp.github.io/
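For readers unfamiliar with the formalism, the Schrödinger Bridge problem in its standard textbook form is sketched below. The notation is assumed here rather than taken from the abstract: $q_0$ and $q_1$ denote the source and target latent grasp distributions, $\mathbb{Q}$ a reference diffusion process, $\Pi(q_0, q_1)$ the set of couplings with those marginals, and $c$ a ground cost. Conditioning on the visual observation is omitted for brevity.

$$
\mathbb{P}^* = \arg\min_{\mathbb{P}:\ \mathbb{P}_0 = q_0,\ \mathbb{P}_1 = q_1} \mathrm{KL}\left(\mathbb{P} \,\|\, \mathbb{Q}\right)
$$

$$
\pi_{\varepsilon}^* = \arg\min_{\pi \in \Pi(q_0, q_1)} \int c(z, z')\, \mathrm{d}\pi(z, z') + \varepsilon\, \mathrm{KL}\left(\pi \,\|\, q_0 \otimes q_1\right)
$$

When $\mathbb{Q}$ is a Brownian motion and $c$ is the squared Euclidean distance, the endpoint coupling of $\mathbb{P}^*$ coincides with the entropic OT plan $\pi_{\varepsilon}^*$, with $\varepsilon$ determined by the noise level of the reference process. How the physics-informed costs enter as $c$ is specified in the full paper rather than in this abstract.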
