Shape-Space Deformer: Unified Visuo-Tactile Representations for Robotic Manipulation of Deformable Objects
Sean M. V. Collins, Brendan Tidd, Mahsa Baktashmotlagh, Peyman Moghadam
TL;DR
Shape-Space Deformer introduces a unified visuo-tactile representation that encodes deformable object shapes in a learned latent space: object codes $\alpha$ and force codes $\mathbf{z}$ condition a hyper-network $\Psi$, which outputs the parameters of a main deformation network $\mathcal{OD}$. The model predicts a surface-relevant deformation field such that, for any query point $\mathbf{x}$, the corresponding surface point is $\mathbf{x}' = \mathbf{x} + \mathcal{OD}(\mathbf{x})$, and it renders surfaces by warping a template cylinder. It optimizes a joint loss that combines a vector-based surface term with the Chamfer distance, plus regularizers that stabilize the latent space and the network weights. Empirically, it significantly outperforms VIRDO on shape reconstruction, generalizes well to unseen forces and new objects from limited data, and achieves real-time rendering while using an order of magnitude fewer parameters. This approach enables robust, fine-grained deformation modeling suitable for practical robotic manipulation tasks involving deformable objects.
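The conditioning and warping steps summarised above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the code dimensions, layer widths, the single-linear-layer hyper-network, the tanh MLP used for $\mathcal{OD}$, the cylinder sampling, and the L2 code penalty are all hypothetical choices made for illustration. The vector-based surface term is omitted because its form is not specified here, so only the Chamfer term and an assumed regulariser appear in the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the source does not give code dimensions or layer widths.
OBJ_DIM, FORCE_DIM, HIDDEN = 8, 4, 16


def mlp_param_shapes():
    """Parameter shapes of an assumed tiny 3 -> HIDDEN -> 3 deformation MLP."""
    return [(3, HIDDEN), (HIDDEN,), (HIDDEN, 3), (3,)]


def init_hypernet(in_dim):
    """Assumed hyper-network Psi: one linear layer emitting all target-network parameters."""
    n_out = sum(int(np.prod(s)) for s in mlp_param_shapes())
    W = rng.normal(scale=0.01, size=(in_dim, n_out))
    b = np.zeros(n_out)
    return W, b


def hypernet(params, alpha, z):
    """Map the concatenated codes (alpha, z) to a flat vector, then split it into MLP weights."""
    W, b = params
    flat = np.concatenate([alpha, z]) @ W + b
    weights, i = [], 0
    for s in mlp_param_shapes():
        n = int(np.prod(s))
        weights.append(flat[i:i + n].reshape(s))
        i += n
    return weights


def deformation_net(weights, x):
    """OD: predict a per-point 3D offset for query points x of shape (N, 3)."""
    W1, b1, W2, b2 = weights
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2


def deform(weights, x):
    """Surface point x' = x + OD(x)."""
    return x + deformation_net(weights, x)


def cylinder_template(n_theta=32, n_h=8, radius=0.5, height=1.0):
    """Sample the lateral surface of a template cylinder (hypothetical parameterisation)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    hs = np.linspace(-height / 2.0, height / 2.0, n_h)
    T, H = np.meshgrid(theta, hs)
    return np.stack(
        [radius * np.cos(T).ravel(), radius * np.sin(T).ravel(), H.ravel()], axis=1
    )


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance in both directions."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def code_regulariser(alpha, z, lam=1e-4):
    """Assumed L2 penalty on the latent codes (the source only says regularizers are used)."""
    return lam * (np.sum(alpha ** 2) + np.sum(z ** 2))


# Usage: condition OD on one object/force code pair and warp the template.
hyper_params = init_hypernet(OBJ_DIM + FORCE_DIM)
alpha = rng.normal(size=OBJ_DIM)
z = rng.normal(size=FORCE_DIM)
weights = hypernet(hyper_params, alpha, z)
template = cylinder_template()
deformed = deform(weights, template)
target = template * np.array([1.1, 0.9, 1.0])  # stand-in for an observed point cloud
loss = chamfer_distance(deformed, target) + code_regulariser(alpha, z)
print(deformed.shape, loss)
```

Because $\mathcal{OD}$ predicts an offset rather than absolute positions, a hyper-network that outputs all-zero weights leaves the template unchanged; in this sketch each new $(\alpha, \mathbf{z})$ pair simply yields a different set of $\mathcal{OD}$ weights rather than a different network.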
Abstract
Accurate modelling of object deformations is crucial for a wide range of robotic manipulation tasks in which interacting with soft or deformable objects is essential. Current methods struggle to generalise to unseen forces or adapt to new objects, limiting their utility in real-world applications. We propose Shape-Space Deformer, a unified representation that encodes a diverse range of object deformations and uses template augmentation to achieve robust, fine-grained reconstructions that are resilient to outliers and unwanted artefacts. Our method improves generalisation to unseen forces and can rapidly adapt to novel objects, significantly outperforming existing approaches. We perform extensive experiments across a range of force generalisation settings and evaluate our method's ability to reconstruct unseen deformations, demonstrating significant improvements in reconstruction accuracy and robustness. Our approach achieves real-time performance, making it suitable for downstream manipulation applications.
