Shape-Space Deformer: Unified Visuo-Tactile Representations for Robotic Manipulation of Deformable Objects

Sean M. V. Collins; Brendan Tidd; Mahsa Baktashmotlagh; Peyman Moghadam

TL;DR

Shape-Space Deformer introduces a unified visuo-tactile representation that encodes deformable object shapes in a learned latent space. An object code $\alpha$ and a force code $\mathbf{z}$ are passed through a hyper-network $\Psi$, which outputs the parameters of a main deformation network $\mathcal{OD}$. The model predicts a deformation field that maps any query point $\mathbf{x}$ to the surface point $\mathbf{x}' = \mathbf{x} + \mathcal{OD}(\mathbf{x})$, and renders surfaces by warping a cylindrical template. It optimizes a joint loss combining a vector-based surface term and Chamfer distance, with regularizers to stabilize the latent space and network weights. Empirically, it significantly outperforms VIRDO on shape reconstruction, generalizes well across forces and objects from limited data, and achieves real-time rendering while using an order of magnitude fewer parameters. This makes it suitable for robust, fine-grained deformation modeling in practical robotic manipulation of deformable objects.
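The pipeline summarised above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the single-linear-layer `HyperNetwork`, the one-hidden-layer form of the deformation network, the code dimensions, and the helper names (`cylinder_template`, `warp`, `chamfer_distance`) are all assumptions chosen for brevity. The Chamfer distance shown is one common convention (mean squared nearest-neighbour distance, summed over both directions); the paper's exact form may differ.

```python
import math
import random


def cylinder_template(n_ring=16, n_height=8, radius=1.0, height=2.0):
    """Sample points on the side surface of a cylinder (the shape template)."""
    pts = []
    for i in range(n_height):
        z = -height / 2 + height * i / (n_height - 1)
        for j in range(n_ring):
            theta = 2 * math.pi * j / n_ring
            pts.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return pts


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class HyperNetwork:
    """Toy stand-in for Psi: a linear map from the concatenated object and
    force codes to the flat weights of a small deformation MLP (assumed)."""

    def __init__(self, code_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Parameters of a 3 -> hidden -> 3 MLP: W1, b1, W2, b2.
        n_params = 3 * hidden + hidden + hidden * 3 + 3
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(code_dim)]
                  for _ in range(n_params)]

    def __call__(self, alpha, z):
        flat = matvec(self.W, list(alpha) + list(z))
        h, i = self.hidden, 0
        W1 = [flat[i + r * 3: i + (r + 1) * 3] for r in range(h)]
        i += 3 * h
        b1 = flat[i: i + h]
        i += h
        W2 = [flat[i + r * h: i + (r + 1) * h] for r in range(3)]
        i += 3 * h
        b2 = flat[i: i + 3]
        return W1, b1, W2, b2


def deformation(x, params):
    """Toy OD(x): offset predicted by the MLP whose weights came from Psi."""
    W1, b1, W2, b2 = params
    hid = [math.tanh(a + b) for a, b in zip(matvec(W1, x), b1)]
    return [a + b for a, b in zip(matvec(W2, hid), b2)]


def warp(points, params):
    """Apply x' = x + OD(x) to every template point."""
    return [tuple(xi + di for xi, di in zip(x, deformation(x, params)))
            for x in points]


def chamfer_distance(A, B):
    """Symmetric Chamfer distance between two point sets (one convention)."""
    def one_way(P, Q):
        return sum(min(math.dist(p, q) ** 2 for q in Q) for p in P) / len(P)
    return one_way(A, B) + one_way(B, A)


# Hypothetical codes: a 2-D object code alpha and a 2-D force code z.
template = cylinder_template()
psi = HyperNetwork(code_dim=4)
surface = warp(template, psi(alpha=[1.0, 0.0], z=[0.2, -0.1]))
print(len(surface), chamfer_distance(surface, template))
```

In this sketch the hyper-network has no bias term, so all-zero codes produce all-zero weights and the template is returned unchanged. That is a property of the toy construction only, not a claim about the paper's model.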

Abstract

Accurate modelling of object deformations is crucial for a wide range of robotic manipulation tasks, where interacting with soft or deformable objects is essential. Current methods struggle to generalise to unseen forces or adapt to new objects, limiting their utility in real-world applications. We propose Shape-Space Deformer, a unified representation for encoding a diverse range of object deformations using template augmentation to achieve robust, fine-grained reconstructions that are resilient to outliers and unwanted artefacts. Our method improves generalisation to unseen forces and can rapidly adapt to novel objects, significantly outperforming existing approaches. We perform extensive experiments across a range of force generalisation settings and evaluate our method's ability to reconstruct unseen deformations, demonstrating significant improvements in reconstruction accuracy and robustness. Our approach runs in real time, making it suitable for downstream manipulation applications.

Paper Structure (18 sections, 8 equations, 7 figures, 4 tables)

Introduction
Related Work
Neural Fields for Deformable Objects
Multimodal Visuo-Tactile Representations
Methodology
Unified Latent Representation
Deformation Learning
Getting to the Surface
Explicit Shape Rendering
Objective Function
Experiments
Experimental Design
Training and Implementation Details
Shape Reconstruction Results With Known Deformations
Force Generalisation Results
...and 3 more sections

Figures (7)

Figure 1: We present a unified shape representation for learning how objects $\alpha$ deform given a set of encoded forces $\mathbf{z}$. Our method generalises to unseen forces and new objects when supplied with only a few example deformations.
Figure 2: An overview of our Shape-Space Deformer Network. Given a known object class, contact locations, and applied force, our method determines the corresponding deformation field. We create a unified representation of several objects and their deformation states and generate a surface reconstruction by explicitly learning the neural field from a template shape.
Figure 3: The architecture of the hyper-network. Our unified representation takes an object code and latent force vector to condition a single policy trained on all shape types and deformation examples to predict the augmentation to be applied to a cylindrical shape template.
Figure 4: Left: SDFs only describe distance from a surface. Right: Our model $\mathcal{OD}$ focuses on "getting to the surface".
Figure 5: We evaluate generalisation performance on shapes deformed by forces from directions not seen in training. This figure shows the train and test split for each of the shapes in the Direction experiment.
...and 2 more figures
