GARField: Addressing the visual Sim-to-Real gap in garment manipulation with mesh-attached radiance fields

Donatien Delehelle, Darwin G. Caldwell, Fei Chen

TL;DR

GARField tackles the data bottleneck in deformable garment manipulation by learning a differentiable rendering pipeline that generates realistic observations from simulated garment states. It defines a mesh-attached, scene-embedded representation using signed distance and visual feature fields, with a two-stage process of scene capture and re-rendering to produce labeled data across novel poses. The approach introduces mesh-based coordinates via barycentric embeddings and Laplacian positional embeddings, coupled with a four-term training loss and view-direction augmentation, achieving faithful reconstruction and re-posed rendering under limited viewpoints. This work potentially enables robots to “imagine” manipulation outcomes in observation space, reducing reliance on costly real-world data and bridging the sim-to-real gap in textile manipulation, albeit at substantial computational cost. The approach sets a foundation for higher-fidelity, differentiable garment rendering inspired by NeRF-like techniques for dynamic, real-world deployment.

Abstract

While humans intuitively manipulate garments and other textile items swiftly and accurately, doing so remains a significant challenge for robots. A factor crucial to human performance is the ability to imagine, a priori, the intended result of a manipulation and hence to predict the resulting garment pose. That ability allows us to plan from highly obstructed states, adapt our plans as we collect more information and react swiftly to unforeseen circumstances. Conversely, robots struggle to establish such intuitions and to form tight links between plans and observations. We can partly attribute this to the high cost of obtaining densely labelled data for textile manipulation, both in quality and quantity. Data collection is a long-standing issue in data-based approaches to garment manipulation. As of today, high-quality labelled garment manipulation data is mainly generated through advanced data capture procedures that create simplified state estimations from real-world observations. In contrast, this work proposes a novel approach to the problem: generating real-world observations from object states. To achieve this, we present GARField (Garment Attached Radiance Field), the first differentiable rendering architecture, to our knowledge, for data generation from simulated states stored as triangle meshes. Code is available at https://ddonatien.github.io/garfield-website/

Paper Structure (12 sections, 13 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: By defining garment-attached signed distance and radiance fields, GARField enables novel-view synthesis of re-posed meshes.
  • Figure 2: GARField models the scene as a composition of signed distance and visual feature fields. The background field is defined in the scene's global coordinate frame. The other fields are attached to objects' meshes and can be re-posed. The mesh-attached coordinate system maps each query point to two components: the point's distance to the mesh's surface, and the coordinates of the point's projection onto the surface, expressed in a bespoke coordinate system built around Laplacian-based position embeddings and barycentric coordinates.
  • Figure 3: (a) The barycentric coordinates are defined as the ratio of the coloured areas to the area of $ABC$. The $\gamma$ path illustrates over-edge continuity. (b) First coordinates of our system illustrated over an arbitrary mesh.
  • Figure 4: (a) Qualitative results for reconstruction of training images. (b) Qualitative results for re-posed mesh rendering. Top left: colour, bottom left: masks, right: depth. While the mesh pose is more rigid than that of a usual cotton sock, we can observe alignment of geometry and colour features. Additionally, the quality of depth images and masks is on par with training data.
  • Figure 5: (a) During the capture phase, the Franka Emika Panda arm places the shape template in random poses to expose all angles of the deformable object to the four RealSense cameras. (b) We have selected four socks with different patterns and varying sizes to display the performance of our model.
  • ...and 3 more figures
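
The captions of Figures 2 and 3(a) describe a query point by its distance to the mesh surface plus the barycentric coordinates of its surface projection, with the barycentric coordinates given as area ratios. The sketch below illustrates that idea for a single triangle only. It is not the authors' implementation: the function names are hypothetical, the Laplacian-based embeddings are left out, and the area-ratio formula is assumed to be applied only to points whose projection falls inside the triangle (unsigned areas do not yield valid coordinates outside it).

```python
import numpy as np


def triangle_area(p, q, r):
    """Area of the 3D triangle (p, q, r)."""
    return 0.5 * np.linalg.norm(np.cross(q - p, r - p))


def barycentric_by_areas(point, a, b, c):
    """Barycentric coordinates of a point inside triangle ABC.

    Each coordinate is the area of the sub-triangle opposite a vertex
    divided by the area of ABC, as in the Figure 3(a) caption.
    """
    total = triangle_area(a, b, c)
    u = triangle_area(point, b, c) / total  # weight of vertex A
    v = triangle_area(a, point, c) / total  # weight of vertex B
    w = triangle_area(a, b, point) / total  # weight of vertex C
    return np.array([u, v, w])


def triangle_attached_coords(query, a, b, c):
    """Signed distance to the triangle's plane and barycentric
    coordinates of the query point's projection onto that plane."""
    normal = np.cross(b - a, c - a)
    normal = normal / np.linalg.norm(normal)
    signed_distance = float(np.dot(query - a, normal))
    projection = query - signed_distance * normal
    return signed_distance, barycentric_by_areas(projection, a, b, c)


def reposed_point(signed_distance, bary, a, b, c):
    """Map triangle-attached coordinates back to 3D for a (possibly
    moved) triangle ABC, so the point follows the re-posed triangle."""
    normal = np.cross(b - a, c - a)
    normal = normal / np.linalg.norm(normal)
    on_surface = bary[0] * a + bary[1] * b + bary[2] * c
    return on_surface + signed_distance * normal


# Example triangle and query point (illustrative values only).
A = np.array([0.0, 0.0, 0.0])
B = np.array([1.0, 0.0, 0.0])
C = np.array([0.0, 1.0, 0.0])
Q = np.array([0.25, 0.25, 0.5])

distance, bary = triangle_attached_coords(Q, A, B, C)
```

For this example the query point sits 0.5 above the triangle and its projection has barycentric coordinates (0.5, 0.25, 0.25). Passing the same coordinates to `reposed_point` with translated or rotated vertices returns a point that moves with the triangle, which is the property that lets a field attached to a mesh be re-posed along with it.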