Learning Correspondence for Deformable Objects

Priya Sundaresan, Aditya Ganapathi, Harry Zhang, Shivin Devgon

TL;DR

This work tackles pixelwise correspondence across deformable objects (cloth and rope) by comparing classical feature-based methods with learning-based approaches. It builds a synthetic data framework, using Blender and cloth simulation, that provides ground-truth annotations for training and evaluating descriptor-based mappings. It also introduces a Dense Object Nets extension that enforces spatial and temporal continuity through a Distributional Loss, L-Lipschitz regularization, and time-consistency considerations. The study shows that Dense Object Nets generally outperforms classical methods on these challenging nonrigid objects, with the proposed continuity losses achieving competitive performance and tighter, more stable correspondences. The approach is relevant to robotic manipulation tasks, such as cloth folding and rope manipulation, that rely on reliable pixel-to-pixel correspondences under large deformations, occlusions, and texture variation.

Abstract

We investigate the problem of pixelwise correspondence for deformable objects, namely cloth and rope, by comparing classical and learning-based methods. We choose cloth and rope because they are traditionally some of the most difficult deformable objects to model analytically, owing to their large configuration spaces, and because they are meaningful in the context of robotic tasks such as cloth folding, rope knot-tying, T-shirt folding, and curtain closing. The correspondence problem is heavily motivated in robotics, with wide-ranging applications including semantic grasping, object tracking, and manipulation policies built on top of correspondences. We present an exhaustive survey of existing classical methods for correspondence via feature matching, including SIFT, SURF, and ORB, and of two recently published learning-based methods, TimeCycle and Dense Object Nets. We make three main contributions: (1) a framework for simulating and rendering synthetic images of deformable objects, with qualitative results demonstrating transfer between our simulated and real domains, (2) a new learning-based correspondence method extending Dense Object Nets, and (3) a standardized comparison across state-of-the-art correspondence methods. Our proposed method provides a flexible, general formulation for learning temporally and spatially continuous correspondences for nonrigid (and rigid) objects. We report root mean squared error statistics for all methods and find that Dense Object Nets outperforms baseline classical methods for correspondence, and that our proposed extension of Dense Object Nets performs similarly.
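As a rough illustration of the evaluation described in the abstract, the sketch below matches a pixel in one image to the pixel in another image with the nearest descriptor, then scores the matches with a root mean squared pixel error. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `match_pixel` and `rmse_pixel_error`, the toy 2x2 descriptor grids, and the exact form of the error (Euclidean pixel distance between predicted and ground-truth matches) are all hypothetical choices made for this example.

```python
import math

def match_pixel(desc_a, desc_b, pixel_a):
    """Return the (row, col) in image B whose descriptor is closest
    (Euclidean distance) to the descriptor at pixel_a in image A."""
    row_a, col_a = pixel_a
    target = desc_a[row_a][col_a]
    best, best_dist = None, math.inf
    for r, row in enumerate(desc_b):
        for c, descriptor in enumerate(row):
            dist = math.dist(target, descriptor)
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def rmse_pixel_error(predicted, ground_truth):
    """Root mean squared Euclidean pixel distance between predicted
    and ground-truth matches (assumed form of the error metric)."""
    squared = [
        (pr - gr) ** 2 + (pc - gc) ** 2
        for (pr, pc), (gr, gc) in zip(predicted, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Toy per-pixel descriptors (hypothetical): image B is image A flipped horizontally.
desc_a = [[(0.0, 0.0), (1.0, 0.0)],
          [(0.0, 1.0), (1.0, 1.0)]]
desc_b = [[(1.0, 0.0), (0.0, 0.0)],
          [(1.0, 1.0), (0.0, 1.0)]]

queries = [(0, 0), (1, 1)]
ground_truth = [(0, 1), (1, 0)]  # where each query pixel lands after the flip

predicted = [match_pixel(desc_a, desc_b, q) for q in queries]
error = rmse_pixel_error(predicted, ground_truth)
```

In a learned setting, the descriptor grids would come from a network's dense output rather than hand-written tuples, and the ground-truth matches would come from simulator annotations; the brute-force search here is only meant to make the matching step concrete.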

Paper Structure (19 sections, 8 equations, 21 figures, 1 table)

This paper contains 19 sections, 8 equations, 21 figures, 1 table.

Figures (21)

  • Figure 1: We investigate finding correspondences across images of simulated rope and cloth.
  • Figure 2: Blender simulation of rope: (1) Bezier representation of rope with control points and handles (black), (2) mesh view of rope, (3) raw depth rendering of rope, (4) rope with dense pixelwise ground truth annotations (colored according to indexing scheme).
  • Figure 3: On the left is a visualization of the classification problem, where the objective is learning boundaries that separate clusters. On the right is a visualization of the task we are interested in: learning correspondence within clusters. The function $f$ operates on members of each cluster, assigning semantic meaning to them [schmidt2016self].
  • Figure 4: Ground truth $q_a$ distribution.
  • Figure 5: Ground truth bimodal $q_a$ distribution.
  • ...and 16 more figures