Learning Correspondence for Deformable Objects

Priya Sundaresan; Aditya Ganapathi; Harry Zhang; Shivin Devgon

TL;DR

This work tackles pixelwise correspondence across deformable objects (cloth and rope) by comparing classical feature-based methods with learning-based approaches. It builds a synthetic, ground-truth framework using Blender and cloth simulation to train and evaluate descriptor-based mappings, and introduces a Dense Object Nets extension that enforces spatial and temporal continuity through Distributional Loss and L-Lipschitz regularization, as well as time-consistency considerations. The study shows Dense Object Nets generally outperform classical methods on these challenging nonrigid objects, with the proposed continuity losses achieving competitive performance and tighter, more stable correspondences. The approach holds significant promise for robotic manipulation tasks that rely on reliable pixel-to-pixel correspondences under large deformations, occlusions, and texture variation, such as cloth folding and rope manipulation.

Abstract

We investigate the problem of pixelwise correspondence for deformable objects, namely cloth and rope, by comparing classical and learning-based methods. We choose cloth and rope because they are traditionally among the most difficult deformable objects to model analytically, owing to their large configuration spaces, and because they are meaningful in the context of robotic tasks such as cloth folding, rope knot-tying, T-shirt folding, and curtain closing. The correspondence problem is heavily motivated in robotics, with wide-ranging applications including semantic grasping, object tracking, and manipulation policies built on top of correspondences. We present an exhaustive survey of existing classical methods for correspondence via feature matching, including SIFT, SURF, and ORB, as well as two recently published learning-based methods, TimeCycle and Dense Object Nets. We make three main contributions: (1) a framework for simulating and rendering synthetic images of deformable objects, with qualitative results demonstrating transfer between our simulated and real domains, (2) a new learning-based correspondence method extending Dense Object Nets, and (3) a standardized comparison across state-of-the-art correspondence methods. Our proposed method provides a flexible, general formulation for learning temporally and spatially continuous correspondences for nonrigid (and rigid) objects. We report root mean squared error statistics for all methods and find that Dense Object Nets outperforms the baseline classical methods for correspondence, and that our proposed extension of Dense Object Nets performs similarly.
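
As a rough illustration of the evaluation described above, the sketch below matches pixels between two images by nearest neighbor in descriptor space and scores the result with a root mean squared pixel error. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `match_pixels` and `pixel_rmse` are hypothetical, the descriptors are random toy data rather than network outputs, and the error is assumed to be the Euclidean distance in pixels between predicted and ground-truth locations.

```python
import numpy as np


def match_pixels(desc_a, desc_b, pixels_a):
    """For each (row, col) in pixels_a, return the (row, col) in image B
    whose descriptor is closest in L2 distance (hypothetical helper)."""
    h, w, d = desc_b.shape
    flat_b = desc_b.reshape(-1, d)  # one descriptor per pixel of image B
    matches = []
    for r, c in pixels_a:
        dists = np.linalg.norm(flat_b - desc_a[r, c], axis=1)
        idx = int(np.argmin(dists))
        matches.append(divmod(idx, w))  # flat index -> (row, col)
    return np.array(matches)


def pixel_rmse(pred, truth):
    """Root mean squared Euclidean pixel error (assumed error definition)."""
    err = np.linalg.norm(pred.astype(float) - truth.astype(float), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))


# Toy data: image B is image A shifted 2 columns to the right (with wraparound),
# so the ground-truth match of (r, c) is (r, (c + 2) % width).
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(8, 10, 3))       # H x W x D descriptor image
desc_b = np.roll(desc_a, shift=2, axis=1)
pixels_a = np.array([[0, 0], [3, 4], [7, 9]])
truth = np.stack([pixels_a[:, 0], (pixels_a[:, 1] + 2) % 10], axis=1)

pred = match_pixels(desc_a, desc_b, pixels_a)
print(pixel_rmse(pred, truth))
```

Because the toy descriptors in image B are exact copies of those in image A, every match is exact and the error is zero; with learned descriptors on deformed cloth or rope, the same two functions would report how far predicted matches fall from the simulator's ground-truth annotations.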

Paper Structure (19 sections, 8 equations, 21 figures, 1 table)

Simulator Design
Problem Statement
Methods
SIFT: Scale Invariant Feature Transform
SURF: Speeded Up Robust Features
ORB: Oriented FAST and Rotated BRIEF
TimeCycle
Dense Object Nets - Pixelwise Contrastive Method
Proposed Method - Enforcing Spatial Continuity in Correspondence Estimation Through a Reformulated Loss Function
Distributional Loss
L-Lipschitz Regularization
Enforcing Time-Consistency
Overall Quantitative Results
Detailed Results/Appendix
SIFT
...and 4 more sections

Figures (21)

Figure 1: We investigate finding correspondences across images of simulated rope and cloth.
Figure 2: Blender simulation of rope: (1) Bezier representation of rope with control points and handles (black), (2) mesh view of rope, (3) raw depth rendering of rope, (4) rope with dense pixelwise ground-truth annotations (colored according to indexing scheme).
Figure 3: Left: a visualization of the classification problem, where the objective is to learn boundaries that separate clusters. Right: a visualization of the task of interest here, learning correspondence within clusters. The function $f$ operates on members of each cluster, assigning semantic meaning to them [schmidt2016self].
Figure 4: Ground truth $q_a$ distribution.
Figure 5: Ground truth bimodal $q_a$ distribution.
...and 16 more figures
