Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences

Shishir Reddy Vutukur; Rasmus Laurvig Haugaard; Junwen Huang; Benjamin Busam; Tolga Birdal

TL;DR

This work proposes a pose distribution estimation method that leverages symmetry-respecting correspondence distributions and shape information obtained from a CAD model. With this prior, the method converges much faster and learns the distribution better by sharpening it near all valid modes, unlike contrastive approaches, which focus on a single mode at a time.

Abstract

Object pose distribution estimation is crucial in robotics for better path planning and handling of symmetric objects. Recent distribution estimation approaches rely on contrastive learning, maximizing the likelihood of a single pose estimate in the absence of a CAD model. Such approaches require an exhaustive number of training images from different viewpoints to learn the distribution properly, which is not feasible in realistic scenarios. Instead, we propose a pose distribution estimation pipeline that leverages symmetry-respecting correspondence distributions and shape information obtained from a CAD model, which are later used to learn pose distributions. Moreover, having access to a correspondence-based pose distribution before learning pose distributions conditioned on images helps formulate the loss between distributions. This prior knowledge of the distribution also lets the network focus on obtaining sharper modes. With the CAD prior, our approach converges much faster and learns the distribution better by sharpening it near all valid modes, unlike contrastive approaches, which focus on a single mode at a time. We achieve benchmark results on the SYMSOL-I and T-Less datasets.

Paper Structure (39 sections, 1 theorem, 9 equations, 8 figures, 7 tables)

Introduction
Related Work
Rotation representations for deep networks
Representing belief over rotations in deep networks
Uncertainty and ambiguity aware object pose estimation
Method
Problem setting
Model
Inference
Network Design, Positional Encoding and Impl. Details
Network architecture
Sampling
Positional encoding (PE)
Experiments
SYMSOL-I
...and 24 more sections

Key Result

Proposition 1

The probability $p(\mathbf{R}\:\vert\:{\mathbf{I}})\propto p(\mathbf{X}^\prime \:\vert\:{\mathbf{I}})$ where $\mathbf{X}^\prime = \mathbf{R}\Pi(\mathbf{R}_\mathrm{gt}^{\top}\mathbf{X}_0;\mathbf{I},\mathcal{M})$.
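As a rough illustration of this proportionality, the toy sketch below scores candidate rotations by a stand-in likelihood of the transformed points and normalizes the scores over the candidates. It is not the paper's implementation: the analytic cube SDF used as the point likelihood, the tolerance `tol`, the z-axis candidate set, and all function names are assumptions made for the example, and the rendering step $\Pi$ is omitted.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def cube_sdf(points, half=1.0):
    """Signed distance from each point (N, 3) to an axis-aligned cube."""
    q = np.abs(points) - half
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(axis=-1), 0.0)
    return outside + inside

def point_likelihood(points, tol=1e-6):
    """Toy stand-in for p(X' | I): fraction of points lying on the surface."""
    return float(np.mean(np.abs(cube_sdf(points)) < tol))

def rotation_distribution(x_canonical, candidates):
    """Score each candidate R by the likelihood of X' = R x, then normalize."""
    scores = np.array([point_likelihood(x_canonical @ r.T) for r in candidates])
    return scores / scores.sum()

# Surface samples of the cube: the 8 corners plus the 6 face centers.
corners = np.array(
    [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float
)
faces = np.vstack([np.eye(3), -np.eye(3)])
x_canonical = np.vstack([corners, faces])

# Candidate rotations: three symmetries of the cube and one non-symmetry (45 deg).
angles = np.deg2rad([0, 45, 90, 180])
probs = rotation_distribution(x_canonical, [rot_z(a) for a in angles])
```

In this toy setup the rotations by 0°, 90°, and 180° are symmetries of the cube and receive equal probability, while the 45° rotation receives less, so mass is placed on every valid mode at once.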

Figures (8)

Figure 1: Training and inference pipeline. We employ a training mechanism where supervision is generated from pre-trained SurfEmb ($f_\mathrm{SE}$) and SDF ($f_\mathrm{SDF}$) blocks. The CAD model, $\mathbf{X}_0$, is projected to render an image-aligned point cloud, $\mathbf{X}$. This point cloud is rotated with the ground-truth rotation, $\mathbf{R}_\mathrm{gt}$, and passed through the $f_\mathrm{SE}$ block to estimate canonical features. Similarly, $\mathbf{X}$ is rotated with a random rotation, $\mathbf{R}_{k}$, and passed through the $f_\mathrm{SE}$ block to generate features that are compared with the canonical features to estimate the score $\bm{\mu}_\mathrm{SE}$ for that rotation. The point cloud rotated by $\mathbf{R}_{k}$ is also passed through $f_\mathrm{SDF}$ to estimate its SDF values, and an $L_0$ norm is applied to these values to compute the score $\bm{\mu}_\mathrm{SDF}$ for the rotation. These scores supervise the Dual-branch MLP, which takes an image and the same rotation matrix, $\mathbf{R}_{k}$, as input and infers two scores, $\bm{\mu}_\mathrm{\theta}$ and $\bm{\mu}_\mathrm{\phi}$. This process is carried out for $K$ rotations per image, and a generalized KL divergence (GKL) loss is formulated between the inferred scores from the right block and the estimated scores from the left block to train the Dual-branch MLP. The Dual-branch MLP is used in both training and inference. During inference, an image and rotations sampled from a grid are passed through the network to estimate the full distribution on the grid.
Figure 1: Pose distribution visualization for different objects in SYMSOL-I. Each row corresponds to a single object from SYMSOL. The distributions for the cylinder, tetrahedron, cube, and cone objects are visualized. The cylinder and cone exhibit continuous symmetries, indicated by smooth curves, unlike the tetrahedron and cube, which have discrete modes.
Figure 2: LL vs. Training Data for NF, Ours on SYMSOL-I.
Figure 2: Pose distribution visualization for different objects in SYMSOL-II. The rows correspond to images of the SphereX, TetX, and CylO objects, respectively. The middle column shows the distribution when the markers are not visible. The left and right columns show the sharper distribution when the markers are visible. This shows that our approach can capture the distribution based on the texture component and sharpen it based on the marker when it is visible. This is possible because the SurfEmb features differ between textured and untextured regions, as shown in Figure \ref{fig:featViz}.
Figure 3: Pose distribution visualization for objects in SYMSOL-I and SYMSOL-II. a) The untextured cylinder has continuous symmetry. b) The textured cylinder with a marker has a unimodal distribution when the marker is visible. c) The continuous symmetry is broken on the textured cylinder when the marker is not visible. d) The untextured tetrahedron in SYMSOL-I has 12 modes, which are captured appropriately. e) The textured tetrahedron has three modes when the orange face is visible. f) The textured tetrahedron has 6 modes when the orange face is not visible. Note that only one ground-truth annotation is provided in SYMSOL-II, and hence only one mode is circled.
...and 3 more figures
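The Figure 1 caption describes a GKL loss between the scores inferred by the network and the supervision scores for $K$ sampled rotations. A minimal sketch of such a loss is given below, assuming the standard generalized KL (I-divergence) for non-negative, unnormalized vectors, $\sum_k \left(p_k \log \frac{p_k}{q_k} - p_k + q_k\right)$. The paper's exact formulation may differ; the function name, the clamping constant `eps`, and the example score values are hypothetical.

```python
import numpy as np

def generalized_kl(target, pred, eps=1e-12):
    """Generalized KL (I-divergence) between non-negative, unnormalized scores."""
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Clamp only inside the logarithms so that a zero target contributes
    # 0 * log(...) = 0 instead of NaN.
    log_ratio = np.log(np.maximum(target, eps)) - np.log(np.maximum(pred, eps))
    return float(np.sum(target * log_ratio - target + pred))

# Hypothetical supervision scores for K = 4 sampled rotations (two valid modes).
mu_target = np.array([1.0, 0.1, 1.0, 0.0])

# Prediction that matches the supervision on every rotation.
loss_same = generalized_kl(mu_target, mu_target)

# Prediction that captures only one of the two modes.
loss_single_mode = generalized_kl(mu_target, np.array([1.0, 0.1, 0.0, 0.0]))
```

Because the divergence is defined on unnormalized scores, it can compare the two score vectors directly over the sampled rotations, and a prediction that misses one of the valid modes is penalized.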

Theorems & Definitions (2)

Proposition 1
proof
