Handling Object Symmetries in CNN-based Pose Estimation

Jesse Richter-Klug, Udo Frese

TL;DR

This paper investigates the problems that Convolutional Neural Network (CNN)-based pose estimators have with symmetric objects. It proposes a representation called "closed symmetry loop" (csl), in which the angle of relevant vectors is multiplied by the symmetry order, and generalizes this representation to 6-DOF.

Abstract

In this paper, we investigate the problems that Convolutional Neural Network (CNN)-based pose estimators have with symmetric objects. We considered the value of the CNN's output representation when continuously rotating the object and found that it has to form a closed loop after each step of symmetry. Otherwise, the CNN (which is itself a continuous function) has to replicate a discontinuous function. On a 1-DOF toy example we show that commonly used representations do not fulfill this demand, and we analyze the problems this causes. In particular, we find that the popular min-over-symmetries approach for creating a symmetry-aware loss tends not to work well with gradient-based optimization, i.e. deep learning. From these insights we propose a representation called "closed symmetry loop" (csl), in which the angle of relevant vectors is multiplied by the symmetry order, and we then generalize it to 6-DOF. The representation extends our algorithm from [Richter-Klug, ICVS, 2019] and includes a method to disambiguate symmetric equivalents during the final pose estimation. The algorithm handles continuous rotational symmetry (e.g. a bottle) and discrete rotational symmetry (e.g. a 4-fold symmetric box). It is evaluated on the T-LESS dataset, where it reaches state-of-the-art results among RGB-based methods without refinement.
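The core idea of multiplying the angle by the symmetry order can be illustrated in the 1-DOF case. The following is a minimal sketch, not the authors' implementation: the function names `csl_encode` and `csl_decode` are hypothetical, and the encoding as a 2D unit vector is an assumption made for illustration. It shows that all symmetric equivalents of an angle map to the same output, so the output forms a closed loop after each step of symmetry, and that decoding yields a whole class of equally valid angles.

```python
import math
from typing import List, Tuple


def csl_encode(theta: float, n: int) -> Tuple[float, float]:
    """Encode angle theta (radians) of an n-fold symmetric object.

    The angle is multiplied by the symmetry order n, so theta and
    theta + 2*pi/n produce the same unit vector (hypothetical helper).
    """
    return (math.cos(n * theta), math.sin(n * theta))


def csl_decode(vec: Tuple[float, float], n: int) -> List[float]:
    """Return all n angles in [0, 2*pi) that encode to vec."""
    base = math.atan2(vec[1], vec[0]) / n  # one representative angle
    step = 2.0 * math.pi / n               # one step of symmetry
    return sorted((base + k * step) % (2.0 * math.pi) for k in range(n))


if __name__ == "__main__":
    n = 4  # e.g. a 4-fold symmetric box
    theta = math.radians(10.0)
    equivalent = theta + 2.0 * math.pi / n  # same appearance, other angle

    print(csl_encode(theta, n))
    print(csl_encode(equivalent, n))  # same vector up to rounding
    print([round(math.degrees(c), 1) for c in csl_decode(csl_encode(theta, n), n)])
```

For a 4-fold symmetric object at 10 degrees, decoding returns the four candidates 10, 100, 190 and 280 degrees. A plain encoding such as (cos θ, sin θ) would instead assign different targets to these visually identical poses.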

Paper Structure

This paper contains 17 sections, 14 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Two representations of the surface points of a $4$-fold symmetric box as color-coded 3D vectors (unwrapped). Left: plain object points. Right: the proposed closed symmetry loop (star) representation. The right representation is continuous and respects symmetry, whereas the left does not.
  • Figure 2: Comparison of different output representations for the angle of a rotating disc with a 6-fold symmetric texture. In all plots the ground truth angle is shown on the x-axis, and the cyan vertical lines indicate periodicity. The ground truth and its symmetric equivalents are shown in green, the CNN prediction converted to an angle in blue, and the prediction on training data is highlighted in red. Panels g/i/k show the error of the represented object points (black = large). See the toy example section for details.
  • Figure 3: Overview of the network architecture extension, adapted from [Richter-Klug, ICVS, 2019]. Originally (top), an RGB image is fed into a CNN, which outputs the seen object point (per pixel) as well as an estimate of their in-image uncertainties. This information is then combined by PnP across all pixels that belong to the same object to estimate its pose ($T$) and 6D uncertainty ($\Sigma$). In this paper (bottom), we adapt this architecture with a symmetry-aware but ambiguous object point representation (star), which is aided by the dash representation, both predicted by a CNN. They are then combined to regain the object points, followed by the unchanged PnP stage.
  • Figure 4: Steps of the forward and backward transformation, with two points marked as examples. All quantities are actually 3D vectors; here we show X and Y for clarity, and Z is the axis of symmetry. a) The image perceived by a camera looking at a box from above, b) object points $p^O$ as used in [Richter-Klug, ICVS, 2019], c) $p^{O*}$ information predicted by the CNN, d) $p^{O'}$ information also predicted by the CNN, e) $P^O$ equivalence classes obtained from $p^{O*}$, f) consistent disambiguation of the $P^O$ using $p^{O'}$ to regain $p^O$. (Blue: arbitrarily chosen reference $p_r$; red: best fitting point from equivalence class $P^O$.)
  • Figure 5: Example input image a) with object segmentation b) and the unknown true object points c). The proposed reverse operation uses our outputs d) and e) to generate the object points f). These are then used to estimate the object's pose. Note that f) is not equal to c), but it could have been. In this specific case, it is instead offset by two steps of symmetry.
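
The disambiguation step described in the captions of Figures 4 and 5 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's method: the helpers `equivalence_class` and `pick_closest` are hypothetical, the symmetry axis is assumed to be Z as in Figure 4, and a rough 3D hint point stands in for whatever information the dash representation $p^{O'}$ actually provides. It builds the equivalence class of a point under n-fold rotation and selects the member that best fits the hint.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def equivalence_class(p: Point, n: int) -> List[Point]:
    """All n copies of p under n-fold rotation about the Z axis."""
    x, y, z = p
    members = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n  # k steps of symmetry
        c, s = math.cos(angle), math.sin(angle)
        members.append((c * x - s * y, s * x + c * y, z))
    return members


def pick_closest(candidates: List[Point], hint: Point) -> Point:
    """Select the candidate with the smallest Euclidean distance to hint.

    The hint is an assumed stand-in for the disambiguating information.
    """
    return min(candidates, key=lambda q: math.dist(q, hint))


if __name__ == "__main__":
    candidates = equivalence_class((1.0, 0.0, 0.5), n=4)
    chosen = pick_closest(candidates, hint=(0.1, 0.9, 0.5))
    print(chosen)  # the member rotated by one step of symmetry
```

Any member of the equivalence class describes the same appearance, which is why the regained object points in Figure 5 f) may differ from the true ones by whole steps of symmetry and still yield a valid pose, as long as the choice is consistent across all pixels of the object.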