SpyroPose: SE(3) Pyramids for Object Pose Distribution Estimation

Rasmus Laurvig Haugaard, Frederik Hagelskjær, Thorbjørn Mosekjær Iversen

TL;DR

This paper tackles the problem of estimating pose distributions over SE(3) to capture visual ambiguities in object pose estimation. It introduces SpyroPose, which represents the pose space with an SE(3) pyramid, a hierarchical grid combining SO(3) rotations and 3D translations, and is trained with a contrastive InfoNCE objective, using importance sampling for efficient learning. The method uses keypoint-based feature extraction within a UNet+ResNet backbone to produce location-aware embeddings, and achieves real-time inference through sparse, coarse-to-fine evaluation of the pyramid. Empirically, SpyroPose achieves state-of-the-art rotation distribution estimates on SYMSOL and TLESS, provides the first quantitative SE(3) distribution results on TLESS and HB, and shows that multi-view fusion of its distributions substantially increases the likelihood of the true pose. The work lays the groundwork for probabilistic, uncertainty-aware perception in robotics and opens avenues for principled sensor fusion using pose distributions.
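To make the contrastive objective concrete, here is a minimal sketch of a plain InfoNCE loss over pose candidates: the model assigns a logit to the true pose and to a set of sampled negative poses, and the loss is the negative log softmax probability of the true pose. The function name and signature are hypothetical, and this is the textbook form of InfoNCE, not the paper's exact loss.

```python
import math


def info_nce_loss(pos_score, neg_scores):
    """InfoNCE loss for a single training example (hypothetical helper).

    pos_score: logit the model assigns to the true (positive) pose.
    neg_scores: logits for sampled incorrect (negative) poses.
    Returns -log of the softmax probability of the positive among all
    candidates, computed with the log-sum-exp trick for stability.
    """
    logits = [pos_score] + list(neg_scores)
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - pos_score


# With uninformative scores the loss equals log(number of candidates).
print(info_nce_loss(0.0, [0.0, 0.0, 0.0]))  # log(4) ≈ 1.386
```

How the negatives are drawn and weighted (for example via importance sampling from the pyramid, as the paper describes) is where a pose-distribution method departs from this plain version; that part is not sketched here.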

Abstract

Object pose estimation is a core computer vision problem and often an essential component in robotics. Pose estimation is usually approached by seeking the single best estimate of an object's pose, but this approach is ill-suited for tasks involving visual ambiguity. In such cases it is desirable to estimate the uncertainty as a pose distribution to allow downstream tasks to make informed decisions. Pose distributions can have arbitrary complexity, which motivates estimating unparameterized distributions; however, until now such distributions have only been used for orientation estimation on SO(3) due to the difficulty of training on and normalizing over SE(3). We propose a novel method for pose distribution estimation on SE(3). We use a hierarchical grid, a pyramid, which enables efficient importance sampling during training and sparse evaluation of the pyramid at inference, allowing real-time 6D pose distribution estimation. Our method outperforms state-of-the-art methods on SO(3), and to the best of our knowledge, we provide the first quantitative results on pose distribution estimation on SE(3). Code will be available at spyropose.github.io
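The idea of sparse, coarse-to-fine evaluation of a hierarchical grid can be illustrated with a toy sketch. To keep it short, it searches only the translation part of SE(3) with an octree (each cube split into 8 children) and uses a hypothetical hand-written score function in place of a learned network; the SO(3) half of the pyramid and the paper's actual grid construction are not reproduced. All names and parameters (`children`, `coarse_to_fine`, `top_k`, the unit-cube search volume) are illustrative assumptions.

```python
import itertools


def children(center, half):
    """Split an axis-aligned cube (center, half-width) into its 8 octants."""
    h = half / 2
    return [
        (tuple(c + s * h for c, s in zip(center, signs)), h)
        for signs in itertools.product((-1, 1), repeat=3)
    ]


def coarse_to_fine(score, levels, top_k, center=(0.5, 0.5, 0.5), half=0.5):
    """Sparse coarse-to-fine search over a translation octree (toy sketch).

    score: stand-in for a learned model; maps a 3D point to a logit.
    Only the top_k best cells of each level are subdivided, so at most
    8 * top_k cells are scored per level instead of 8 ** level.
    Returns the kept cells of the last level (best first) and the
    total number of score evaluations.
    """
    cells = [(center, half)]
    evaluated = 0
    for _ in range(levels):
        candidates = [child for cell in cells for child in children(*cell)]
        evaluated += len(candidates)
        candidates.sort(key=lambda cell: score(cell[0]), reverse=True)
        cells = candidates[:top_k]
    return cells, evaluated


# Hypothetical score that peaks at a made-up "true" translation.
target = (0.3, 0.7, 0.2)


def score(p):
    return -sum((a - b) ** 2 for a, b in zip(p, target))


cells, evaluated = coarse_to_fine(score, levels=6, top_k=4)
print(cells[0], evaluated)  # best cell contains target; 168 evaluations vs 8**6
```

Keeping several cells per level, rather than only the single best, is what lets a search like this follow multiple modes, such as symmetric poses; `top_k` trades evaluation cost against how many modes survive to the finest level.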

Paper Structure (17 sections, 9 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Visualization of SE(3) distributions at different levels of resolution in the pyramid. Bottom: Input image (left) and green renders of poses, weighted by their estimated probabilities, for pyramid levels three (middle) and five (right). Top: Marginalized SO(3) distribution with two dimensions shown by a Mollweide projection and the last dimension by hue. To show both resolution levels in the same plot, level three is shown in grayscale. The true rotation is indicated by a circle.
  • Figure 2: Qualitative SYMSOL I results. We visualize the rotations at the last pyramid level (level 6) and their likelihoods as alpha, normalized for viewing. Circles (or, for continuous symmetries, donut-like shapes) indicate the correct rotation up to symmetry. a) and b) are from the same image, but b) shows our method without keypoints (KP). Our method accurately captures all 60 modes of the icosahedron.
  • Figure 3: Log likelihoods on SYMSOL I, averaged over objects, at different recursion levels of the pyramid. Both keypoints and importance sampling improve learning at deeper levels.
  • Figure 4: SE(3) distributions on TLESS. The first row shows distributions for object 1. a) Six-fold rotational symmetry. b) Continuous rotational symmetry. c) No symmetry. The second row shows distributions for object 14. d) Continuous rotational symmetry. e) The object of interest is behind the foreground object. Two-fold and continuous rotational symmetry. Note that the two discrete modes have different depths, which can only be represented by a joint distribution. f) The continuous rotational symmetry is disambiguated by now-visible features at the end of the object, and only a two-fold rotational symmetry along the same axis remains. g) and h) show no symmetry and a two-fold rotational symmetry, respectively, for object 25. i) shows a four-fold rotational symmetry for object 27.
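Figure 1 (top) shows an SO(3) distribution obtained by marginalizing the SE(3) distribution over translation. Assuming the joint distribution is stored as log-probability masses for discrete (rotation, translation) cells at a single pyramid level (an assumption about the data layout, not the paper's code), marginalizing amounts to a log-sum-exp over translations for each rotation. The function name and the cell identifiers below are hypothetical.

```python
import math
from collections import defaultdict


def marginalize_rotations(log_probs):
    """Marginalize a discrete joint pose distribution over translation.

    log_probs: dict mapping (rotation_id, translation_id) -> log-probability
    mass of that cell. Returns a dict rotation_id -> log of the total mass
    over all translations paired with that rotation.
    """
    grouped = defaultdict(list)
    for (rot, _trans), lp in log_probs.items():
        grouped[rot].append(lp)
    marginal = {}
    for rot, lps in grouped.items():
        m = max(lps)  # log-sum-exp for numerical stability
        marginal[rot] = m + math.log(sum(math.exp(lp - m) for lp in lps))
    return marginal


# Toy joint distribution over two rotations and two translations.
joint = {
    ("r0", "t0"): math.log(0.1),
    ("r0", "t1"): math.log(0.3),
    ("r1", "t0"): math.log(0.6),
}
marginal = marginalize_rotations(joint)
print({r: round(math.exp(lp), 6) for r, lp in marginal.items()})  # {'r0': 0.4, 'r1': 0.6}
```

The reverse direction is not possible in general: as Figure 4 e) illustrates, two rotation modes can sit at different depths, and only the joint distribution records which depth belongs to which rotation.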