3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction

Jongmin Lee, Minsu Cho

TL;DR

This work proposes a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression, aligning with the operations of spherical CNNs, which operate in the frequency domain to enhance computational efficiency.

Abstract

Determining the 3D orientation of an object in an image, known as single-image pose estimation, is a crucial task in 3D vision applications. Existing methods typically learn 3D rotations parametrized in the spatial domain using Euler angles or quaternions, but these representations often introduce discontinuities and singularities. SO(3)-equivariant networks enable the structured capture of pose patterns with data-efficient learning, but spatial-domain parametrizations are incompatible with their architecture, particularly spherical CNNs, which operate in the frequency domain to enhance computational efficiency. To overcome these issues, we propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression, aligning with the operations of spherical CNNs. Our SO(3)-equivariant pose harmonics predictor overcomes the limitations of spatial parametrizations, ensuring consistent pose estimation under arbitrary rotations. Trained with a frequency-domain regression loss, our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+, with significant improvements in accuracy, robustness, and data efficiency.
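
To make the idea of regressing Wigner-D coefficients concrete, the following is a minimal sketch, not the authors' implementation. It assumes a truncation at degree l = 1, where the real Wigner-D matrix is the rotation matrix written in the (y, z, x) order of the real degree-1 spherical harmonics, and it assumes a plain mean-squared error as the frequency-domain loss, which the abstract does not specify. All function names are hypothetical.

```python
import numpy as np

# Permutation taking (x, y, z) to the real degree-1 harmonic order (y, z, x).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])


def num_wigner_coeffs(l_max: int) -> int:
    """Total number of Wigner-D entries for degrees 0..l_max: sum of (2l+1)^2."""
    return sum((2 * l + 1) ** 2 for l in range(l_max + 1))


def rot_z(angle: float) -> np.ndarray:
    """Rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(angle: float) -> np.ndarray:
    """Rotation about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def wigner_d_l1(R: np.ndarray) -> np.ndarray:
    """Real degree-1 Wigner-D matrix: the rotation matrix in the (y, z, x) basis."""
    return P @ R @ P.T


def coeffs_up_to_l1(R: np.ndarray) -> np.ndarray:
    """Flattened coefficients for l = 0 (a single 1) and l = 1 (nine entries)."""
    return np.concatenate([np.ones(1), wigner_d_l1(R).ravel()])


def frequency_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Assumed loss: mean-squared error between coefficient vectors."""
    return float(np.mean((pred - target) ** 2))


R_gt = rot_z(0.3) @ rot_y(0.7)     # ground-truth pose
R_pred = rot_z(0.35) @ rot_y(0.7)  # slightly wrong prediction
target = coeffs_up_to_l1(R_gt)

print(num_wigner_coeffs(1), target.shape)
print(frequency_mse(coeffs_up_to_l1(R_pred), target))
```

Because the target is a smooth function of the rotation, the regression target has no branch cuts of the kind Euler angles introduce. A higher truncation degree only lengthens the coefficient vector; for example, `num_wigner_coeffs(6)` gives 455 entries.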

Paper Structure

This paper contains 44 sections, 9 equations, 11 figures, 18 tables.

Figures (11)

  • Figure 1: Types of representations for 3D rotation prediction. Existing methods consider predicting 3D rotations in the spatial domain. Our method predicts Wigner-D coefficients in the frequency domain, to obtain accurate pose in continuous space using an SO(3)-equivariant network.
  • Figure 2: Overall architecture. Our network for SO(3)-equivariant pose estimation consists of four parts: feature extraction, spherical mapper, Fourier transformer, and SO(3)-equivariant layers. First, we extract a feature map using a pre-trained ResNet. Next, the spherical mapper orthographically projects the extracted feature map onto a spherical surface. The Fourier transformer converts this spatial information into the frequency domain. We utilize spherical convolutions to obtain the final Wigner-D harmonics coefficients $\Psi$ which represent SO(3) rotations of spherical harmonics, where $M$ denotes the total number of Wigner-D matrix coefficients.
  • Figure 3: Illustration of spherical mapper and spherical convolution for SO(3)-equivariance. This structure allows for the prediction of 3D rotations while preserving the SO(3)-equivariance of the input structure. Predicting the Wigner-D harmonics $\Psi$ enables continuous 3D rotation modeling, without discretizing the group actions.
  • Figure 4: Inference time. We query the output vector of Wigner-D coefficients $\Psi$ against the predefined SO(3) HEALPix grid with a resolution of $Q$ points. We finally obtain the SO(3) probability distribution $P(R \mid I)$, where each position represents the probability of a specific SO(3) pose.
  • Figure 5: Experiment on ModelNet10-SO(3) with few-shot training views. Solid lines for I-PDF [murphy2021implicit], I2S [klee2023image], and RotLaplace [yin2023laplace] denote a ResNet-50 backbone, while dotted lines indicate a ResNet-101 backbone. Our method outperforms the baselines on all metrics, including with reduced training views. Baseline results [yin2023laplace, klee2023image] were obtained using the source code provided by the authors.
  • ...and 6 more figures
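
The inference step described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the grid is made of random rotations rather than the SO(3) HEALPix grid, the coefficients are truncated at degree l = 1, each degree is weighted by (2l+1) as in an inverse Fourier transform on SO(3), and a softmax is assumed for normalizing scores into $P(R \mid I)$. All function names are hypothetical.

```python
import numpy as np

# Permutation taking (x, y, z) to the real degree-1 harmonic order (y, z, x).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])


def quats_to_matrices(q: np.ndarray) -> np.ndarray:
    """Convert quaternions (N, 4) in (w, x, y, z) order to rotation matrices (N, 3, 3)."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    R = np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w),
        2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w),
        2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y),
    ], axis=1)
    return R.reshape(-1, 3, 3)


def coeffs_up_to_l1(R: np.ndarray) -> np.ndarray:
    """Flattened Wigner-D coefficients for l = 0 and l = 1; R is (N, 3, 3), output (N, 10)."""
    d1 = P @ R @ P.T  # matmul broadcasts over the leading grid dimension
    n = R.shape[0]
    return np.concatenate([np.ones((n, 1)), d1.reshape(n, 9)], axis=1)


def pose_distribution(psi: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Score every grid rotation against psi and normalize with a softmax."""
    basis = coeffs_up_to_l1(grid)                              # (Q, 10)
    weights = np.concatenate([np.ones(1), 3.0 * np.ones(9)])   # (2l+1) per degree
    logits = basis @ (weights * psi)                           # (Q,)
    logits = logits - logits.max()                             # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


rng = np.random.default_rng(0)
grid = quats_to_matrices(rng.normal(size=(2000, 4)))  # stand-in for the HEALPix grid, Q = 2000
R_gt = grid[123]                                      # pretend the network predicts this pose exactly
psi = coeffs_up_to_l1(R_gt[None])[0]                  # stand-in for the network output

probs = pose_distribution(psi, grid)
best = int(np.argmax(probs))
print(best, probs[best])
```

With only degrees 0 and 1, the score of a grid rotation reduces to a function of the trace of its relative rotation to the predicted pose, so the distribution peaks at the grid point closest to that pose. Higher degrees would sharpen the peak and allow multimodal distributions.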