Learning with 3D rotations, a hitchhiker's guide to SO(3)

A. René Geist, Jonas Frey, Mikel Zhobro, Anna Levina, Georg Martius

TL;DR

This survey analyzes rotation representations for $SO(3)$ in gradient-based neural regression, arguing that low-dimensional mappings (≤4D) introduce discontinuities that harm learnability, especially when rotations are in the output. It compares several representations, showing that $\mathbb{R}^9$+SVD and $\mathbb{R}^6$+GSO consistently yield better optimization properties and generalization than Euler, axis-angle, or quaternion-only approaches; distance-picking and half-space tricks do not fully resolve fundamental discontinuities. Empirical experiments across rotation estimation and feature-prediction tasks corroborate the theoretical guidance, favoring high-dimensional representations and, in small-angle dynamics, half-space-mapped quaternions as a practical compromise. The work provides concrete recommendations for choosing rotation representations depending on whether rotations are inputs or outputs and on the expected rotation magnitude, with implications for pose estimation, 3D vision, and robotics applications, while highlighting trade-offs in computation and training stability.
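To make the $\mathbb{R}^6$+GSO option concrete, the following is a minimal sketch, reading GSO as Gram-Schmidt orthonormalization. The function name `r6_to_rotation` and the layout of the 6D vector (first three entries seed the first column, last three seed the second) are assumptions made for illustration and are not taken from the paper.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    if n < 1e-12:
        raise ValueError("cannot normalize a near-zero vector")
    return [x / n for x in v]


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def r6_to_rotation(r6):
    """Map a 6D vector to a 3x3 rotation matrix (list of rows).

    Assumed layout (hypothetical): r6[:3] seeds the first column,
    r6[3:] seeds the second column.
    """
    a1, a2 = list(r6[:3]), list(r6[3:])
    b1 = _normalize(a1)
    # Remove the component of a2 along b1, then normalize the remainder.
    proj = _dot(b1, a2)
    b2 = _normalize([y - proj * x for x, y in zip(b1, a2)])
    # The third column is fixed by right-handedness.
    b3 = _cross(b1, b2)
    # b1, b2, b3 are the columns of R; return R as a list of rows.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]
```

Any 6D input whose two halves are linearly independent yields a valid rotation matrix, so a network can emit unconstrained outputs and still land on $SO(3)$. The sketch raises `ValueError` in the degenerate case where the two halves are parallel or zero.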

Abstract

Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.

Paper Structure (57 sections, 18 equations, 24 figures, 2 tables)


Figures (24)

  • Figure 1: Overview of learning with rotations. A neural network learns a function from a feature space $\mathcal{A}$ to a rotation representation space $\mathcal{R}$ or vice versa. When learning with rotations, the properties of $f\,:\,\mathcal{R} \rightarrow \mathrm{SO}(3)$ and $g\,:\,\mathrm{SO}(3)\rightarrow \mathcal{R}$ affect training.
  • Figure 2: Left: Representations for SO(2): angle and sin/cos with respective $g$ functions. Right: Euler angles representation for SO(3). In Euler angles a frame can be visualized as three consecutive SO(2) rotations along the surface of a torus.
  • Figure 3: Exponential coordinates (Exp) and axis-angle representation and their double cover property. Top: Exp. coord.: rotation around $\omega$ by angle $\|\omega\|$. The vector $\omega_1= \alpha_1 \tilde{\omega}_1 \in \mathbb{R}^3$ describes the same rotation as $\omega_2=(\|\omega_1\|-2\pi)\omega_1/\|\omega_1\|$. Bottom: Axis-angles explicitly represent axis and angle: $\tilde{\omega} \in \mathcal{S}^2$, $\alpha \in \mathbb{R}$. The vector $[\tilde{\omega}, \alpha]$ describes the same rotation as $[-\tilde{\omega}, -\alpha]$.
  • Figure 4: Geometric illustration of distance metrics $d(a,b)$ between vectors $a$ and $b$. Cosine distance ($d_\textrm{cd}$) and angular distance ($d_\textrm{ang}$) ignore the vectors' lengths.
  • Figure 5: The target function $h^*(x)$ is the composition of $\tilde{h}(x)$ with the functions $g(R)$ / $f(r)$.
  • ...and 19 more figures
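
The equivalences stated in the Figure 3 caption can be checked numerically. The sketch below uses Rodrigues' rotation formula to map exponential coordinates to a rotation matrix; the helper names `exp_to_rotation` and `axis_angle_to_rotation` are hypothetical and chosen only for this illustration.

```python
import math


def exp_to_rotation(omega):
    """Rotation matrix (list of rows) for exponential coordinates omega.

    The rotation axis is omega / ||omega|| and the angle is ||omega||.
    """
    theta = math.sqrt(sum(w * w for w in omega))
    if theta < 1e-12:
        # Zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (w / theta for w in omega)
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    # Rodrigues' formula: R = cos(t) I + sin(t) [u]_x + (1 - cos(t)) u u^T
    return [
        [c + x * x * k, x * y * k - z * s, x * z * k + y * s],
        [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
        [z * x * k - y * s, z * y * k + x * s, c + z * z * k],
    ]


def axis_angle_to_rotation(axis, alpha):
    """Rotation for a unit axis and a signed angle alpha."""
    return exp_to_rotation([alpha * a for a in axis])


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


# Two exponential-coordinate vectors from the caption's relation:
# omega_2 = (||omega_1|| - 2*pi) * omega_1 / ||omega_1||
omega_1 = [0.3, -0.4, 1.2]
norm_1 = math.sqrt(sum(w * w for w in omega_1))
omega_2 = [(norm_1 - 2.0 * math.pi) * w / norm_1 for w in omega_1]
diff_exp = max_abs_diff(exp_to_rotation(omega_1), exp_to_rotation(omega_2))

# Axis-angle pairs [axis, alpha] and [-axis, -alpha]
axis = [w / norm_1 for w in omega_1]
neg_axis = [-a for a in axis]
diff_aa = max_abs_diff(
    axis_angle_to_rotation(axis, norm_1),
    axis_angle_to_rotation(neg_axis, -norm_1),
)
```

Both `diff_exp` and `diff_aa` come out at floating-point noise level, which illustrates why a regression target in these coordinates is not unique: distinct vectors in the representation space correspond to the same element of $\mathrm{SO}(3)$.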