
On the Continuity of Rotation Representations in Neural Networks

Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, Hao Li

TL;DR

The paper addresses the problem that common 3D rotation representations (e.g., Euler angles, quaternions) are discontinuous, which makes them hard for neural networks to regress. It defines a formal notion of a continuous rotation representation for neural networks and proves that no continuous representation of $SO(3)$ exists in real Euclidean spaces of dimension $\le 4$. It then constructs continuous representations in 5 and 6 dimensions, along with a general construction for $SO(n)$. The main contributions are (i) a precise definition of continuity for representations, (ii) two continuous representations of $SO(n)$ with $n^2-n$ and $n^2-2n+2$ dimensions (6D and 5D, respectively, for $SO(3)$), (iii) extensions to related groups (the orthogonal group $O(n)$ and the similarity transforms $Sim(n)$), and (iv) empirical validation showing that the 5D and 6D representations outperform traditional discontinuous ones on an autoencoder sanity test, rotation estimation for 3D point clouds, and inverse kinematics for 3D human poses. The results demonstrate improved learning efficiency and accuracy, with practical implications for graphics and vision systems that must learn to regress rotations robustly.
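As a concrete illustration of the 6D representation of $SO(3)$, the sketch below maps a raw 6D vector (such as a network output) to a rotation matrix by Gram-Schmidt orthogonalization of two 3D vectors, completes the third column with a cross product, and maps a rotation back to 6D by keeping its first two columns (6 = $n^2-n$ for $n = 3$). It is a minimal plain-Python sketch under stated assumptions: the function names (`rotation_from_6d`, `rotation_to_6d`) and the ordering of the six entries (first column, then second column) are hypothetical choices for illustration, not the authors' released code.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(rep):
    """Map a 6D vector (two stacked 3-vectors, hypothetical layout) to a 3x3
    rotation matrix returned as a list of rows.

    Gram-Schmidt on the two vectors gives orthonormal b1, b2; b3 = b1 x b2
    makes the matrix right-handed (determinant +1). Assumes the two input
    vectors are nonzero and not parallel; this sketch does not check that.
    """
    a1, a2 = rep[:3], rep[3:]
    b1 = normalize(a1)
    d = dot(b1, a2)
    b2 = normalize([x - d * y for x, y in zip(a2, b1)])  # remove b1 component
    b3 = cross(b1, b2)
    # b1, b2, b3 are the columns; convert to a row-major list of lists.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]

def rotation_to_6d(m):
    """Representation map: keep the first two columns of the rotation."""
    return [m[i][0] for i in range(3)] + [m[i][1] for i in range(3)]
```

For example, `rotation_from_6d([1.0, 0.2, 0.0, 0.1, 1.0, 0.3])` returns an orthonormal matrix with determinant 1, and `rotation_from_6d(rotation_to_6d(R))` recovers `R` up to floating-point error. Because dropping a column and re-orthogonalizing are both continuous on valid rotations, nearby rotations always get nearby 6D codes, which is the property the paper's definition asks for.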

Abstract

In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate which representations of 2D, 3D, and n-dimensional rotations are continuous and which are discontinuous. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.
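The 2D case shows the discontinuity in miniature. The sketch below, assuming rotations stored as 2×2 lists of rows and angles taken in $[0, 2\pi)$, compares the angle representation of $SO(2)$ with the first-column representation $(\cos\theta, \sin\theta)$: two rotations that are nearly identical as matrices receive angles almost $2\pi$ apart, while their first columns stay close. All function names are hypothetical and chosen for illustration only.

```python
import math

def rotation_2d(theta):
    """2x2 rotation matrix (list of rows) for an angle theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def angle_representation(m):
    """Map a 2D rotation matrix to an angle in [0, 2*pi): discontinuous at 0."""
    return math.atan2(m[1][0], m[0][0]) % (2 * math.pi)

def column_representation(m):
    """Map a 2D rotation matrix to its first column (cos, sin): continuous."""
    return [m[0][0], m[1][0]]

def matrix_distance(a, b):
    """Frobenius distance between two 2x2 matrices."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(2) for j in range(2)))

# Two rotations on either side of theta = 0.
ra, rb = rotation_2d(0.001), rotation_2d(-0.001)
print(matrix_distance(ra, rb))                                  # tiny
print(abs(angle_representation(ra) - angle_representation(rb)))  # close to 2*pi
print(math.dist(column_representation(ra), column_representation(rb)))  # tiny
```

A network regressing the angle would have to reproduce this jump exactly, whereas the first column varies smoothly with the rotation; this is the intuition the paper's continuity definition formalizes.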

Paper Structure

This paper contains 21 sections, 20 equations, and 8 figures.

Figures (8)

  • Figure 1: A simple 2D example, which motivates our definition of continuity of representation. See the section defining continuity of a representation for details.
  • Figure 2: Our definition of continuous representation, as well as how it can apply in a neural network. See the body for details.
  • Figure 3: An illustration of stereographic projection in 2D. We are given as input a point $p$ on the unit sphere $S^1$. We construct a ray from a fixed projection point $N_0 = (0, 1)$ through $p$ and find the intersection of this ray with the line $y = 0$. The resulting point $p'$ is the stereographic projection of $p$.
  • Figure 4: An illustration of how $n-2$ normalized projections can be applied to reduce the dimensionality of the Case 3 representation of $SO(n)$ by $n-2$. Each row shows the dimension $n$ and the elements of the vectorized representation $\gamma(M)$, which contains the first $n-1$ columns of $M \in SO(n)$. Each column has length $n$; columns are grouped by the thick black rectangles. Each color marks one group of inputs to a single normalized projection, as defined in the paper's normalized-projection equation. The white regions are not projected.
  • Figure 5: Empirical results. In panels (b), (e), and (h), the x-axis shows a percentile $p$ and the y-axis shows the error at that percentile.
  • ...and 3 more figures
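The stereographic projection of Figure 3, extended to vectors that are not unit length by normalizing them first, can be sketched as follows. Following Figure 3, the pole is placed at the last coordinate (for $m = 2$ this is $N_0 = (0, 1)$) and points are projected onto the hyperplane where that coordinate is zero. This is an assumption-laden sketch: the paper's own normalized-projection equation may order coordinates differently, which entries of $\gamma(M)$ are grouped together (the colors in Figure 4) is not reproduced here, the function names are hypothetical, and `inverse_projection` is the standard inverse stereographic map rather than a formula taken from the paper.

```python
import math

def normalized_projection(u):
    """Normalize u in R^m, then stereographically project it from the pole
    (0, ..., 0, 1) onto the hyperplane where the last coordinate is 0.

    Returns a point of R^(m-1). Undefined when u points exactly at the pole.
    """
    norm = math.sqrt(sum(x * x for x in u))
    v = [x / norm for x in u]          # point on the unit sphere S^(m-1)
    denom = 1.0 - v[-1]                # zero only at the pole
    return [x / denom for x in v[:-1]]

def inverse_projection(p):
    """Standard inverse stereographic map: R^(m-1) -> unit sphere S^(m-1)."""
    sq = sum(x * x for x in p)
    return [2.0 * x / (sq + 1.0) for x in p] + [(sq - 1.0) / (sq + 1.0)]

# 2D check against Figure 3: p = (x, y) on S^1 lands at x / (1 - y) on y = 0.
print(normalized_projection([1.0, 0.0]))    # [1.0]
print(normalized_projection([0.0, -1.0]))   # [0.0]
```

For any nonzero $u$ that does not point at the pole, `inverse_projection(normalized_projection(u))` returns $u / \lVert u \rVert$, and scaling $u$ does not change the projected point. Each such projection therefore drops exactly one dimension while keeping the direction recoverable, which is how $n-2$ projections bring the $n^2-n$ entries of $\gamma(M)$ down to $n^2-2n+2$.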