
Loss it right: Euclidean and Riemannian Metrics in Learning-based Visual Odometry

Olaya Álvarez-Tuñón, Yury Brodskiy, Erdal Kayacan

TL;DR

The paper investigates how pose representations and metric choices influence learning-based visual odometry (VO). Using the DeepVO backbone, it compares Euler-angle, quaternion, and SE(3) representations with corresponding losses, including a chordal distance-based loss for SE(3). Experiments on the KITTI dataset show that SE(3) with a chordal loss provides the fastest convergence and best generalization, while Euler-based losses are less effective and quaternion-based losses converge more slowly. The findings demonstrate that geometry-consistent, metric-compliant losses better capture the manifold structure of camera motion, improving VO accuracy and robustness. These insights guide the design of VO systems by aligning loss functions with the underlying geometric space.
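To make the three representations concrete, here is a minimal sketch that decodes a raw network output into a rotation matrix and a translation. The output sizes (six, six, and seven values) follow the Figure 3 caption. Everything else is an assumption, not taken from the paper or its repository: the orderings [t, roll, pitch, yaw] for Euler, [t, qw, qx, qy, qz] for the quaternion, and [ρ, ω] (translation part first) for se(3), and the ZYX Euler convention. The function names (`decode`, `se3_exp`, and so on) are hypothetical.

```python
import math


def hat(w):
    """Skew-symmetric matrix [w]_x such that [w]_x v = w x v."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_to_matrix(roll, pitch, yaw):
    """R = Rz(yaw) Ry(pitch) Rx(roll) (ZYX convention, assumed)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def quat_to_matrix(w, x, y, z):
    """Rotation matrix from a quaternion (w, x, y, z); normalized first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def se3_exp(xi):
    """Exponential map se(3) -> SE(3) for xi = (rho, omega), translation part first (assumed)."""
    rho, omega = xi[:3], xi[3:]
    theta = math.sqrt(sum(v * v for v in omega))
    W = hat(omega)
    W2 = matmul3(W, W)
    if theta < 1e-8:
        # Limits of the coefficients as theta -> 0
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Rodrigues formula for R, and the left Jacobian V that maps rho to the translation
    R = [[eye[i][j] + a * W[i][j] + b * W2[i][j] for j in range(3)] for i in range(3)]
    V = [[eye[i][j] + b * W[i][j] + c * W2[i][j] for j in range(3)] for i in range(3)]
    t = [sum(V[i][k] * rho[k] for k in range(3)) for i in range(3)]
    return R, t


def decode(output, representation):
    """Map a raw network output to (R, t) for one of the three representations."""
    if representation == "euler":       # 6 values: [tx, ty, tz, roll, pitch, yaw]
        return euler_to_matrix(*output[3:6]), list(output[:3])
    if representation == "se3":         # 6 values: [rho (3), omega (3)]
        return se3_exp(output)
    if representation == "quaternion":  # 7 values: [tx, ty, tz, qw, qx, qy, qz]
        return quat_to_matrix(*output[3:7]), list(output[:3])
    raise ValueError(f"unknown representation: {representation}")
```

Under these conventions, `decode([0, 0, 0, 0, 0, math.pi / 2], "euler")` and `decode([0, 0, 0, 0, 0, math.pi / 2], "se3")` both give a 90° rotation about z with zero translation. That agreement holds only for pure rotations: for se(3) the translation is V·ρ rather than ρ, so the same six numbers mean different poses once rotation and translation mix.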

Abstract

This paper overviews different pose representations and metric functions in visual odometry (VO) networks. The performance of VO networks heavily relies on how their architecture encodes the information. The choice of pose representation and loss function significantly impacts network convergence and generalization. We investigate these factors in the VO network DeepVO by implementing loss functions based on Euler, quaternion, and chordal distance and analyzing their influence on performance. The results of this study provide insights into how loss functions affect the design of efficient and accurate VO networks for camera motion estimation. The experiments illustrate that a distance that complies with the mathematical requirements of a metric, such as the chordal distance, provides better generalization and faster convergence. The code for the experiments can be found at https://github.com/remaro-network/Loss_VO_right
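The metric argument can be illustrated with a small sketch. It assumes the SE(3) chordal loss is the squared Frobenius norm of the difference between the 4×4 homogeneous pose matrices, with no weighting between rotation and translation; the exact weighting used in the paper is not stated in this summary. It contrasts that loss with plain squared differences of Euler angles and of quaternion components. Those are metrics on R^3 and R^4, but not on rotations: a 2° yaw error across the ±180° boundary is scored like a 358° error, and q and −q encode the same rotation yet are penalized. All function names are hypothetical.

```python
import math


def rot_z(angle):
    """Rotation matrix about the z axis (a yaw-only rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def to_homogeneous(R, t):
    """Stack a 3x3 rotation and a translation into a 4x4 SE(3) matrix."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def chordal_se3_loss(T_pred, T_true):
    """Squared chordal distance ||T_pred - T_true||_F^2 (unweighted; assumed form)."""
    return sum((a - b) ** 2 for row_p, row_t in zip(T_pred, T_true) for a, b in zip(row_p, row_t))


def euler_l2_loss(angles_pred, angles_true):
    """Squared difference of Euler-angle vectors: a metric on R^3, not on rotations."""
    return sum((a - b) ** 2 for a, b in zip(angles_pred, angles_true))


def quaternion_l2_loss(q_pred, q_true):
    """Squared difference of quaternion components: ignores that q and -q are the same rotation."""
    return sum((a - b) ** 2 for a, b in zip(q_pred, q_true))


# A 2-degree yaw error that straddles the +/-180 degree wrap-around.
yaw_a, yaw_b = math.radians(179.0), math.radians(-179.0)
T_a = to_homogeneous(rot_z(yaw_a), [0.0, 0.0, 0.0])
T_b = to_homogeneous(rot_z(yaw_b), [0.0, 0.0, 0.0])
print("chordal:", chordal_se3_loss(T_a, T_b))
print("euler:  ", euler_l2_loss([0.0, 0.0, yaw_a], [0.0, 0.0, yaw_b]))

# q and -q encode the same rotation, yet their component-wise loss is nonzero.
q = [0.5, 0.5, 0.5, 0.5]
print("quaternion (q vs -q):", quaternion_l2_loss(q, [-v for v in q]))
```

Here the chordal loss is roughly 0.0024 (that is, 8 sin²(1°)), while the Euler loss is roughly 39, and the quaternion loss for q versus −q is 4 even though the rotation error is zero.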

Paper Structure

This paper contains 14 sections, 15 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Hierarchy of geometries. Euclidean geometry preserves areas, angles, and lengths; similarity preserves ratios of lengths; affine geometry preserves ratios of volumes and parallelism; projective geometry preserves intersections and tangency.
  • Figure 2: Relationship between the chordal distance and the geodesic distance on the sphere representing SO(3), for two rotations $R_A$ and $R_B$. The chordal distance is the straight-line distance between the two points. The geodesic distance is the length of the shortest curve along the sphere's surface connecting the two points on the manifold. The geodesic distance is always greater than or equal to the chordal distance.
  • Figure 3: Architecture of DeepVO [wang2017deepvo] and its output shape in each experiment. In the original setup, the output tensor holds six values: the translation vector and the Euler angles. The SE(3) experiment uses the same output shape, interpreted here as a vector of the Lie algebra se(3). Finally, concatenating the translation vector with the quaternion yields an output of seven values.
  • Figure 4: Top: pose loss during training and validation. Bottom: translation and rotation losses (unweighted). Note that the rotation losses do not share the same geometric interpretation.
  • Figure 5: From left to right, trajectories used for training, validation, and testing. Each plot shows the ground-truth versus the estimated poses from DeepVO using the proposed pose representations and loss functions.
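Two caption claims can be checked numerically: the Figure 2 ordering of chordal and geodesic distances, and the Figure 4 remark that the rotation losses do not share a geometric interpretation. For a relative rotation of angle θ, ‖R_A − R_B‖_F = 2√2 sin(θ/2). The sketch takes the geodesic length to be √2·θ, the arc length under the metric induced by the Frobenius inner product. This scaling is an assumption: with the bare angle θ as the geodesic distance, the inequality holds only after dividing the chordal distance by √2. The helper names are hypothetical.

```python
import math


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def chordal_distance(Ra, Rb):
    """Frobenius norm ||Ra - Rb||_F: the straight-line (chordal) distance."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(Ra, Rb) for a, b in zip(ra, rb)))


def rotation_angle(Ra, Rb):
    """Angle theta of Ra^T Rb, using tr(Ra^T Rb) = 1 + 2 cos(theta)."""
    # tr(Ra^T Rb) equals the sum of element-wise products of Ra and Rb
    trace = sum(a * b for ra, rb in zip(Ra, Rb) for a, b in zip(ra, rb))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def geodesic_distance(Ra, Rb):
    """Arc length sqrt(2) * theta under the Frobenius-induced metric (assumed scaling)."""
    return math.sqrt(2.0) * rotation_angle(Ra, Rb)


def rotation_losses(theta):
    """Unweighted rotation losses for the same z-rotation error theta (radians)."""
    Ra, Rb = rot_z(0.0), rot_z(theta)
    q_a = (1.0, 0.0, 0.0, 0.0)
    q_b = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))
    return {
        "euler": theta ** 2,  # squared difference of the yaw angles
        "quaternion": sum((a - b) ** 2 for a, b in zip(q_a, q_b)),
        "chordal": chordal_distance(Ra, Rb) ** 2,
        "geodesic": geodesic_distance(Ra, Rb) ** 2,
    }


for deg in (1, 10, 90, 180):
    th = math.radians(deg)
    Ra, Rb = rot_z(0.0), rot_z(th)
    print(deg, "chordal <= geodesic:", chordal_distance(Ra, Rb) <= geodesic_distance(Ra, Rb),
          {k: round(v, 4) for k, v in rotation_losses(th).items()})
```

For small θ the four losses behave like θ², θ²/4, 2θ², and 2θ² respectively, so the same rotation error is scaled differently by each loss. At θ = 180° they diverge further: π², 2, 8, and 2π².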