Investigating Image Manifolds of 3D Objects: Learning, Shape Analysis, and Comparisons
Benjamin Beaudett, Shenyuan Liang, Anuj Srivastava
TL;DR
This work addresses how image manifolds generated by varying pose and illumination of 3D objects are structured in high-dimensional image spaces. It introduces a geometry-preserving framework using Multidimensional Scaling to embed pose and illumination manifolds into low-dimensional latent spaces, and employs Kendall's shape analysis to compare manifold shapes while factoring out rigid transformations and scaling. The key contributions include demonstrating the nonlinearity of pose manifolds, establishing a shape-distance metric for cross-object comparison, and revealing that objects of the same class tend to cluster in latent-shape space; the approach is validated through simulation and extensive experiments across $SO(2)$, $\mathbb{T}^2$, $SO(3)$, and illumination manifolds. The findings offer insights for geometry-driven learning, manifold deformation, and potential applications in clustering, transfer learning, and invertible latent-space modeling with methods like GP-StyleGAN2.
Abstract
Despite the high dimensionality of images, sets of images of 3D objects have long been hypothesized to form low-dimensional manifolds. What is the nature of such manifolds? How do they differ across objects and object classes? Answering these questions can provide key insights for explaining and advancing the success of machine learning algorithms in computer vision. This paper investigates dual tasks -- learning and analyzing shapes of image manifolds -- by revisiting the classical problem of manifold learning from a novel geometric perspective. It uses geometry-preserving transformations to map pose image manifolds, the sets of images formed by rotating 3D objects, to low-dimensional latent spaces. The pose manifolds of different objects in latent spaces are found to be nonlinear, smooth manifolds. The paper then compares the shapes of these manifolds for different objects using Kendall's shape analysis, modulo rigid motions and global scaling, and clusters objects according to these shape metrics. Interestingly, pose manifolds for objects from the same class are frequently clustered together. The geometries of image manifolds can be exploited to simplify vision and image processing tasks, to predict performance, and to provide insights into learning methods.
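The two steps named above, a geometry-preserving embedding followed by a shape comparison modulo rigid motions and global scaling, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes classical (Torgerson) Multidimensional Scaling for the embedding, a Procrustes-style Kendall distance between two point sets sampled on the same pose grid (so that points are in correspondence), and a synthetic circle in a 50-dimensional space standing in for an $SO(2)$ pose manifold. The function names `classical_mds`, `preshape`, and `kendall_shape_distance` are hypothetical.

```python
import numpy as np

def classical_mds(X, dim):
    """Classical MDS: embed rows of X into `dim` dimensions while
    preserving pairwise Euclidean distances as well as possible."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ D2 @ J                            # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                         # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]                  # keep the `dim` largest
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

def preshape(Y):
    """Remove translation and global scale (center, then unit Frobenius norm)."""
    Y = Y - Y.mean(axis=0)
    return Y / np.linalg.norm(Y)

def kendall_shape_distance(Y1, Y2, allow_reflection=False):
    """Geodesic distance between two corresponding point sets after
    optimally rotating one onto the other. Assumes row i of Y1
    corresponds to row i of Y2 (for example, the same pose angle)."""
    A, B = preshape(Y1), preshape(Y2)
    U, s, Vt = np.linalg.svd(A.T @ B)
    # Restrict to proper rotations unless reflections are explicitly allowed.
    if not allow_reflection and np.linalg.det(U @ Vt) < 0:
        s[-1] = -s[-1]
    return float(np.arccos(np.clip(s.sum(), -1.0, 1.0)))

# Synthetic stand-in for a pose manifold: a circle placed in a 50-D space.
theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 2)))    # orthonormal columns
X = circle @ Q.T                                     # 60 points in R^50
Y = classical_mds(X, dim=2)                          # latent representation
ellipse = circle * np.array([2.0, 1.0])              # a differently shaped curve

# MDS coordinates are defined only up to an orthogonal transform,
# so reflections are allowed when comparing against the original circle.
print(kendall_shape_distance(Y, circle, allow_reflection=True))
print(kendall_shape_distance(circle, ellipse))
```

In this sketch the embedded circle has the same shape as the original, while the ellipse lies at a clearly positive distance. A matrix of such pairwise distances across objects is the kind of input a clustering step could use.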
