Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection

Kohei Yamashita, Vincent Lepetit, Ko Nishino

TL;DR

A neural correspondence estimator and a RANSAC algorithm are introduced that fully leverage all three kinds of correspondences (pixel, 3D, and reflection) for robust and accurate joint camera pose and object shape estimation from object appearance alone.

Abstract

Computer vision has long relied on two kinds of correspondences: pixel correspondences in images and 3D correspondences on object surfaces. Is there another kind, and if there is, what can it do for us? In this paper, we introduce correspondences of the third kind, which we call reflection correspondences, and show that they can help estimate camera pose by just looking at objects, without relying on the background. Reflection correspondences are point correspondences in the reflected world, i.e., the scene reflected by the object surface. The object geometry and reflectance alter the scene geometrically and radiometrically, respectively, causing incorrect pixel correspondences. Geometry recovered from each image is also hampered by distortions, namely the generalized bas-relief ambiguity, leading to erroneous 3D correspondences. We show that reflection correspondences can resolve the ambiguities arising from these distortions. We introduce a neural correspondence estimator and a RANSAC algorithm that fully leverage all three kinds of correspondences for robust and accurate joint camera pose and object shape estimation from object appearance alone. The method expands the horizon of numerous downstream tasks, including camera pose estimation for appearance modeling (e.g., NeRF) and motion estimation of reflective objects (e.g., cars on the road), as it removes the requirement of an overlapping background.
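
To make the idea of a reflection correspondence concrete, the following is a minimal sketch and not the authors' estimator or their RANSAC algorithm. It assumes an orthographic camera looking along the z axis, a perfect mirror surface, and a distant environment, so that an environment direction seen as r1 in the first camera frame appears as r2 = R r1 in the second. Under those assumptions the relative rotation R can be fitted from direction pairs alone. All function names (reflect, fit_rotation, ransac_rotation) and all parameter values are hypothetical, and the sketch uses reflection correspondences only, whereas the paper combines all three kinds.

```python
import numpy as np

def reflect(normals, view=(0.0, 0.0, 1.0)):
    """Mirror direction r = 2 (n . v) n - v for normals n and view direction v."""
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)
    v = np.asarray(view, dtype=float)
    cos = np.sum(n * v, axis=-1, keepdims=True)
    return 2.0 * cos * n - v

def fit_rotation(src, dst):
    """Least-squares rotation R with dst_i ~ R @ src_i (Kabsch / Wahba solution)."""
    U, _, Vt = np.linalg.svd(dst.T @ src)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # force det(R) = +1
    return U @ D @ Vt

def ransac_rotation(src, dst, iters=200, thresh_deg=2.0, seed=0):
    """Generic RANSAC over 2-point rotation hypotheses (illustrative only)."""
    rng = np.random.default_rng(seed)
    cos_thresh = np.cos(np.deg2rad(thresh_deg))
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=2, replace=False)
        R = fit_rotation(src[idx], dst[idx])
        # A pair is an inlier if the rotated source is within thresh_deg of its target.
        inliers = np.sum((src @ R.T) * dst, axis=1) > cos_thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_rotation(src[best], dst[best]), best

# Synthetic check: camera-facing normals, a known 30 degree rotation about y.
rng = np.random.default_rng(1)
normals = rng.normal(size=(50, 3))
normals[:, 2] = np.abs(normals[:, 2]) + 0.2
r1 = reflect(normals)
c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
r2 = r1 @ R_true.T
r2[:10] *= -1.0  # corrupt the first 10 pairs to act as wrong matches
R_est, inliers = ransac_rotation(r1, r2)
```

In this toy setting two non-parallel direction pairs already determine the rotation, which is why the hypothesis size is two. Real reflectance maps are blurred by non-mirror reflectance and the matches are noisy, so this should be read only as an illustration of the geometric constraint.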

Paper Structure (30 sections, 63 equations, 23 figures, 1 table, 1 algorithm)

Figures (23)

  • Figure 1: We humans can tell how the camera moved between the images, but computers have a hard time. Can we estimate camera pose and possibly object shape just from object appearance, despite the featureless appearance and non-overlapping background? (Photos by Richard Ellis/Alamy)
  • Figure 2: We show how to leverage three types of correspondences. Pixel correspondences (a) are the pixels that correspond to the same surface point. We can also leverage similar correspondences in the normal maps, which we refer to as 3D correspondences (b). In addition, we leverage novel correspondences that capture the surrounding environment, which we observe through surface reflection. We recover camera-view reflectance maps, i.e., maps that associate surface normal orientations with the surrounding environment, and detect correspondences of this type in them (c). We refer to this novel kind of correspondences as reflection correspondences.
  • Figure 3: As depicted in (a) and (b), due to the generalized bas-relief ambiguity in single-view surface normal recovery [belhumeur99basrelief] and the fundamental difficulty in structure-from-motion [harris91orthographicsfm], we cannot obtain a unique solution for the relative rotation from pixel and 3D correspondences when the cameras are orthographic (e.g., distant from the object). Reflection correspondences, i.e., correspondences regarding the incident directions for specular reflections, enable us to distinguish the correct relative rotation (c) from the other possible solutions like (d).
  • Figure 4: Given two-view images of a textureless, non-Lambertian object, we first recover the surface normals and a reflectance map for each view using a single-view geometry reconstruction method [yamashita2023deepsharm]. We then establish 3D and reflection correspondences with a novel deep feature extraction network and compute the relative camera pose from them.
  • Figure 5: Evaluation on synthetic shapes with different levels of flatness. (a) We create such shapes by applying GBR transformations with different $\lambda$. (b) Accuracy for each $\lambda$. The results show the effectiveness of using reflection correspondences. Please see the text for details.
  • ...and 18 more figures
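
The captions for Figures 3 and 5 rely on the generalized bas-relief (GBR) transformation. As a reminder of its standard definition, here is a small sketch, not code from the paper. A GBR transformation maps a height field z(x, y) to λz + μx + νy, and surface normals transform as n' ∝ G^{-T} n with G = [[1, 0, 0], [0, 1, 0], [μ, ν, λ]]. The symbols μ and ν and the function names are assumptions for illustration; the captions mention only λ, and whether the authors set μ = ν = 0 is not stated here.

```python
import numpy as np

def gbr_matrix(lam, mu=0.0, nu=0.0):
    """GBR matrix G acting on surface points (x, y, z) -> (x, y, mu*x + nu*y + lam*z)."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [mu, nu, lam]])

def gbr_normals(normals, lam, mu=0.0, nu=0.0):
    """Transform normals by G^{-T} and renormalise to unit length."""
    n = np.asarray(normals, dtype=float)
    # Row-vector form: (G^{-T} n)^T = n^T G^{-1}.
    out = n @ np.linalg.inv(gbr_matrix(lam, mu, nu))
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

# The plane z = 0.5 x has normal proportional to (-0.5, 0, 1).
n_plane = np.array([-0.5, 0.0, 1.0])
# Flattening with lam = 0.5 gives z = 0.25 x, whose normal is proportional to (-0.25, 0, 1).
n_flat = gbr_normals(n_plane, lam=0.5)
# With lam = 2 and mu = 0.3 the plane becomes z = 1.3 x, normal proportional to (-1.3, 0, 1).
n_tilt = gbr_normals(n_plane, lam=2.0, mu=0.3)
```

Values of λ below 1 flatten the shape and pull its normals toward the viewing axis, which matches the "levels of flatness" wording in the Figure 5 caption.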