Seeing the World through Your Eyes

Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang

TL;DR

The paper reconstructs the 3D scene in front of a person from the reflections in their eyes, using a fixed camera and the person's natural head motion. It adapts Neural Radiance Fields (NeRF) by jointly learning a world radiance field and an iris texture field, and it adds a radial prior on the iris texture and cornea pose refinement to separate scene reflections from iris detail. Key components include modeling the cornea as an ellipsoid, computing reflected ray directions as $d' = d - 2 (n \cdot d) n$, and optimizing the texture field alongside the radiance field with a reconstruction loss $L_{recon}$ and a radial loss $L_{radial} = \lambda_{radial} \| \Phi(p) - \Phi(\tilde{R}p) \|_2^2$ under an SE(3) pose $T$. Experiments on synthetic and real portrait data show promising non-line-of-sight reconstructions, while highlighting limitations in unconstrained settings and with iris-color variability that guide future improvements in ocular-based scene capture.
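
The reflection step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an axis-aligned ellipsoid given by a hypothetical `center` and `radii` (placeholder values, not the cornea measurements used in the paper), and the helper names `intersect_ellipsoid`, `ellipsoid_normal`, `reflect`, and `reflected_ray` are invented for illustration. Only the formula $d' = d - 2 (n \cdot d) n$ comes from the summary.

```python
import math


def intersect_ellipsoid(o, d, center, radii):
    """Return the nearest positive t with o + t*d on the ellipsoid, or None."""
    # Rescale each axis so the ellipsoid becomes a unit sphere at the origin.
    oc = [(o[i] - center[i]) / radii[i] for i in range(3)]
    ds = [d[i] / radii[i] for i in range(3)]
    a = sum(x * x for x in ds)
    b = 2.0 * sum(oc[i] * ds[i] for i in range(3))
    c = sum(x * x for x in oc) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the ellipsoid
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


def ellipsoid_normal(p, center, radii):
    """Unit outward normal at surface point p (normalized gradient of the implicit form)."""
    g = [(p[i] - center[i]) / radii[i] ** 2 for i in range(3)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def reflect(d, n):
    """Mirror direction d about unit normal n: d' = d - 2 (n . d) n."""
    dot = sum(d[i] * n[i] for i in range(3))
    return [d[i] - 2.0 * dot * n[i] for i in range(3)]


def reflected_ray(o, d, center, radii):
    """Return (origin O', direction d') of the bounced ray, or None on a miss."""
    t = intersect_ellipsoid(o, d, center, radii)
    if t is None:
        return None
    hit = [o[i] + t * d[i] for i in range(3)]
    return hit, reflect(d, ellipsoid_normal(hit, center, radii))
```

A camera ray that hits the surface head-on bounces straight back, while a ray that misses yields `None`. In a NeRF-style pipeline the returned origin and direction would replace the camera ray before volumetric rendering.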

Abstract

The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like. By imaging the eyes of a moving person, we can collect multiple views of a scene outside the camera's direct line of sight through the reflections in the eyes. In this paper, we reconstruct a 3D scene beyond the camera's line of sight using portrait images containing eye reflections. This task is challenging due to 1) the difficulty of accurately estimating eye poses and 2) the entangled appearance of the eye iris and the scene reflections. Our method jointly refines the cornea poses, the radiance field depicting the scene, and the observer's eye iris texture. We further propose a simple regularization prior on the iris texture pattern to improve reconstruction quality. Through various experiments on synthetic and real-world captures featuring people with varied eye colors, we demonstrate the feasibility of our approach to recover 3D scenes using eye reflections.

Paper Structure (11 sections, 6 equations, 11 figures, 1 table)


Figures (11)

  • Figure 1: Radiance field reconstruction using eye reflections. The human eye is highly reflective. We show that from a sequence of frames that capture a moving head, we can reconstruct and render the 3D scene of what the person is observing using only the reflections off their eyes.
  • Figure 2: NeRF for non-line-of-sight scene. The typical NeRF capture setup requires multiple posed images (e.g., captured from a moving camera) for reconstruction. In our setup, we gather multi-view information of the scene through light reflected from the eyes of a moving person.
  • Figure 3: Cornea geometry. The cornea can be modeled as an ellipsoid. The key fact that we exploit is that the cornea shape and size are largely consistent among adults, with similar eccentricity and curvature.
  • Figure 4: Joint optimization of radiance field and iris texture. Standard NeRF rendering uses rays starting from the camera origin $O$ along a viewing direction $d$. In contrast, in our setup, we need to use rays that bounce off the cornea. The reflected ray origin $O'$ is where the initial camera ray intersects with the cornea, and the new ray direction $d'$ is the reflection of $d$ across the cornea's normal $\overrightarrow{n}$. Consequently, the eye image we observe is a composition of the iris texture and the reflected scene. The composition hinders standard NeRF training due to the highly detailed iris texture. To address this issue, alongside the radiance field $\theta$, we train an eye texture field $\Phi$ whose input is the projection of $O'$ on the eye coordinate system in the given image (Eq. \ref{eq:proj}). The eye texture field is computed relative to the eye in the current image, while the radiance field takes 3D points in the world coordinates. The outputs from volumetric rendering with $\theta$ and texture estimation with $\Phi$ are composited together to reconstruct the cornea image. We apply a reconstruction loss $L_{recon}$. We further regularize the texture field $\Phi$ with a radial loss $L_{radial}$ that encourages the estimated texture to be radially constant, reducing the absorption of scene regions into the eye texture.
  • Figure 5: Qualitative synthetic results. We show that our method can achieve reasonable reconstructions from challenging measurements in simulation. We demonstrate that our method can reconstruct the 3D geometry of the scene by visualizing the accumulation of the learned radiance fields with respect to the camera poses. The accumulation is defined as the integral of the density along the camera rays.
  • ...and 6 more figures
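
The radial regularizer mentioned in the Figure 4 caption and in the TL;DR can be sketched as follows. This is only an illustration under stated assumptions, not the paper's code. The summary does not define $\tilde{R}$, so the sketch assumes it is a 2D rotation about the iris center by a randomly drawn angle. The texture field is stood in for by any Python callable `phi` that maps a 2D point to an RGB tuple. The names `rotate2d` and `radial_loss`, the default weight `lam=0.1`, and the averaging over sample points are all hypothetical choices.

```python
import math
import random


def rotate2d(p, angle):
    """Rotate the 2D point p about the origin (assumed to be the iris center)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def radial_loss(phi, points, lam=0.1, rng=None):
    """Mean over points of lam * ||phi(p) - phi(R p)||_2^2 for random rotations R."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    total = 0.0
    for p in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        color = phi(p)
        rotated_color = phi(rotate2d(p, angle))
        total += sum((x - y) ** 2 for x, y in zip(color, rotated_color))
    return lam * total / len(points)


def symmetric_texture(p):
    """Toy texture that depends only on the distance from the center."""
    r = math.hypot(p[0], p[1])
    return (r, r, r)


def streaked_texture(p):
    """Toy texture that varies with position, as an absorbed scene reflection might."""
    return (p[0], 0.0, 0.0)
```

A texture that depends only on the distance from the iris center incurs essentially zero penalty, while one that changes under rotation is penalized. This matches the stated goal of keeping scene content out of the texture field.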