Neural Light Spheres for Implicit Image Stitching and View Synthesis

Ilya Chugunov, Amogh Joshi, Kiran Murthy, Francois Bleibel, Felix Heide

TL;DR

This work introduces Neural Light Spheres, a compact spherical neural light-field model for implicit panoramic image stitching and view synthesis that is fit at test time to arbitrary-path panoramic captures. By decomposing the scene into a view-dependent ray offset component and a view-dependent color component, both backed by hash-grid encodings on a sphere, the method achieves real-time 1080p rendering at 50 FPS with an 80 MB model. The approach demonstrates improved reconstruction quality over traditional stitching and radiance-field baselines, and shows resilience to scene motion and low-light sensor noise through end-to-end training on RAW data collected with an Android app. The work enables interactive, wide-field panoramic experiences on mobile devices and may extend to other imaging domains with similar hardware constraints.

Abstract

Challenging to capture, and challenging to display on a cellphone screen, the panorama paradoxically remains both a staple and an underused feature of modern mobile camera applications. In this work we address both of these challenges with a spherical neural light field model for implicit panoramic image stitching and re-rendering, able to accommodate depth parallax, view-dependent lighting, and local scene motion and color changes during capture. Fit at test time to an arbitrary-path panoramic video capture -- vertical, horizontal, random-walk -- these neural light spheres jointly estimate the camera path and a high-resolution scene reconstruction to produce novel wide field-of-view projections of the environment. Our single-layer model avoids expensive volumetric sampling and decomposes the scene into compact view-dependent ray offset and color components, with a total model size of 80 MB per scene and real-time (50 FPS) rendering at 1080p resolution. We demonstrate improved reconstruction quality over traditional image stitching and radiance field methods, with significantly higher tolerance to scene motion and non-ideal capture settings.
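
As a rough illustration of why a single spherical layer can avoid volumetric sampling, the sketch below intersects one camera ray with a unit sphere in closed form, so each ray yields exactly one lookup point rather than many samples along its length. This is not the authors' code: the function name `intersect_unit_sphere`, the unit radius, and the assumption that the camera sits inside the sphere are all illustrative choices, and the learned ray offset and color models are not shown.

```python
import math

def intersect_unit_sphere(origin, direction):
    """Return the point where a ray exits the unit sphere.

    Assumes (hypothetically) that `origin` lies inside the sphere, so
    there is exactly one intersection in the forward direction.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]  # unit-length ray direction

    # Solve |o + t d|^2 = 1 for t, i.e. t^2 + 2 (o . d) t + (|o|^2 - 1) = 0.
    b = sum(o * c for o, c in zip(origin, d))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - c
    if disc < 0.0:
        raise ValueError("ray does not hit the sphere")

    t = -b + math.sqrt(disc)  # the forward (exit) root
    return [o + t * dc for o, dc in zip(origin, d)]

# A camera slightly off-center, looking along +x.
point = intersect_unit_sphere([0.1, 0.0, 0.0], [1.0, 0.0, 0.0])
```

In a model of this kind, the returned point would then be adjusted by the ray offset component before the color component is queried; both steps are omitted here.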

Paper Structure (24 sections, 16 equations, 13 figures, 1 table)

This paper contains 24 sections, 16 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Neural Light Sphere Model. Taking as input a panoramic video capture $I(u,v,n)$, we perform backward camera projection from a point $X=(u,v)$ into a spherical hull to estimate an initial intersection point $P$. The ray offset model $f_{\textsc{r}}(\hat{P},X)$ then bends this ray to a corrected point $\hat{P}^*$, which is used to sample the view-dependent color model $f_{\textsc{c}}(\hat{P}^*,X)$. Simulating a new virtual camera with our desired position and FOV, we use this neural light sphere model to re-render the scene to novel views.
  • Figure 2: Hash Grid Spheres. In this 2D example we can observe how, for points on a circle, the number of accessed elements in the backing grid roughly doubles each time the grid resolution doubles, that is, each time the total number of grid elements quadruples. Given an efficient mapping from grid location to element -- e.g., hash table lookup -- this forms a compact representation even at high resolutions, where storing a dense grid would be computationally intractable.
  • Figure 3: Two Stage Training. Breaking training into two stages allows the camera pose and static image model to first fit an approximation of the scene before view-dependent effects are introduced via $h_\textsc{r}$ and $h_\textsc{d}$. This helps avoid artifacts during early training, like the discontinuities around the sign in the Single Stage example, which result in poor final reconstruction quality.
  • Figure 4: Ray Perturbations. By applying small perturbations to ray origins $O$ we are able to avoid hard-to-escape local minima during early training epochs. In (a) we see how for the road, a region with low image texture, the No Perturbation example duplicates content, creating two copies of the #10 parking spot. In (b) we see how for repeated textures, perturbations can also help avoid "crunching" content in early training, where the repeated cans in the vending machine are accidentally aligned on top of each other.
  • Figure 5: Data Capture. We develop an open-source Android-based mobile application to facilitate in-the-wild capture of scenes. The app's settings allow the user to select a camera (main, ultrawide, or telephoto) and to either use the device's auto-focus and auto-exposure features for capture or set focus and exposure values manually. During capture, we record full-resolution Bayer RAW images, device accelerometer and gyroscope measurements, and all exposed camera and frame metadata, including ISO, exposure, timestamps, camera intrinsics, and color and tone correction values.
  • ...and 8 more figures
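
The scaling argument in the Figure 2 caption can be checked numerically. The sketch below counts how many cells of a square 2D grid a circle passes through at several resolutions: the touched-cell count grows roughly linearly with resolution, while the total cell count grows quadratically. The helper name `cells_touched_by_circle`, the circle radius of 0.9, and the sampling density are assumptions made for this illustration only and do not come from the paper.

```python
import math

def cells_touched_by_circle(resolution, radius=0.9, samples_per_cell=64):
    """Count distinct cells of a resolution x resolution grid over
    [-1, 1]^2 that contain at least one sampled point of a circle."""
    n_samples = samples_per_cell * resolution  # dense enough to hit most cells
    touched = set()
    for k in range(n_samples):
        theta = 2.0 * math.pi * k / n_samples
        x = radius * math.cos(theta)
        y = radius * math.sin(theta)
        # Map [-1, 1] to integer cell indices in [0, resolution - 1].
        i = min(int((x + 1.0) / 2.0 * resolution), resolution - 1)
        j = min(int((y + 1.0) / 2.0 * resolution), resolution - 1)
        touched.add((i, j))
    return len(touched)

for res in (16, 32, 64, 128):
    used = cells_touched_by_circle(res)
    print(f"resolution {res:4d}: {used:4d} of {res * res:6d} cells touched")
```

Only the touched cells ever need backing storage, which is why a sparse lookup such as a hash table stays compact as resolution increases. The same surface-versus-volume argument carries over to a sphere embedded in a 3D grid.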