Neural Light Spheres for Implicit Image Stitching and View Synthesis
Ilya Chugunov, Amogh Joshi, Kiran Murthy, Francois Bleibel, Felix Heide
TL;DR
This work introduces Neural Light Spheres, a compact spherical neural light field model for implicit panoramic image stitching and view synthesis, fit at test time to arbitrary-path panoramic captures. By decomposing the scene into view-dependent ray offset and color components and using hash-grid encodings on a sphere, the method renders 1080p in real time at 50 FPS with an 80 MB model per scene. It demonstrates improved reconstruction quality over traditional image stitching and radiance field baselines, and shows resilience to scene motion and low-light sensor noise through end-to-end training on RAW data collected with an Android app. The work enables interactive, wide field-of-view panoramic experiences on mobile devices and opens avenues for other imaging domains with similar hardware constraints.
Abstract
Challenging to capture, and challenging to display on a cellphone screen, the panorama paradoxically remains both a staple and an underused feature of modern mobile camera applications. In this work we address both of these challenges with a spherical neural light field model for implicit panoramic image stitching and re-rendering, able to accommodate depth parallax, view-dependent lighting, and local scene motion and color changes during capture. Fit at test time to an arbitrary-path panoramic video capture (vertical, horizontal, or random-walk), these neural light spheres jointly estimate the camera path and a high-resolution scene reconstruction to produce novel wide field-of-view projections of the environment. Our single-layer model avoids expensive volumetric sampling and decomposes the scene into compact view-dependent ray offset and color components, with a total model size of 80 MB per scene and real-time (50 FPS) rendering at 1080p resolution. We demonstrate improved reconstruction quality over traditional image stitching and radiance field methods, with significantly higher tolerance to scene motion and non-ideal capture settings.
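To put the real-time claim in concrete terms, the short Python sketch below works out the per-frame time budget and the number of model queries per second implied by 50 FPS at 1080p. It is back-of-the-envelope arithmetic, not code from the paper: it assumes 1080p means 1920 x 1080 pixels and one ray per pixel, and the 64 samples per ray used for the volumetric comparison is a hypothetical figure chosen only for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


def queries_per_second(width: int, height: int, fps: int, samples_per_ray: int = 1) -> int:
    """Model evaluations per second, assuming one ray per pixel."""
    return width * height * fps * samples_per_ray


# Figures from the abstract: 1080p at 50 FPS (1920 x 1080 is an assumption).
WIDTH, HEIGHT, FPS = 1920, 1080, 50

# Hypothetical sample count for a volumetric renderer, for comparison only.
VOLUMETRIC_SAMPLES_PER_RAY = 64

budget_ms = frame_budget_ms(FPS)
single_layer = queries_per_second(WIDTH, HEIGHT, FPS)
volumetric = queries_per_second(WIDTH, HEIGHT, FPS, VOLUMETRIC_SAMPLES_PER_RAY)

print(f"frame budget: {budget_ms:.1f} ms")
print(f"single-layer queries/s: {single_layer:,}")
print(f"volumetric queries/s at {VOLUMETRIC_SAMPLES_PER_RAY} samples/ray: {volumetric:,}")
```

Under these assumptions, a single-layer model evaluates each ray once, so the query count scales with the pixel count alone; a volumetric renderer multiplies that count by its samples per ray, which is the cost the abstract says the single-layer design avoids.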
