Efficient Neural Light Fields (ENeLF) for Mobile Devices
Austin Peng
TL;DR
The paper tackles real-time novel view synthesis (NVS) on mobile devices by replacing costly volumetric rendering with a direct ray-to-pixel mapping in a neural light field (NeLF) framework. It introduces ENeLF, which combines data distillation from a NeRF teacher, MobileR2L-inspired CNN backbones with a super-resolution module, and channel-wise structure pruning guided by batch normalization (BN) scaling factors, which the paper presents as the first pruning of a NeLF network. The network maps compact low-resolution inputs to high-resolution views (e.g., upsampling by 8× from $100\times100$ to $800\times800$, or by 12× from $84\times63$ to $1008\times756$), enabling on-device rendering. Empirical results on the realistic synthetic 360° and real-world forward-facing datasets show that ENeLF reduces parameter count, FLOPs, model size, and latency relative to MobileR2L/NeRF baselines while incurring a controlled degradation in perceptual quality.
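As a quick check on the upsampling figures quoted above, the following minimal sketch recomputes the output resolutions and the resulting reduction in per-frame rays. It assumes one input ray per low-resolution pixel; the function names `output_size` and `ray_reduction` are hypothetical and used only for illustration.

```python
# (low-res width, low-res height, upsampling factor) as quoted in the TL;DR
CONFIGS = [
    (100, 100, 8),
    (84, 63, 12),
]


def output_size(width, height, factor):
    """Resolution after upsampling both spatial dimensions by `factor`."""
    return width * factor, height * factor


def ray_reduction(factor):
    """How many output pixels each low-resolution ray accounts for.

    Assumption: one ray per low-resolution pixel, so the ray count
    drops by factor * factor compared with one ray per output pixel.
    """
    return factor * factor


for w, h, f in CONFIGS:
    out_w, out_h = output_size(w, h, f)
    print(f"{w}x{h} -> {out_w}x{out_h}: {ray_reduction(f)}x fewer rays")
```

Under that assumption, the 8× setting evaluates 64 times fewer rays than a per-pixel formulation, and the 12× setting 144 times fewer.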
Abstract
Novel view synthesis (NVS) is a challenge in computer vision and graphics, focusing on generating realistic images of a scene from unobserved camera poses, given a limited set of authentic input images. Neural radiance fields (NeRF) achieved impressive rendering quality by utilizing volumetric rendering. However, NeRF and its variants are unsuitable for mobile devices due to the high computational cost of volumetric rendering. Emerging research in neural light fields (NeLF) eliminates the need for volumetric rendering by directly learning a mapping from a ray representation to a pixel color. NeLF has demonstrated that it can achieve results similar to NeRF, but it requires a larger, more computationally intensive network that is not mobile-friendly. Unlike existing works, this research builds upon the novel network architecture introduced by MobileR2L and aggressively applies a compression technique (channel-wise structure pruning) to produce a model that runs efficiently on mobile devices with lower latency and a smaller size, at the cost of a slight decrease in rendering quality.
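To make the channel-wise pruning guided by BN scaling factors more concrete, here is a minimal, framework-free sketch of the channel selection step. It is not the paper's implementation: the single global threshold over all layers, the rule of always keeping at least one channel per layer, and the names `select_channels` and `gammas_per_layer` are assumptions made for illustration. The input is a plain mapping from layer name to that layer's BN scaling factors.

```python
def select_channels(gammas_per_layer, prune_ratio):
    """Pick which channels to keep, ranked by |BN scaling factor|.

    gammas_per_layer: dict mapping a layer name to its list of BN gammas.
    prune_ratio: fraction of all channels to remove, in [0, 1).
    Returns a dict mapping each layer name to the kept channel indices.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Pool the magnitudes of every scaling factor across all layers.
    magnitudes = sorted(
        abs(g) for gammas in gammas_per_layer.values() for g in gammas
    )
    if not magnitudes:
        return {}

    # Assumed rule: one global threshold at the requested quantile.
    threshold = magnitudes[int(len(magnitudes) * prune_ratio)]

    kept = {}
    for name, gammas in gammas_per_layer.items():
        indices = [i for i, g in enumerate(gammas) if abs(g) >= threshold]
        if not indices:
            # Assumed safeguard: never remove a layer entirely.
            indices = [max(range(len(gammas)), key=lambda i: abs(gammas[i]))]
        kept[name] = indices
    return kept


# Toy scaling factors for two hypothetical layers.
gammas = {
    "conv1": [0.9, 0.02, 0.5, 0.01],
    "conv2": [0.03, 0.7, 0.04, 0.8],
}
print(select_channels(gammas, prune_ratio=0.5))
```

The sketch only decides which channels survive. An actual pruned network would also need the matching filters of each convolution, and the corresponding input channels of the following layer, to be sliced out before fine-tuning.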
