Efficient Neural Light Fields (ENeLF) for Mobile Devices

Austin Peng

TL;DR

The paper tackles real-time novel view synthesis (NVS) on mobile devices by replacing costly volumetric rendering with a direct ray-to-pixel mapping in a neural light field (NeLF) framework. It introduces ENeLF, which combines data distillation from a NeRF teacher, a MobileR2L-inspired CNN backbone with a super-resolution module, and channel-wise structure pruning guided by batch-normalization (BN) scaling factors, marking the first pruning of a NeLF network. The super-resolution module upsamples compact low-resolution inputs to high-resolution views (e.g., 8× from $100\times100$ to $800\times800$, or 12× from $84\times63$ to $1008\times756$), enabling on-device rendering. Empirical results on the realistic synthetic 360° and real-world forward-facing datasets show that ENeLF reduces parameter count, FLOPs, model size, and latency relative to MobileR2L/NeRF baselines, at the cost of a slight degradation in perceptual quality.
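To make the idea of pruning guided by BN scaling factors concrete, here is a minimal sketch of how channels might be selected. It assumes a global magnitude threshold over all BN scaling factors, in the style of network slimming; the paper's exact criterion, pruning ratio, and layer names are not given in this summary, so `select_channels`, `prune_ratio`, and the layer names below are hypothetical.

```python
from typing import Dict, List


def select_channels(
    bn_scales: Dict[str, List[float]], prune_ratio: float
) -> Dict[str, List[int]]:
    """Return, per layer, the indices of the channels to keep.

    bn_scales maps a (hypothetical) layer name to the BN scaling
    factors (gamma) of that layer's output channels.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Pool |gamma| across all layers to derive one global threshold.
    magnitudes = sorted(abs(g) for scales in bn_scales.values() for g in scales)
    n_prune = int(len(magnitudes) * prune_ratio)
    threshold = magnitudes[n_prune - 1] if n_prune > 0 else None

    keep: Dict[str, List[int]] = {}
    for name, scales in bn_scales.items():
        if threshold is None:
            kept = list(range(len(scales)))
        else:
            kept = [i for i, g in enumerate(scales) if abs(g) > threshold]
        if not kept:
            # Assumption: never remove a layer entirely; keep its strongest channel.
            kept = [max(range(len(scales)), key=lambda i: abs(scales[i]))]
        keep[name] = kept
    return keep


example = {
    "conv1": [0.9, 0.01, 0.5, 0.02],
    "conv2": [0.03, 0.7, 0.8, 0.6],
}
kept = select_channels(example, prune_ratio=0.5)
```

Channels whose scaling factor is small contribute little after normalization, so removing them (and the matching filters in adjacent convolutions) shrinks the network. In practice the pruned model would typically be fine-tuned afterwards to recover quality.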

Abstract

Novel view synthesis (NVS) is a challenge in computer vision and graphics, focusing on generating realistic images of a scene from unobserved camera poses, given a limited set of authentic input images. Neural radiance fields (NeRF) achieved impressive results in rendering quality by utilizing volumetric rendering. However, NeRF and its variants are unsuitable for mobile devices due to the high computational cost of volumetric rendering. Emerging research in neural light fields (NeLF) eliminates the need for volumetric rendering by directly learning a mapping from ray representation to pixel color. NeLF has demonstrated its capability to achieve results similar to NeRF but requires a more extensive, computationally intensive network that is not mobile-friendly. Unlike existing works, this research builds upon the novel network architecture introduced by MobileR2L and aggressively applies a compression technique (channel-wise structure pruning) to produce a model that runs efficiently on mobile devices with lower latency and smaller sizes, with a slight decrease in performance.

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: R2L/MobileR2L training and inference network design.
  • Figure 2: Modified MobileR2L model architecture to support channel-wise structure pruning.
  • Figure 3: Qualitative visual comparison between ENeLF and ground truth on the realistic synthetic 360$^\circ$ lego scene of size 800$\times$800. It is best viewed in color.