Diffusion-based Light Field Synthesis
Ruisheng Gao, Yutong Liu, Zeyu Xiao, Zhiwei Xiong
TL;DR
This paper tackles the challenge of generating 4D light fields from a single RGB image by introducing LFdiff, a diffusion-based conditional generator. It couples a position-aware warping condition, which uses monocular depth to warp the input and encode angular position, with DistgUnet, a disentangled noise estimator that better captures spatial-angular LF information. The approach yields strong angular coherence, allows explicit disparity control, and generalizes across datasets, enabling effective downstream tasks such as LF super-resolution and refocusing. The results indicate notable improvements in fidelity and perceptual quality, suggesting broad practical impact for data-efficient LF synthesis and computational photography workflows.
Abstract
Light fields (LFs), which record comprehensive scene radiance across angular dimensions, find wide applications in 3D reconstruction, virtual reality, and computational photography. However, LF acquisition is inevitably time-consuming and resource-intensive, because the mainstream acquisition strategies involve manual capture or laborious software synthesis. To address this challenge, we introduce LFdiff, a straightforward yet effective diffusion-based generative framework tailored for LF synthesis, which takes only a single RGB image as input. LFdiff leverages disparity estimated by a monocular depth estimation network and incorporates two distinctive components: a novel condition scheme and a noise estimation network tailored for LF data. Specifically, we design a position-aware warping condition scheme, which enhances inter-view geometry learning via a robust conditional signal. We then propose DistgUnet, a disentanglement-based noise estimation network, to harness comprehensive LF representations. Extensive experiments demonstrate that LFdiff excels in synthesizing visually pleasing and disparity-controllable light fields with enhanced generalization capability. Additionally, comprehensive results affirm the broad applicability of the generated LF data, spanning applications such as LF super-resolution and refocusing.
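To make the idea of a position-aware warping condition more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the condition for each target view is obtained by backward-warping the input image with a disparity map scaled by the view's angular offset, and that angular position is encoded as two constant channels. The function names (`warp_to_view`, `warping_condition`), the sign convention, the nearest-neighbor sampling, and the position encoding are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

def warp_to_view(center, disparity, du, dv):
    """Backward-warp the center view to angular offset (du, dv).

    Assumption: a pixel (x, y) in the target view samples the center view at
    (x + du * d, y + dv * d), with d taken from the center-view disparity map.
    Nearest-neighbor sampling with border clipping keeps the sketch short.
    """
    h, w = disparity.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + du * disparity), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + dv * disparity), 0, h - 1).astype(int)
    return center[src_y, src_x]

def warping_condition(center, disparity, angular_res=3, scale=1.0):
    """Build a hypothetical condition tensor of shape (A, A, H, W, C + 2).

    `scale` multiplies the disparity, which is one plausible way to expose
    disparity control (an assumption, not a detail taken from the paper).
    """
    h, w = disparity.shape
    half = angular_res // 2
    norm = max(half, 1)  # avoid division by zero for a 1x1 angular grid
    rows = []
    for v in range(-half, half + 1):
        row = []
        for u in range(-half, half + 1):
            warped = warp_to_view(center, scale * disparity, u, v)
            # Two constant channels encoding the normalized angular position.
            pos = np.stack(
                [np.full((h, w), u / norm), np.full((h, w), v / norm)], axis=-1
            )
            row.append(np.concatenate([warped, pos], axis=-1))
        rows.append(np.stack(row))
    return np.stack(rows)

# Usage with synthetic data (a real pipeline would use a monocular depth
# network to estimate `disparity` from `center`).
rng = np.random.default_rng(0)
center = rng.random((4, 5, 3))
disparity = np.ones((4, 5))
cond = warping_condition(center, disparity, angular_res=3)
print(cond.shape)
```

In this sketch the central view of the grid is the unwarped input, and views farther from the center are shifted by a larger multiple of the disparity. A conditional diffusion model would then receive such a tensor alongside the noisy light field, though how LFdiff actually injects the condition is not described in the abstract.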
