Diffusion-based Light Field Synthesis

Ruisheng Gao, Yutong Liu, Zeyu Xiao, Zhiwei Xiong

TL;DR

This paper tackles the challenge of generating 4D light fields from a single RGB image by introducing LFdiff, a diffusion-based conditional generator. It couples a position-aware warping condition, which uses monocular depth to warp the input and encode angular position, with DistgUnet, a disentangled noise estimator that better captures spatial-angular LF information. The approach yields strong angular coherence, allows explicit disparity control, and generalizes across datasets, enabling effective downstream tasks such as LF super-resolution and refocusing. The results indicate notable improvements in fidelity and perceptual quality, suggesting broad practical impact for data-efficient LF synthesis and computational photography workflows.
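As a rough illustration of how a disparity map can turn one image into an initial guess for another view, the sketch below backward-warps a central view to an angular offset. It is not the authors' implementation: the function name `warp_central_view`, the `scale` parameter (a stand-in for disparity control), the nearest-neighbor sampling, the border clamping, and the sign convention are all assumptions made for this example, and the angular-position encoding is left out.

```python
def warp_central_view(center, disparity, du, dv, scale=1.0):
    """Backward-warp a central view to the angular offset (du, dv).

    center:    H x W grid of pixel values (list of rows)
    disparity: H x W grid of disparities, in pixels per view step
    du, dv:    horizontal / vertical offset from the central view
    scale:     hypothetical knob that stretches or shrinks the disparity range
    """
    h, w = len(center), len(center[0])
    warped = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = scale * disparity[y][x]
            # Source pixel in the central view: rounded to the nearest
            # integer location and clamped to the image border.
            sx = min(max(round(x + du * d), 0), w - 1)
            sy = min(max(round(y + dv * d), 0), h - 1)
            warped[y][x] = center[sy][sx]
    return warped


# Toy 2 x 4 image with a constant disparity of one pixel per view step.
center = [[0, 1, 2, 3],
          [4, 5, 6, 7]]
disparity = [[1] * 4 for _ in range(2)]

# One view step to the right: every pixel samples its right-hand neighbor.
right_view = warp_central_view(center, disparity, du=1, dv=0)
```

A warp of this kind repeats border pixels and ignores occlusions, which is consistent with the caption of Figure 3: the warped view is only an initial LF pattern for guidance, not a finished result.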

Abstract

Light fields (LFs), which record comprehensive scene radiance across angular dimensions, find wide applications in 3D reconstruction, virtual reality, and computational photography. However, LF acquisition is inevitably time-consuming and resource-intensive, because the mainstream acquisition strategies involve manual capture or laborious software synthesis. Given this challenge, we introduce LFdiff, a straightforward yet effective diffusion-based generative framework tailored for LF synthesis, which takes only a single RGB image as input. LFdiff leverages disparity estimated by a monocular depth estimation network and incorporates two distinctive components: a novel condition scheme and a noise estimation network tailored for LF data. Specifically, we design a position-aware warping condition scheme, which enhances inter-view geometry learning via a robust conditional signal. We then propose DistgUnet, a disentanglement-based noise estimation network, to harness comprehensive LF representations. Extensive experiments demonstrate that LFdiff excels in synthesizing visually pleasing and disparity-controllable light fields with enhanced generalization capability. Additionally, comprehensive results affirm the broad applicability of the generated LF data, spanning applications such as LF super-resolution and refocusing.
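Refocusing, one of the downstream applications named above, is commonly done by shift-and-sum: each view is shifted in proportion to its angular offset and the views are averaged. The sketch below shows that generic technique under stated assumptions; it is not taken from the paper. The `lf[v][u][y][x]` layout, the function name `refocus`, the `slope` parameter, the integer shifts, and the border clamping are choices made for this example.

```python
def refocus(lf, slope):
    """Shift-and-sum refocusing of a 4D light field stored as lf[v][u][y][x].

    slope selects the disparity (pixels per view step) brought into focus.
    """
    n_v, n_u = len(lf), len(lf[0])
    h, w = len(lf[0][0]), len(lf[0][0][0])
    vc, uc = (n_v - 1) / 2, (n_u - 1) / 2  # angular center
    out = [[0.0] * w for _ in range(h)]
    for v in range(n_v):
        for u in range(n_u):
            # Integer shift proportional to the view's angular offset.
            dy = round(slope * (v - vc))
            dx = round(slope * (u - uc))
            for y in range(h):
                sy = min(max(y + dy, 0), h - 1)  # clamp at the border
                for x in range(w):
                    sx = min(max(x + dx, 0), w - 1)
                    out[y][x] += lf[v][u][sy][sx]
    n_views = n_v * n_u
    return [[value / n_views for value in row] for row in out]


# Toy light field: 1 x 3 views of a 1 x 4 image. A bright point moves by
# one pixel per view step, i.e. it has a disparity of 1.
lf = [[[[0, 9, 0, 0]], [[0, 0, 9, 0]], [[0, 0, 0, 9]]]]

in_focus = refocus(lf, slope=1)   # the three copies of the point line up
blurred = refocus(lf, slope=0)    # plain average: the point is smeared
```

With `slope=1` the point is restored to full intensity at its position in the central view, whereas `slope=0` spreads it over three pixels. This is the mechanism behind focusing on the foreground or the background of a synthesized LF.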

Paper Structure

This paper contains 13 sections, 12 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Our LFdiff framework can synthesize a light field from a single image. The example result is from the DIV2K dataset. Left: Sub-aperture images generated from a single image. Please view with Adobe Acrobat or KDE Okular to see animations. Middle: Refocusing to the foreground. Right: Refocusing to the background. We provide zoom-in patches for better comparison.
  • Figure 2: The overall framework of LFdiff. After iterating for $T$ timesteps, random Gaussian noise $x_{T}$ is denoised into a high-quality LF $x_{0}$. We provide the details of the position-aware warping condition scheme in (a) and depict the disentangled noise estimation network in (b). $\mathrm{R}$ and $\mathrm{R^{-1}}$ denote the SAI-to-macro-pixel reshape and the inverse reshape, respectively. In the training stage, we use the ground-truth disparity instead of the estimated inverse depth to warp the LF central view, which is omitted in (a) for simplicity.
  • Figure 3: Although the warp operation introduces occlusion artifacts (compare patch1 with patch2) and spatial misalignment (see the residue map), it provides an initial LF pattern for guidance. We show the residue between the GT and the warped view for clarity.
  • Figure 4: Qualitative comparisons including the SAIs and EPIs of LFs synthesized from the central view by different methods, along with the ground truth (view coordinates: (1, 1)). Zoom in for a better visual experience.
  • Figure 5: Qualitative comparisons including the SAIs and EPIs of synthesized LFs from single images through different methods (view coordinates: (1, 1)). Zoom in for a better visual experience.
  • ...and 4 more figures
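
The Figure 2 caption refers to a reshape between sub-aperture images (SAIs) and a macro-pixel image. A minimal sketch of one common convention for that reshape follows; the `lf[v][u][y][x]` layout, the function names `sai_to_macpi` and `macpi_to_sai`, and the exact index ordering are assumptions for illustration and may differ from the paper's implementation.

```python
def sai_to_macpi(lf):
    """Interleave SAIs lf[v][u][y][x] into one macro-pixel image.

    Each spatial location (y, x) becomes an n_v x n_u block that holds the
    same scene point as seen from every view.
    """
    n_v, n_u = len(lf), len(lf[0])
    h, w = len(lf[0][0]), len(lf[0][0][0])
    return [
        [lf[r % n_v][c % n_u][r // n_v][c // n_u] for c in range(w * n_u)]
        for r in range(h * n_v)
    ]


def macpi_to_sai(macpi, n_v, n_u):
    """Inverse reshape: split a macro-pixel image back into SAIs."""
    h, w = len(macpi) // n_v, len(macpi[0]) // n_u
    return [
        [
            [[macpi[y * n_v + v][x * n_u + u] for x in range(w)]
             for y in range(h)]
            for u in range(n_u)
        ]
        for v in range(n_v)
    ]


# 2 x 2 views of a 1 x 2 image; each value encodes (v, u, x) as 100v + 10u + x.
lf = [[[[100 * v + 10 * u + x for x in range(2)]] for u in range(2)]
      for v in range(2)]

macpi = sai_to_macpi(lf)
restored = macpi_to_sai(macpi, n_v=2, n_u=2)
```

In the macro-pixel layout, neighboring pixels inside one block differ only in angular position, while pixels at the same offset in neighboring blocks differ only in spatial position, so a single 2D array exposes both kinds of neighborhoods.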