BokehDiff: Neural Lens Blur with One-Step Diffusion

Chengxuan Zhu, Qingnan Fan, Qi Zhang, Jinwei Chen, Huaqi Zhang, Chao Xu, Boxin Shi

TL;DR

BokehDiff renders photorealistic lens blur even when depth estimates are imperfect, by combining physics-inspired constraints with a diffusion prior in a one-step diffusion framework. The method introduces a physics-inspired self-attention (PISA) module that enforces energy conservation, circle-of-confusion limits, and self-occlusion, and a scalable data synthesis pipeline that uses diffusion-generated foregrounds with transparency to create paired training data. Key contributions are the one-step inference scheme, the PISA module, and the diffusion-based data synthesis approach, which together yield robust results at depth discontinuities and in real-world scenes. The approach is reported to outperform prior bokeh methods on real and synthetic datasets, offering an efficient, high-fidelity solution for neural lens blur rendering in computational photography and mobile imaging.

Abstract

We introduce BokehDiff, a novel lens blur rendering method that achieves physically accurate and visually appealing outcomes with the help of a generative diffusion prior. Previous methods are bounded by the accuracy of depth estimation and generate artifacts at depth discontinuities. Our method employs a physics-inspired self-attention module that aligns with the image formation process, incorporating a depth-dependent circle-of-confusion constraint and self-occlusion effects. We adapt the diffusion model to a one-step inference scheme without introducing additional noise, and achieve results of high quality and fidelity. To address the lack of scalable paired data, we propose to synthesize photorealistic foregrounds with transparency using diffusion models, balancing authenticity and scene diversity.
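
To make two of the constraints named above concrete, the following is a minimal NumPy sketch, not the authors' PISA module. It assumes a blur radius that is linear in disparity, `blur_strength * |disparity - focus_disparity|`, and builds a dense weight matrix in which each source pixel spreads its energy only inside its own circle of confusion, with every column normalized so that total image energy is preserved. Self-occlusion is deliberately omitted. All function and parameter names (`coc_radius`, `scatter_weights`, `render`, `blur_strength`, `focus_disparity`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def coc_radius(disparity, focus_disparity, blur_strength):
    """Assumed blur radius in pixels: linear in the disparity offset from focus."""
    return blur_strength * np.abs(disparity - focus_disparity)


def scatter_weights(disparity, focus_disparity, blur_strength):
    """Build an (N, N) matrix W where W[i, j] is the share of source pixel j
    that lands on target pixel i. Dense, so only suitable for tiny images."""
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    radius = coc_radius(disparity, focus_disparity, blur_strength).ravel()

    # dist[i, j] is the Euclidean distance between target i and source j.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Circle-of-confusion limit: source j reaches target i only inside its disc.
    # The 0.5 floor keeps an in-focus pixel mapped onto itself.
    mask = dist <= np.maximum(radius, 0.5)[None, :]

    # Energy conservation: each source pixel distributes exactly one unit.
    return mask / mask.sum(axis=0, keepdims=True)


def render(image, weights):
    """Apply the weights to a single-channel image of shape (h, w)."""
    h, w = image.shape
    return (weights @ image.ravel()).reshape(h, w)
```

In an attention setting, a boolean mask of this kind would typically be applied to the attention logits before normalization; here it is applied directly to pixel intensities to keep the example short. Because every column of the weight matrix sums to one, `render` leaves the sum of pixel intensities unchanged.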

Paper Structure

This paper contains 26 sections, 13 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: BokehDiff bridges the gap between physics and diffusion priors, and is able to synthesize photorealistic lens blur effects even when inaccurate depth estimation causes previous methods (BokehMe peng2022bokehme, MPIB peng2022mpib, and Dr. Bokeh sheng2024dr) to fail, especially at the depth discontinuities. The examples show previous methods over-blur the horse's tail, the person's hair, and the whiskers of the cat.
  • Figure 2: The framework of the proposed method. Given paired synthetic data with a disparity map, we optimize a LoRA of the U-Net and the encoder $\mathcal{E}$, while the decoder $\mathcal{D}$ remains frozen. A tailored PISA module (colored in green) is applied during downsampling; it is detailed in the right column and introduced in Section \ref{sec:selfattention}.
  • Figure 3: An illustration of the image formation model, and the three physics-related aspects considered in the PISA module.
  • Figure 4: The data synthesis pipeline. A pretrained text-to-image model is applied to generate foreground with transparency zhang2024transparent, and the large depth-of-field background is selected from real-world images. With the layers randomly placed with various facing angles and various depths, a classical ray-tracing method is applied to render the image with lens blur.
  • Figure 5: Qualitative comparisons of BokehDiff with BokehMe peng2022bokehme, MPIB peng2022mpib, and Dr. Bokeh sheng2024dr. The defocus map is calculated from disparity and shared across the compared methods, and three patches are zoomed in for closer observation in each scene. Whiter regions in the defocus map indicate where more lens blur should be added, but the map is prone to errors from depth estimation.
  • ...and 11 more figures
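
The Figure 5 caption notes that the defocus map is calculated from disparity. As background, the standard thin-lens model (a textbook relation, not a formula quoted from the paper) shows why such a map is linear in disparity. The symbols are assumptions introduced here: aperture diameter $A$, focal length $f$, focus distance $z_f$, scene depth $z$, and circle-of-confusion diameter $c$ on the sensor.

$$
c \;=\; \frac{A f \,\lvert z - z_f \rvert}{z\,(z_f - f)}
\;=\; \frac{A f z_f}{z_f - f}\,\left\lvert \frac{1}{z} - \frac{1}{z_f} \right\rvert
$$

Since disparity is proportional to inverse depth, writing $D \propto 1/z$ and $D_f \propto 1/z_f$ gives $c = K\,\lvert D - D_f \rvert$, where the constant $K$ absorbs the aperture, focal length, and focus distance. An error in the estimated disparity therefore translates directly into an error in the blur size, which is why the caption describes the defocus map as prone to depth-estimation errors.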