BokehFlow: Depth-Free Controllable Bokeh Rendering via Flow Matching

Yachuan Huang, Xianrui Luo, Qiwen Wang, Liao Shen, Jiaqi Li, Huiqiang Sun, Zihao Huang, Wei Jiang, Zhiguo Cao

TL;DR

This work tackles controllable bokeh rendering without depth maps and adds text-driven control. It introduces BokehFlow, which uses latent-space flow matching to transport all-in-focus images directly to bokeh outputs, augmented by a Bokeh Control Adapter that uses CLIP-guided prompts and cross-attention to control the focus region and blur intensity. The approach leverages pretrained priors to improve realism and efficiency, and reports state-of-the-art results against both depth-dependent and depth-free baselines across multiple datasets, with faster one-shot sampling. The result is a practical, semantically controllable bokeh renderer suited to real-world photography pipelines: it reduces dependence on accurate depth and accepts flexible user guidance.

Abstract

Bokeh rendering simulates the shallow depth-of-field effect in photography, enhancing visual aesthetics and guiding viewer attention to regions of interest. Although recent approaches perform well, rendering controllable bokeh without additional depth inputs remains a significant challenge. Existing classical and neural controllable methods rely on accurate depth maps, while generative approaches often struggle with limited controllability and efficiency. In this paper, we propose BokehFlow, a depth-free framework for controllable bokeh rendering based on flow matching. BokehFlow directly synthesizes photorealistic bokeh effects from all-in-focus images, eliminating the need for depth inputs. It employs a cross-attention mechanism to enable semantic control over both focus regions and blur intensity via text prompts. To support training and evaluation, we collect and synthesize four datasets. Extensive experiments demonstrate that BokehFlow achieves visually compelling bokeh effects and offers precise control, outperforming existing depth-dependent and generative methods in both rendering quality and efficiency.

Paper Structure

This paper contains 24 sections, 7 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: BokehFlow creates photorealistic and controllable bokeh effects from images of any resolution without requiring depth maps. Our model achieves focus region control (top left), blur intensity control (bottom), and renders better edges around the focused object than the iPhone (top right) in real-world scenes where depth tends to be unreliable. Zoom in for best view.
  • Figure 2: Pipeline of BokehFlow. The all-in-focus and bokeh images are first encoded into latent space using a VAE encoder. Random noise is added to the bokeh latent, and the flow matching model learns to denoise the concatenated all-in-focus and noisy bokeh latents through our direct transport design. Bokeh controls, which include the focus region and blur intensity, are encoded into control embeddings $z_C$ via a control encoder. In our proposed Bokeh Control Adapter (BCA), these features are injected through cross-attention, where bokeh features $z_B$ serve as queries and $z_C$ are used as keys and values. Finally, the denoised bokeh latent is decoded by the VAE decoder to generate the output bokeh image.
  • Figure 3: Visual comparison results on the CBD dataset. Compared to depth-based methods, our method preserves sharper edges in the focus region and renders the most aesthetically pleasing bokeh effects.
  • Figure 4: Visual comparison with a depth-free text-to-image method. Generative Photography generates a 5-frame video using the original prompt, and the first frame serves as our all-in-focus input. When "focus on the background" is added to the prompt, it fails to shift focus and alters scene content, while ours preserves the scene and produces appealing foreground bokeh.
  • Figure 5: Visual results on the IB30 dataset. Our depth-free approach produces more accurate rendering.
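
The pipeline described in the Figure 2 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation: the linear noise-to-latent path, the channel-wise concatenation axis, the single attention head, the random projection matrices, and all names and sizes (`noisy_latent_and_target`, `cross_attention`, `w_q`, `w_k`, `w_v`, the token and channel counts) are assumptions made for illustration. The sketch only shows the two ideas the caption names: noising the bokeh latent for flow matching, and cross-attention in which bokeh features $z_B$ act as queries while control embeddings $z_C$ act as keys and values.

```python
import numpy as np


def noisy_latent_and_target(z_bokeh, t, rng):
    """Build one flow-matching training pair on a straight path.

    Assumed convention (not taken from the paper): t=0 is pure noise,
    t=1 is the clean bokeh latent, and the regression target is the
    constant velocity along that straight line.
    """
    noise = rng.standard_normal(z_bokeh.shape)
    z_t = (1.0 - t) * noise + t * z_bokeh
    velocity = z_bokeh - noise
    return z_t, velocity


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(z_b, z_c, w_q, w_k, w_v):
    """Single-head cross-attention: z_b gives queries, z_c gives keys/values."""
    q = z_b @ w_q                      # (n_tokens, d_attn)
    k = z_c @ w_k                      # (n_ctrl, d_attn)
    v = z_c @ w_v                      # (n_ctrl, d_attn)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)    # each query row sums to 1
    return attn @ v, attn


rng = np.random.default_rng(0)

# Hypothetical sizes: 16 latent tokens, 4 control tokens.
n_tokens, n_ctrl, d_lat, d_ctrl, d_attn = 16, 4, 8, 12, 8

z_aif = rng.standard_normal((n_tokens, d_lat))    # stand-in all-in-focus latent
z_bokeh = rng.standard_normal((n_tokens, d_lat))  # stand-in bokeh latent
z_c = rng.standard_normal((n_ctrl, d_ctrl))       # stand-in control embeddings

# Noise the bokeh latent, then concatenate with the all-in-focus latent.
# Concatenating along the channel axis is an assumption.
z_t, v_target = noisy_latent_and_target(z_bokeh, t=0.3, rng=rng)
model_input = np.concatenate([z_aif, z_t], axis=-1)

# Random projections stand in for learned weights.
w_q = rng.standard_normal((d_lat, d_attn))
w_k = rng.standard_normal((d_ctrl, d_attn))
w_v = rng.standard_normal((d_ctrl, d_attn))

out, attn = cross_attention(z_t, z_c, w_q, w_k, w_v)
print(model_input.shape, out.shape, attn.shape)
```

In a trained model, a network would take `model_input` together with the control-conditioned features and regress `v_target`; here the attention output is only computed to show the query/key/value roles.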