BokehFlow: Depth-Free Controllable Bokeh Rendering via Flow Matching
Yachuan Huang, Xianrui Luo, Qiwen Wang, Liao Shen, Jiaqi Li, Huiqiang Sun, Zihao Huang, Wei Jiang, Zhiguo Cao
TL;DR
This work tackles controllable bokeh rendering without depth maps, using text prompts for control. It introduces BokehFlow, which applies flow matching in latent space to transport all-in-focus images directly to bokeh outputs, and adds a Bokeh Control Adapter that uses CLIP-guided prompts and cross-attention to control the focus region and blur intensity. The approach leverages pretrained priors to improve realism and efficiency, and reports state-of-the-art results against both depth-dependent and depth-free baselines across multiple datasets, with faster one-shot sampling. The result is a semantically controllable bokeh renderer suited to real-world photography pipelines that reduces dependence on accurate depth and supports flexible user guidance.
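To make the idea of transporting an all-in-focus latent to a bokeh latent concrete, here is a minimal NumPy sketch of a flow matching objective and one-step sampling. It assumes a straight-line (rectified-flow style) path between the two latents, which the summary does not specify; `velocity_fn`, the latent shapes, and the `oracle` field are hypothetical stand-ins for a learned network and real encoder outputs, not the authors' implementation.

```python
import numpy as np

def interpolate(z_src, z_tgt, t):
    """Point on the straight path from source to target latent at time t in [0, 1]."""
    return (1.0 - t) * z_src + t * z_tgt

def target_velocity(z_src, z_tgt):
    """Velocity of the straight path; constant in t (assumed path, not from the paper)."""
    return z_tgt - z_src

def flow_matching_loss(velocity_fn, z_src, z_tgt, t):
    """Mean squared error between predicted and target velocity at the point z_t."""
    z_t = interpolate(z_src, z_tgt, t)
    pred = velocity_fn(z_t, t)
    return float(np.mean((pred - target_velocity(z_src, z_tgt)) ** 2))

def one_step_sample(velocity_fn, z_src):
    """Single Euler step from t=0 to t=1, i.e. one-shot sampling."""
    return z_src + velocity_fn(z_src, 0.0)

rng = np.random.default_rng(0)
z_aif = rng.normal(size=(4, 8, 8))    # stand-in for an all-in-focus latent
z_bokeh = rng.normal(size=(4, 8, 8))  # stand-in for the paired bokeh latent

# Hypothetical oracle field that already knows the pair; a real model would be
# a trained network that predicts this velocity from z_t, t, and conditioning.
def oracle(z_t, t):
    return z_bokeh - z_aif

loss = flow_matching_loss(oracle, z_aif, z_bokeh, t=0.3)
z_out = one_step_sample(oracle, z_aif)
```

With a perfect velocity field the loss is zero and a single Euler step lands on the target latent, which is why straight paths make one-step sampling attractive.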
Abstract
Bokeh rendering simulates the shallow depth-of-field effect in photography, enhancing visual aesthetics and guiding viewer attention to regions of interest. Although recent approaches perform well, rendering controllable bokeh without additional depth inputs remains a significant challenge. Existing classical and neural controllable methods rely on accurate depth maps, while generative approaches often struggle with limited controllability and efficiency. In this paper, we propose BokehFlow, a depth-free framework for controllable bokeh rendering based on flow matching. BokehFlow directly synthesizes photorealistic bokeh effects from all-in-focus images, eliminating the need for depth inputs. It employs a cross-attention mechanism to enable semantic control over both focus regions and blur intensity via text prompts. To support training and evaluation, we collect and synthesize four datasets. Extensive experiments demonstrate that BokehFlow achieves visually compelling bokeh effects and offers precise control, outperforming existing depth-dependent and generative methods in both rendering quality and efficiency.
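As an illustration of how a cross-attention mechanism can inject text-prompt information into image features, the following NumPy sketch computes single-head scaled dot-product cross-attention. It is a generic formulation under stated assumptions: the token counts, feature dimensions, and random projection matrices `w_q`, `w_k`, `w_v` are hypothetical, and the text tokens stand in for prompt embeddings rather than the output of any specific encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Image tokens (queries) attend to text tokens (keys and values)."""
    q = image_tokens @ w_q                    # (N, d)
    k = text_tokens @ w_k                     # (M, d)
    v = text_tokens @ w_v                     # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M) scaled similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1 over text tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(16, 32))  # 16 hypothetical latent image tokens
text_tokens = rng.normal(size=(5, 64))    # 5 hypothetical prompt embeddings
w_q = rng.normal(size=(32, 24))
w_k = rng.normal(size=(64, 24))
w_v = rng.normal(size=(64, 24))

out, attn = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
```

Each image token receives a weighted mix of the prompt embeddings, so a prompt describing the focus subject or blur strength can influence every spatial location of the latent.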
