Generative Refocusing: Flexible Defocus Control from a Single Image

Chun-Wei Tuan Mu, Jia-Bin Huang, Yu-Lun Liu

TL;DR

This work tackles single-image refocusing by decoupling defocus deblurring from bokeh synthesis in a two-stage diffusion framework. DeblurNet recovers an all-in-focus image, while BokehNet renders controllable shallow depth-of-field with user-specified focus planes and aperture shapes. A semi-supervised training strategy combines synthetic paired data with real unpaired bokeh images and EXIF metadata to capture authentic optical characteristics beyond simulators. The approach yields state-of-the-art results in defocus deblurring, bokeh synthesis, and refocusing benchmarks, and extends to text-guided editing and custom aperture shapes, enabling flexible post-capture photography with photorealistic optics.
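BokehNet is conditioned on a defocus map $D_{\text{def}}$ derived from an estimated depth map $D$, a focus plane $S_1$, and a bokeh level $K$ (see Figure 2). The sketch below shows one way such a map could be computed. It assumes the common disparity-based form $D_{\text{def}} = K \cdot |1/D - 1/S_1|$, with $S_1$ given as a focus depth in the same units as $D$. The paper's exact parameterization and normalization may differ, and `defocus_map` is a hypothetical helper.

```python
import numpy as np


def defocus_map(depth: np.ndarray, focus_depth: float, bokeh_level: float) -> np.ndarray:
    """Hypothetical helper: per-pixel blur amount from a depth map.

    Assumes D_def = K * |1/D - 1/S1|, where D and S1 share the same depth
    units and K (bokeh_level) scales the result to pixels.
    """
    if focus_depth <= 0:
        raise ValueError("focus_depth must be positive")
    # Clamp tiny or invalid depths so the inverse stays finite.
    inv_depth = 1.0 / np.clip(depth, 1e-6, None)
    # Zero on the focus plane, growing with the disparity gap on either side.
    return bokeh_level * np.abs(inv_depth - 1.0 / focus_depth)
```

For depths 1, 2, 4, and 8 with the focus at 2 and K = 10, this sketch yields 5.0, 0.0, 2.5, and 3.75. Pixels on the focus plane stay sharp. Blur grows for content both in front of and behind that plane, and moving $S_1$ shifts which region is sharp.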

Abstract

Depth-of-field control is essential in photography, but getting focus exactly right often takes several attempts or special equipment. Single-image refocusing remains difficult because it requires both recovering sharp content and synthesizing realistic bokeh. Current methods have significant drawbacks: they require all-in-focus inputs, rely on synthetic data from simulators, and offer limited control over the aperture. We introduce Generative Refocusing, a two-stage process that uses DeblurNet to recover all-in-focus images from diverse inputs and BokehNet to synthesize controllable bokeh. Our main innovation is semi-supervised training, which combines synthetic paired data with unpaired real bokeh images and uses EXIF metadata to capture real optical characteristics beyond what simulators can provide. Our experiments show that the method achieves top performance on defocus deblurring, bokeh synthesis, and refocusing benchmarks. Additionally, Generative Refocusing supports text-guided adjustments and custom aperture shapes.
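For real bokeh photos, the bokeh level $K$ is derived from EXIF metadata following Fortes2025BokehDiffusion. That formulation is not reproduced here. As a stand-in, the sketch below uses the standard thin-lens circle-of-confusion (CoC) model, which can be written in the form $c = K \cdot |1/D - 1/S_1|$. On the sensor, the CoC diameter is $\frac{f^2 S_1}{N (S_1 - f)} \cdot |1/D - 1/S_1|$ for focal length $f$ and f-number $N$, then converted to pixels. The function name, and the assumption that focus distance and sensor width are known, are hypothetical.

```python
def bokeh_level_from_exif(focal_length_mm: float, f_number: float,
                          focus_distance_mm: float, sensor_width_mm: float,
                          image_width_px: int) -> float:
    """Hypothetical thin-lens bokeh level K (not the paper's formula).

    The CoC diameter in pixels for an object at depth D (mm) is then
    K * |1/D - 1/S1|, where S1 is the focus distance in mm.
    """
    f, n, s1 = focal_length_mm, f_number, focus_distance_mm
    if s1 <= f:
        raise ValueError("focus distance must exceed focal length")
    # Thin-lens CoC prefactor on the sensor, in mm * mm.
    k_sensor = f * f * s1 / (n * (s1 - f))
    # Convert sensor millimetres to image pixels.
    return k_sensor * image_width_px / sensor_width_mm


# 50 mm lens at f/2 focused at 2 m, 36 mm-wide sensor, 6000 px wide image.
K = bokeh_level_from_exif(50.0, 2.0, 2000.0, 36.0, 6000)
# CoC diameter for a background point at 4 m.
print(round(K * abs(1 / 4000.0 - 1 / 2000.0), 1))  # 53.4
```

Under this model, stopping down from f/2 to f/4 halves $K$ and therefore halves every blur diameter. A longer focal length increases $K$ roughly quadratically.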

Paper Structure

This paper contains 41 sections, 3 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Generative refocusing with controllable optics. Our model turns a single input image into a virtual controllable camera, enabling diverse post-capture adjustments. (a) demonstrates aperture size control, allowing the user to vary the depth of field from strong bokeh to an all-in-focus image. (b) illustrates focus plane control by shifting the sharp region from the middle subject to the background. (c) highlights aperture shape control, synthesizing creative heart-shaped bokeh from point lights in the scene. (d) shows composite control where both the focus plane and aperture size are adjusted simultaneously to reframe the subject.
  • Figure 2: Pipeline Overview. Our method decomposes single-image refocusing into two stages: (a) Defocus Deblurring and (b) Shallow Depth-of-Field Synthesis. Given a blurry input image $I_{\text{in}}$, we optionally apply a pre-deblurring method (DRBNet) to obtain a conservative estimate $I_{\text{syn}}$, then feed both $I_{\text{in}}$ and $I_{\text{syn}}$ into DeblurNet to recover a high-quality all-in-focus image $I_{\text{aif}}$. The VAE encoder $\mathcal{E}$ and decoder $\mathcal{D}$ convert images to latent representations for processing by the DiT backbone. In the second stage, BokehNet takes the all-in-focus image $I_{\text{aif}}$, the defocus map $D_{\text{def}}$, and optionally a specific aperture shape as inputs to synthesize the refocused output. The defocus map $D_{\text{def}}$ is computed from the estimated depth map $D$ (Bochkovskiy2025DepthPro), the user-specified focus plane $S_1$, and the bokeh level $K$.
  • Figure 3: Training data generation. Each training sample consists of the following five components: (i) a bokeh image, (ii) an all-in-focus (AIF) image, (iii) a depth map $D$, (iv) a bokeh level $K$, and (v) a focus plane $S_1$. We construct these samples via three routes: (a) Synthetic paired data. Given real AIF images and depth maps $D$, we compute a defocus map $D_{\text{def}}$ parameterized by a specified bokeh level $K$ and focus plane $S_1$, and feed it into the bokeh renderer BokehMe to synthesize the corresponding bokeh images. (b) Real unpaired data. Given real bokeh images, DeblurNet recovers an AIF image. We then estimate depth and extract a foreground mask (Zheng2024BiRefNet) to define the focus plane $S_1$. The bokeh level $K$ is computed from the EXIF metadata following the formulation in Fortes2025BokehDiffusion. (c) Real paired data without EXIF. For real pairs lacking EXIF, we obtain a pseudo-AIF image and $S_1$ as in (b), and follow Eq. (2) to estimate the bokeh level $K$.
  • Figure 4: Qualitative comparison on defocus deblurring. Visual results on the (a) RealDOF (IFANet) and (b) DPDD datasets. Blue boxes on the left indicate cropped regions shown in detail. Our DeblurNet faithfully recovers fine text details ("NEW YORK") in (a), where competing methods produce blurry or distorted results. In the challenging example (b) with severe defocus blur, other methods either fail to recover structure (AIFNet, DRBNet, INIKNet) or introduce artifacts in the background (IFANet, Restormer), while our method reconstructs geometrically consistent, visually compelling content by leveraging diffusion priors guided by the pre-deblurred input.
  • Figure 5: Qualitative comparison on bokeh synthesis benchmark. Results on LF-Bokeh with zoomed patches (blue and orange boxes) highlighting detail quality. Our BokehNet synthesizes bokeh effects that better match the ground truth, with realistic blur gradients and natural occlusion handling. Baselines show various artifacts: BokehMe exhibits simulator bias, Bokehlicious over-smooths details, and BokehDiff produces inconsistent defocus. Our semi-supervised training on real bokeh images enables BokehNet to capture authentic lens characteristics.
  • ...and 6 more figures
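Route (a) in Figure 3 renders synthetic bokeh from an AIF image and a defocus map, and Figure 1(c) shows custom aperture shapes such as hearts. The sketch below is a toy scatter renderer that illustrates both ideas. It is not BokehMe and not BokehNet. Each pixel splats its color through an aperture mask scaled to its blur diameter, and the result is normalized by the accumulated weights. It ignores occlusion and runs in O(H·W·d²) time, so it suits only small illustrative inputs. `render_bokeh`, `aperture_mask`, and `_resize_nearest` are hypothetical helpers.

```python
import numpy as np


def aperture_mask(shape: str = "disk", size: int = 33) -> np.ndarray:
    """Binary aperture mask on a size x size grid (hypothetical helper)."""
    t = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(t, -t)  # row 0 is y = +1, so shapes appear upright
    if shape == "disk":
        inside = x**2 + y**2 <= 1.0
    elif shape == "heart":
        hx, hy = 1.3 * x, 1.3 * y  # scale so the heart curve fits the grid
        inside = (hx**2 + hy**2 - 1.0) ** 3 - hx**2 * hy**3 <= 0.0
    else:
        raise ValueError(f"unknown aperture shape: {shape}")
    return inside.astype(np.float64)


def _resize_nearest(mask: np.ndarray, d: int) -> np.ndarray:
    """Nearest-neighbour resize of a square mask to d x d, sampling cell centres."""
    idx = ((np.arange(d) + 0.5) * mask.shape[0] / d).astype(int)
    return mask[np.ix_(idx, idx)]


def render_bokeh(aif: np.ndarray, defocus: np.ndarray, aperture: np.ndarray) -> np.ndarray:
    """Toy scatter renderer with no occlusion handling.

    aif: (H, W, 3) all-in-focus image; defocus: (H, W) blur diameter in pixels;
    aperture: square mask describing the aperture shape.
    """
    h, w, _ = aif.shape
    acc = np.zeros(aif.shape, dtype=np.float64)
    wsum = np.zeros((h, w), dtype=np.float64)
    for yy in range(h):
        for xx in range(w):
            d = max(1, int(round(float(defocus[yy, xx]))))
            kernel = _resize_nearest(aperture, d)
            if kernel.sum() == 0:  # degenerate mask: fall back to a single pixel
                kernel = np.zeros((d, d))
                kernel[d // 2, d // 2] = 1.0
            kernel = kernel / kernel.sum()
            # Top-left corner of the kernel footprint, centred on (yy, xx).
            r0, c0 = yy - d // 2, xx - d // 2
            y0, y1 = max(r0, 0), min(r0 + d, h)
            x0, x1 = max(c0, 0), min(c0 + d, w)
            k = kernel[y0 - r0 : y1 - r0, x0 - c0 : x1 - c0]
            acc[y0:y1, x0:x1] += k[..., None] * aif[yy, xx]
            wsum[y0:y1, x0:x1] += k
    # Normalise by accumulated weight so flat regions keep their brightness.
    return acc / np.maximum(wsum, 1e-8)[..., None]
```

Passing `aperture_mask("heart")` instead of `aperture_mask("disk")` turns each out-of-focus point light into a heart-shaped highlight, loosely mirroring the aperture shape control in Figure 1(c). Pixels with zero defocus are passed through unchanged.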