Efficient Depth- and Spatially-Varying Image Simulation for Defocus Deblur

Xinge Yang, Chuong Nguyen, Wenbin Wang, Kaizhang Kang, Wolfgang Heidrich, Xiaoxing Li

TL;DR

This work tackles defocus and aberration in fixed-focus, large-aperture cameras by proposing an efficient, depth- and spatially varying synthetic data pipeline. It unprocesses RGB to RAW, models depth-dependent PSFs with a discretized depth set $Z$, and uses PSF interpolation to avoid per-pixel convolutions, augmented by ISO and radial position auxiliary channels; pseudo-depth from DepthAnythingV2 is scaled to enrich RGB datasets. A simple network (NAFNet) trained on low-resolution synthetic data generalizes to high-resolution real images and outperforms baselines while significantly reducing rendering time and memory compared to full optical simulations. The approach enables practical applications such as improved OCR for near-field text and higher-fidelity 3D asset reconstruction, and it offers a scalable path for computational photography in smart glasses and similar devices. Overall, depth-variant synthetic data provides robust, efficient defocus restoration with broad real-world impact.
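One way to read the "PSF interpolation" idea is sketched below. This is a minimal illustration and not the authors' implementation: the image is blurred once per discretized depth in $Z$, and each output pixel then blends the two layers that bracket its depth. By linearity of convolution, this equals applying, at each output pixel, a PSF linearly interpolated between the two neighboring depths, so the cost is one convolution per depth sample rather than one per pixel. The function names, the single-channel input, the FFT-based convolution, and the linear weighting in depth are all assumptions made for illustration; spatial variation of the PSF and occlusion effects are ignored here.

```python
import numpy as np

def blur_same(img, psf):
    """'Same'-size 2D convolution of a single-channel image via zero-padded FFT."""
    h, w = img.shape
    kh, kw = psf.shape
    fh, fw = h + kh - 1, w + kw - 1
    spectrum = np.fft.rfft2(img, (fh, fw)) * np.fft.rfft2(psf, (fh, fw))
    full = np.fft.irfft2(spectrum, (fh, fw))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]

def depth_varying_blur(img, depth, depths, psfs):
    """Hypothetical helper: blend per-depth blurred layers by per-pixel depth.

    img    : (H, W) image
    depth  : (H, W) depth map, same units as `depths`
    depths : strictly increasing sequence of K >= 2 sampled depths (the set Z)
    psfs   : K 2D kernels, one per sampled depth
    """
    depths = np.asarray(depths, dtype=np.float64)
    # K convolutions in total, independent of the number of pixels.
    layers = np.stack([blur_same(img, p) for p in psfs])  # (K, H, W)
    d = np.clip(depth, depths[0], depths[-1])
    # Indices of the two sampled depths that bracket each pixel's depth.
    hi = np.clip(np.searchsorted(depths, d, side="right"), 1, len(depths) - 1)
    lo = hi - 1
    weight = (d - depths[lo]) / (depths[hi] - depths[lo])
    lo_layer = np.take_along_axis(layers, lo[None], axis=0)[0]
    hi_layer = np.take_along_axis(layers, hi[None], axis=0)[0]
    return (1.0 - weight) * lo_layer + weight * hi_layer
```

With K depth samples the blend needs K convolutions plus a per-pixel lookup, which is the kind of saving the summary attributes to avoiding per-pixel convolutions.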

Abstract

Modern cameras with large apertures often suffer from a shallow depth of field, resulting in blurry images of objects outside the focal plane. This limitation is particularly problematic for fixed-focus cameras, such as those used in smart glasses, where adding autofocus mechanisms is challenging due to form factor and power constraints. Because optical aberrations and defocus properties are unique to each camera system, deep learning models trained on existing open-source datasets often face domain gaps and do not perform well in real-world settings. In this paper, we propose an efficient and scalable dataset synthesis approach that does not rely on fine-tuning with real-world data. Our method simultaneously models depth-dependent defocus and spatially varying optical aberrations, addressing both computational complexity and the scarcity of high-quality RGB-D datasets. Experimental results demonstrate that a network trained on our low-resolution synthetic images generalizes effectively to high-resolution (12 MP) real-world images across diverse scenes.

Paper Structure

This paper contains 15 sections, 7 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Motivation and real-world results for our proposed depth-varying dataset synthesis approach. Large-aperture fixed-focus lenses struggle to capture clear images at short distances (typically $<20$ cm), making it difficult for devices like smart glasses to perceive the physical world. Our efficient depth-varying dataset synthesis approach enhances computational photography algorithms for real-world defocus scenes. The bottom row compares the raw captured image (left) with our restored result (right), demonstrating promising results in defocus deblurring, optical aberration correction, and noise reduction.
  • Figure 2: Training and inference pipeline of the proposed approach. Left: Images are unprocessed from RGB space to RAW space to simulate defocus blur, optical aberrations, sensor quantization, and noise. A pseudo-depth map is predicted using the pretrained DepthAnythingV2 model yang2024depth, then randomly scaled and used in the depth-varying defocus and spatially varying aberration simulation. Noise at a random ISO level is added to the blurry RAW image. The image data, ISO channel, and radial field map are then packaged as network inputs. Top Right: During inference on real-world images, the ISO value is read from the photograph metadata, and the field map is computed on full-resolution images. Bottom Right: Instead of relying on complicated network architectures, a simple network (NAFNet Chen_2022) is adopted for image reconstruction.
  • Figure 3: Comparison of synthetic training images with and without depth-varying simulation. Incorporating depth-varying defocus allows for more realistic simulations, reflecting real-world scenarios where objects at varying distances from the camera exhibit different levels of defocus.
  • Figure 4: Qualitative evaluation on 12 MP real-world images with different defocus deblur methods and synthetic dataset generation strategies. From left to right: the classical deblurring algorithm ("Polyblur") Delbracio_2021 exhibits limited capability, failing to produce high-quality images. The network ("LaDKNet") ruan2023revisiting, trained on an open dataset Abuolaim_2020, cannot be directly applied to our camera captures and produces artifacts due to inconsistent optics and sensor noise. Simulation in RGB image space results in images with structural artifacts due to inaccurate noise modeling. A network trained on synthetic datasets without depth-varying image simulation fails to deblur effectively and also struggles with scenes that have varying depth ranges. For example, in the second case, the network is confused by the sharp wall and therefore fails to recover the teddy bear. Our proposed synthetic dataset generation helps the network implicitly learn to detect sharp and blurry regions, successfully recovering clear details such as the face of the teddy bear.
  • Figure 5: Performance improvement in OCR for scene understanding. Given an input image with text captured at close distance (left), our depth-of-field extension successfully recovers details (right). Our result significantly improves OCR performance in both accuracy and detection rate. OCR results are generated with an online program wei2024general, with errors marked in red.
  • ...and 1 more figures
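
The Figure 2 caption says the image data, an ISO channel, and a radial field map are packaged together as network inputs. The sketch below shows one plausible way to build such an input tensor. It is an assumption-laden illustration, not the paper's code: the function names, the channel order, the normalization of the radius to [0, 1], and the `iso_max` constant used to scale the ISO value are all hypothetical.

```python
import numpy as np

def radial_field_map(height, width):
    """Normalized distance from the image center: 0 at the center, 1 at the corners."""
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    radius = np.hypot(ys[:, None], xs[None, :])
    # Guard against a zero maximum for degenerate 1x1 inputs.
    return radius / max(radius.max(), 1e-12)

def pack_inputs(raw, iso, iso_max=3200.0):
    """Append a constant ISO plane and a radial field map to a (C, H, W) image.

    `iso_max` is a hypothetical normalization constant, not a value from the paper.
    Returns an array of shape (C + 2, H, W).
    """
    _, h, w = raw.shape
    iso_plane = np.full((1, h, w), iso / iso_max, dtype=raw.dtype)
    field = radial_field_map(h, w).astype(raw.dtype)[None]
    return np.concatenate([raw, iso_plane, field], axis=0)
```

For example, `pack_inputs(raw, iso=800)` on a 4-channel packed RAW array of shape `(4, H, W)` returns a `(6, H, W)` array. Because the caption states that the field map is computed on full-resolution images at inference time, a training patch would presumably take the matching slice of the full-resolution map rather than a map recomputed on the patch; that detail is an inference from the caption, not something the source spells out.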