Efficient Depth- and Spatially-Varying Image Simulation for Defocus Deblur
Xinge Yang, Chuong Nguyen, Wenbin Wang, Kaizhang Kang, Wolfgang Heidrich, Xiaoxing Li
TL;DR
This work tackles defocus and aberration in fixed-focus, large-aperture cameras by proposing an efficient synthetic data pipeline that is both depth- and spatially-varying. The pipeline unprocesses RGB images to RAW, models depth-dependent PSFs over a discretized depth set $Z$, and interpolates between PSFs instead of running a separate convolution for every pixel. ISO and radial position are supplied to the network as auxiliary channels, and pseudo-depth maps from DepthAnythingV2 are scaled so that RGB-only datasets can be used for training. A simple network (NAFNet) trained on low-resolution synthetic data generalizes to high-resolution real images and outperforms baselines, while the simulation needs significantly less rendering time and memory than a full optical simulation. The approach enables practical applications such as improved OCR for near-field text and higher-fidelity 3D asset reconstruction, and it offers a scalable path for computational photography in smart glasses and similar devices.
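To illustrate the idea of interpolating over a discretized depth set rather than convolving per pixel, here is a minimal sketch in pure Python. It is not the authors' implementation. It assumes linear interpolation between the two nearest depth planes, uses box kernels as hypothetical stand-ins for real lens PSFs, and ignores spatial variation, RAW unprocessing, and noise. All function names (`convolve2d_same`, `box_psf`, `depth_weights`, `render_defocus`) are made up for this example.

```python
from bisect import bisect_right

def convolve2d_same(img, kernel):
    """Filter a 2D list image with a small odd-sized kernel, replicating edges."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - ph, 0), h - 1)
                    xx = min(max(x + i - pw, 0), w - 1)
                    acc += img[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out

def box_psf(radius):
    """Normalized box kernel; a hypothetical stand-in for a measured or simulated PSF."""
    n = 2 * radius + 1
    return [[1.0 / (n * n)] * n for _ in range(n)]

def depth_weights(d, depths):
    """Return (k, t): d lies between depths[k] and depths[k + 1] at fraction t."""
    d = min(max(d, depths[0]), depths[-1])  # clamp to the sampled depth range
    k = min(bisect_right(depths, d) - 1, len(depths) - 2)
    t = (d - depths[k]) / (depths[k + 1] - depths[k])
    return k, t

def render_defocus(img, depth_map, depths, psfs):
    """Blur once per depth plane, then blend the two nearest planes per pixel."""
    # One convolution per entry of the depth set Z, not one per pixel.
    blurred = [convolve2d_same(img, psf) for psf in psfs]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k, t = depth_weights(depth_map[y][x], depths)
            # Assumed linear blend; equivalent to filtering with the
            # linearly interpolated PSF at this output pixel.
            out[y][x] = (1.0 - t) * blurred[k][y][x] + t * blurred[k + 1][y][x]
    return out
```

With `depths = [0.5, 1.0, 2.0]` and `psfs = [box_psf(1), box_psf(0), box_psf(1)]`, a pixel at depth 1.0 is left sharp, while pixels nearer or farther receive a progressively larger share of the blurred planes. The cost is one full-image convolution per element of $Z$ plus a per-pixel blend, which is where the savings over per-pixel convolution would come from under these assumptions.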
Abstract
Modern cameras with large apertures often suffer from a shallow depth of field, resulting in blurry images of objects outside the focal plane. This limitation is particularly problematic for fixed-focus cameras, such as those used in smart glasses, where adding autofocus mechanisms is challenging due to form factor and power constraints. Because optical aberrations and defocus properties are unique to each camera system, deep learning models trained on existing open-source datasets often face a domain gap and perform poorly in real-world settings. In this paper, we propose an efficient and scalable dataset synthesis approach that does not rely on fine-tuning with real-world data. Our method simultaneously models depth-dependent defocus and spatially varying optical aberrations, addressing both computational complexity and the scarcity of high-quality RGB-D datasets. Experimental results demonstrate that a network trained on our low-resolution synthetic images generalizes effectively to high-resolution (12 MP) real-world images across diverse scenes.
