Parameter-Free Neural Lens Blur Rendering for High-Fidelity Composites

Lingyan Ruan, Bin Chen, Taehyun Rhee

TL;DR

This paper addresses the challenge of realistically compositing virtual objects into real photographs without camera metadata or scene depth. It introduces Neural Lens, a pipeline that directly estimates a per-pixel Circle of Confusion (CoC) map from RGB images, derives a linear signed-CoC–disparity relationship within a projection mask, and renders realistic defocus via a neural reblurring network. The method yields high-fidelity, spatially varying blur, demonstrated with real and synthetic data, and outperforms baselines in both quantitative metrics (e.g., PSNR/SSIM) and user studies. By eliminating the need for calibration data, this approach offers a practical, generalizable solution for DoF-consistent image and video compositing in mixed reality applications.

Abstract

Consistent and natural camera lens blur is important for seamlessly blending 3D virtual objects into photographed real scenes. Since lens blur typically varies with scene depth, the placement of virtual objects and their corresponding blur levels significantly affect the visual fidelity of mixed reality compositions. Existing pipelines often rely on camera parameters (e.g., focal length, focus distance, aperture size) and scene depth to compute the circle of confusion (CoC) for realistic lens blur rendering. However, such information is often unavailable to ordinary users, limiting the accessibility and generalizability of these methods. In this work, we propose a novel compositing approach that directly estimates the CoC map from RGB images, bypassing the need for scene depth or camera metadata. The CoC values for virtual objects are inferred through a linear relationship between the signed CoC and disparity (inverse depth), and realistic lens blur is rendered using a neural reblurring network. Our method provides a flexible and practical solution for real-world applications. Experimental results demonstrate that our method achieves high-fidelity compositing with realistic defocus effects, outperforming state-of-the-art techniques in both qualitative and quantitative evaluations.
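
The linear relationship mentioned in the abstract can be motivated with the standard thin-lens model. The following is a sketch, not the paper's own derivation; the symbols $A$ (aperture diameter), $f$ (focal length), $d_f$ (focus distance), $d$ (object distance), $\delta = 1/d$ (disparity), and the sign convention (positive behind the focal plane) are assumptions introduced here for illustration.

$$
c_s(d) = \frac{A f\,(d - d_f)}{d\,(d_f - f)} = \frac{A f}{d_f - f}\,\bigl(1 - d_f\,\delta\bigr) = \alpha\,\delta + \beta,
\qquad
\alpha = -\frac{A f\,d_f}{d_f - f},\quad
\beta = \frac{A f}{d_f - f}
$$

The unsigned CoC $|c_s|$ is V-shaped in $\delta$ with its minimum at $\delta = 1/d_f$, so a single line cannot fit it. The signed CoC $c_s$, by contrast, is fully described by the two coefficients $(\alpha, \beta)$, which can be fitted from data without knowing $A$, $f$, or $d_f$ individually. Linearity is also preserved under any affine rescaling of $\delta$, which matters when disparity is only known up to scale and shift.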

Paper Structure

This paper contains 16 sections, 3 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Illustration of Circle of Confusion (CoC) Calculation. The CoC is computed based on camera metadata, including focal length, focus distance, and sensor size. The right column illustrates the relationships between the defocus map and depth, the defocus map and disparity (the inverse of depth), and the signed defocus map and disparity. Our method leverages the linear relationship between the signed defocus map and disparity.
  • Figure 2: Our approach comprises three stages: CoC map estimation, a linear fit between the signed CoC of the real scene and the disparity of the virtual object, and a neural reblurring network. The linear fit uses only a dedicated region defined by the projection mask; the CoC of the entire object is then inferred from the fitted function. We adopt off-the-shelf methods for both CoC estimation and reblurring. Once the linear mapping is obtained, it can be readily applied to other virtual objects in the scene.
  • Figure 3: The projection mask of our approach. Left: we cast a shadow along $\mathbf{p}$, where $\mathbf{p}\parallel -y$ and $\mathbf{p}\bot \mathbf{v}$, with $\mathbf{v}$ the look-at vector of the camera along the $z$ direction. Right (camera view): (a) original object image. (b) shadow cast along $\mathbf{p}$. (c) original object mask. (d) our corrected projection mask.
  • Figure 4: Please zoom in to better observe the compositing quality. Comparison between our method and Blind Augmentation [prakash2025blind] with DoF effect. The first column shows the background photographs to be augmented. The second column presents the estimated CoC maps from [prakash2025blind], while the third shows the CoC maps used for reblurring virtual objects based on our proposed fitting scheme. The fourth and fifth columns display our compositing results and those from Blind Augmentation, respectively. The top two examples are real photographs, and the bottom two are rendered scenes with available ground truth, denoted as GT. Our method demonstrates superior performance, particularly around object boundaries (e.g., airplane, horse, bowl) and in maintaining consistent blur levels (e.g., horse, bowl, keyboard). See Sec. \ref{subsec:comparison} for detailed discussion.
  • Figure 5: Results of object insertion under different defocus levels: focusing on the front (croissant), the middle (baguette), and the back (coffee plunger). The first row shows the original background images prepared for compositing. The second row presents our reblurred results based on the inferred CoC maps. The bottom row illustrates our fitted relationship between the background signed CoC and the disparity of the composited object (baguette in this case). Our method produces fine-grained and accurate CoC maps for the inserted object, enabling realistic and natural compositing. Darker regions indicate areas in focus, while brighter regions represent areas that are increasingly out of focus.
  • ...and 5 more figures
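
The fitting step described in the captions of Figures 1, 2, and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`thin_lens_signed_coc`, `fit_coc_vs_disparity`, `infer_object_coc`) are hypothetical, the signed CoC map is assumed to be already available (the sketch does not cover how a sign is assigned to an estimated CoC), a plain least-squares line via `np.polyfit` stands in for whatever fitting scheme the paper uses, and the synthetic thin-lens data replaces a real estimated CoC map.

```python
import numpy as np

def thin_lens_signed_coc(depth, focal_len, f_number, focus_dist):
    """Signed CoC diameter under the thin-lens model (same units as focal_len).

    Assumed sign convention: positive behind the focal plane, negative in front.
    Used here only to synthesise test data.
    """
    aperture = focal_len / f_number
    return aperture * focal_len * (depth - focus_dist) / (depth * (focus_dist - focal_len))

def fit_coc_vs_disparity(signed_coc, disparity, mask):
    """Least-squares line signed_coc ~ slope * disparity + intercept inside the mask."""
    slope, intercept = np.polyfit(disparity[mask], signed_coc[mask], deg=1)
    return slope, intercept

def infer_object_coc(object_disparity, slope, intercept):
    """Unsigned CoC for the inserted object, evaluated from the fitted line."""
    return np.abs(slope * object_disparity + intercept)

# Synthetic background: depths in metres, 50 mm lens at f/2 focused at 1.5 m.
depth = np.linspace(0.5, 5.0, 64).reshape(8, 8)
disparity = 1.0 / depth
signed_coc = thin_lens_signed_coc(depth, focal_len=0.05, f_number=2.0, focus_dist=1.5)

# Hypothetical projection mask: only this region contributes to the fit.
mask = np.zeros_like(depth, dtype=bool)
mask[2:6, 2:6] = True

slope, intercept = fit_coc_vs_disparity(signed_coc, disparity, mask)

# Disparities of three points on a virtual object (1 m, 1.5 m, and 3 m away).
object_depth = np.array([1.0, 1.5, 3.0])
object_coc = infer_object_coc(1.0 / object_depth, slope, intercept)
```

Because the synthetic data is exactly linear in disparity, the line fitted on the masked region reproduces the thin-lens CoC everywhere, including at depths outside the mask. With a real estimated CoC map the fit would only be approximate, and a robust estimator might be preferable to plain least squares.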