EndoPBR: Material and Lighting Estimation for Photorealistic Surgical Simulations via Physically-based Rendering

John J. Han, Jie Ying Wu

TL;DR

EndoPBR tackles the lack of labeled surgical data by disentangling material properties and lighting in endoscopic scenes using a differentiable renderer. It combines a Disney BRDF model, whose parameters are predicted by an MLP, with a learnable moving spotlight to render photorealistic images from known geometry and camera poses, grounded in the rendering equation. The approach achieves competitive novel view synthesis on the Colonoscopy 3D Video Dataset and shows that a depth estimation model fine-tuned on synthetic EndoPBR data performs close to one fine-tuned on real images. This work highlights synthetic data, physics-informed disentanglement, and differentiable rendering as promising avenues for advancing 3D vision tasks in minimally invasive surgery (MIS), such as navigation, reconstruction, and digital twins.

Abstract

The lack of labeled datasets in 3D vision for surgical scenes inhibits the development of robust 3D reconstruction algorithms in the medical domain. Despite the popularity of Neural Radiance Fields and 3D Gaussian Splatting in the general computer vision community, these systems have yet to find consistent success in surgical scenes due to challenges such as non-stationary lighting and non-Lambertian surfaces. As a result, the need for labeled surgical datasets continues to grow. In this work, we introduce a differentiable rendering framework for material and lighting estimation from endoscopic images and known geometry. Compared to previous approaches that model lighting and material jointly as radiance, we explicitly disentangle these scene properties for robust and photorealistic novel view synthesis. To disambiguate the training process, we formulate domain-specific properties inherent in surgical scenes. Specifically, we model the scene lighting as a simple spotlight and material properties as a bidirectional reflectance distribution function, parameterized by a neural network. By grounding color predictions in the rendering equation, we can generate photorealistic images at arbitrary camera poses. We evaluate our method on various sequences from the Colonoscopy 3D Video Dataset and show that it produces competitive novel view synthesis results compared with other approaches. Furthermore, we demonstrate that synthetic data can be used to develop 3D vision algorithms by fine-tuning a depth estimation model on our rendered outputs. Overall, we see that the resulting depth estimation performance is on par with that of fine-tuning on the original real images.
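
For reference, the rendering equation mentioned in the abstract can be written in its standard form (emission term omitted), followed by the form it takes when the only light is a single point-like source at $x_L$. This is a sketch in the notation of Figure 2, not necessarily the exact simplification used in the paper; in particular, the clamped cosine and the symbol $L_i(\boldsymbol{x})$ for the spotlight intensity arriving at $\boldsymbol{x}$ are assumptions.

```latex
% Standard reflection form of the rendering equation over the hemisphere \Omega
L_o(\boldsymbol{x}, \boldsymbol{\omega_o})
  = \int_{\Omega} f_r(\boldsymbol{x}, \boldsymbol{\omega_i}, \boldsymbol{\omega_o})\,
    L_i(\boldsymbol{x}, \boldsymbol{\omega_i})\,
    (\boldsymbol{n} \cdot \boldsymbol{\omega_i})\, d\boldsymbol{\omega_i}

% With one point-like light at x_L, the integral collapses to a single direction
L_o(\boldsymbol{x}, \boldsymbol{\omega_o})
  \approx f_r(\boldsymbol{x}, \boldsymbol{\omega_i}, \boldsymbol{\omega_o})\,
    L_i(\boldsymbol{x})\,
    \max(0,\ \boldsymbol{n} \cdot \boldsymbol{\omega_i}),
\qquad
\boldsymbol{\omega_i} = \frac{x_L - \boldsymbol{x}}{\lVert x_L - \boldsymbol{x} \rVert}
```

Here $f_r$ is the BRDF (material) and $L_i$ is the incident light, which is what allows the two to be estimated as separate quantities.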

Paper Structure

This paper contains 10 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: EndoPBR generates photorealistic renderings from posed images and known geometry. The top row shows renderings generated from views outside the training set, and the bottom row shows the corresponding ground truth RGB images. Note that we undistort images prior to model training.
  • Figure 2: A description of the components of our pipeline. (a) describes the mathematical notation for the geometry of our setup. Given the camera center $x_c$, light source center $x_L$, and a query pixel $p_i$, we calculate its corresponding 3D point $\boldsymbol{x}$, which has an associated surface normal $\boldsymbol{n}$, outgoing vector $\boldsymbol{\omega_o}$ and light incoming vector $\boldsymbol{\omega_i}$. (b) displays the essential components of our network. The learnable spotlight model is used to calculate the incident light intensity at $\boldsymbol{x}$ (Sec. \ref{sec:light}), the BRDF model predicts material properties for $\boldsymbol{x}$ (Sec. \ref{sec:brdf}), and these estimations are combined to predict the final pixel value via the rendering equation (Sec. \ref{sec:simplification}).
  • Figure 3: Examples of synthetic data produced by EndoPBR to fine-tune Depth Anything V2. These images are generated by altering the camera view, material properties, or incident light intensity.
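
The pipeline in the Figure 2 caption (spotlight intensity at a surface point, a BRDF value for that point, and their combination into a pixel color) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It makes several simplifying assumptions: a Lambertian BRDF with a fixed albedo stands in for the MLP-predicted Disney BRDF, the spotlight uses inverse-square distance attenuation with a cosine-power angular falloff, and all function and parameter names (`spotlight_intensity`, `render_point`, `falloff`, and so on) are hypothetical.

```python
import numpy as np


def normalize(v):
    """Scale vectors along the last axis to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def spotlight_intensity(x, x_L, light_dir, intensity=1.0, falloff=2.0):
    """Incident light at points x from a spotlight at x_L aimed along light_dir.

    Assumed model (not from the paper): inverse-square distance attenuation
    multiplied by cos(angle to the spotlight axis) ** falloff.
    """
    to_point = x - x_L
    dist2 = np.sum(to_point ** 2, axis=-1, keepdims=True)
    cos_axis = np.sum(normalize(to_point) * normalize(light_dir),
                      axis=-1, keepdims=True)
    cos_axis = np.clip(cos_axis, 0.0, 1.0)
    return intensity * cos_axis ** falloff / dist2


def render_point(x, n, x_L, light_dir, albedo):
    """Color at x as BRDF * incident light * clamped cosine term.

    A Lambertian BRDF (albedo / pi) is a stand-in for the learned Disney BRDF,
    so the result does not depend on the outgoing direction omega_o.
    """
    omega_i = normalize(x_L - x)                      # direction toward the light
    cos_i = np.clip(np.sum(n * omega_i, axis=-1, keepdims=True), 0.0, None)
    f_r = albedo / np.pi                              # Lambertian BRDF value
    L_i = spotlight_intensity(x, x_L, light_dir)
    return f_r * L_i * cos_i


# Endoscope-like setup: the light sits at the origin and looks down +z.
x_L = np.zeros(3)
light_dir = np.array([0.0, 0.0, 1.0])
albedo = np.array([0.8, 0.4, 0.3])

# Two surface points on the optical axis, both facing the light.
points = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
normals = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]])

colors = render_point(points, normals, x_L, light_dir, albedo)
print(colors)
```

In this sketch the second point is twice as far from the light, so it receives a quarter of the incident light and renders a quarter as bright. Because the light and material terms are separate arguments, either can be changed independently, which is the kind of edit the Figure 3 caption describes (altering material properties or incident light intensity).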