SHADeS: Self-supervised Monocular Depth Estimation Through Non-Lambertian Image Decomposition

Rema Daher, Francisco Vasconcelos, Danail Stoyanov

TL;DR

SHADeS is a self-supervised approach that treats specular reflections as a separate light component in order to jointly estimate light decomposition and depth in colonoscopy. It produces light decomposition and depth maps that are robust to specular regions.

Abstract

Purpose: Visual 3D scene reconstruction can support colonoscopy navigation. It can help in recognising which portions of the colon have been visualised and in characterising the size and shape of polyps. This remains a very challenging problem due to complex illumination variations, including abundant specular reflections. We investigate how to effectively decouple light and depth in this problem. Methods: We introduce a self-supervised model that simultaneously characterises the shape and lighting of the visualised colonoscopy scene. Our model estimates shading, albedo, depth, and specularities (SHADeS) from single images. Unlike previous approaches (IID), we use a non-Lambertian model that treats specular reflections as a separate light component. The implementation of our method is available at https://github.com/RemaDaher/SHADeS. Results: We demonstrate on real colonoscopy images (HyperKvasir) that previous models for light decomposition (IID) and depth estimation (MonoViT, Monodepth2) are negatively affected by specularities. In contrast, SHADeS can simultaneously produce light decomposition and depth maps that are robust to specular regions. We also perform a quantitative comparison on phantom data (C3VD), where we further demonstrate the robustness of our model. Conclusion: Modelling specular reflections improves depth estimation in colonoscopy. We propose an effective self-supervised approach that uses this insight to jointly estimate light decomposition and depth. Light decomposition has the potential to help with other problems, such as place recognition within the colon.
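
The difference between the Lambertian model ($I = AS$) and the non-Lambertian model ($I = AS + M$) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy image size, the value ranges, and the clipping to $[0, 1]$ are all assumptions made for illustration. It only shows that a Lambertian product cannot account for an additive highlight, so the unexplained residual sits exactly where the specular term $M$ is non-zero.

```python
import numpy as np


def compose_lambertian(albedo, shading):
    """Lambertian image formation: I = A * S (element-wise product)."""
    return albedo * shading


def compose_non_lambertian(albedo, shading, specular):
    """Non-Lambertian image formation: I = A * S + M.

    Clipping to [0, 1] is an assumption that mimics sensor saturation.
    """
    return np.clip(albedo * shading + specular, 0.0, 1.0)


# Toy 4x4 RGB scene (all values are hypothetical).
rng = np.random.default_rng(0)
h, w = 4, 4
albedo = rng.uniform(0.2, 0.8, size=(h, w, 3))   # per-pixel reflectance
shading = rng.uniform(0.3, 1.0, size=(h, w, 1))  # grayscale illumination
specular = np.zeros((h, w, 1))                   # additive highlight term M
specular[1, 2] = 0.9                             # one specular highlight

image = compose_non_lambertian(albedo, shading, specular)

# What a purely Lambertian model (with the true A and S) cannot explain.
residual = image - compose_lambertian(albedo, shading)
```

In this toy setting the residual is zero everywhere except at the highlight pixel, which is the part of the image that a Lambertian decomposition would otherwise have to absorb into albedo or shading.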

Paper Structure

This paper contains 15 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Extracted albedo from a Lambertian (IID) versus our non-Lambertian model (SHADeS). The specular reflections produce significantly fewer artefacts with our model.
  • Figure 2: A high-level representation of depth estimation training. (a) The basic self-supervision relies on reconstructing a source image from the viewpoint of a target image ($I_{s \rightarrow t}$). (b) The system proposed in li2024image (IID) extends the basic approach with Lambertian decomposition ($\Phi_{Decompose} \rightarrow I = AS$), auto-masking ($\mu_2$), and a light adjustment network ($\Phi_{Adjust}$). (c) Our proposed system extends IID with non-Lambertian decomposition ($I = AS + M$) through a pre-trained inpainting network ($P_{Inp}$) and two auto-masking techniques ($\mu_1 \odot \mu_2$), without the need for an adjustment network.
  • Figure 3: Flowchart of the proposed system. During training, we compare the reconstructed source image warped to the target view ($AS_{s\to t}$) against an inpainted target with specularities removed ($I_{t,rem}$) through the loss $L_r$, while making sure the depth is smooth ($L_{es}$) and the decomposition is self-supervised through $L_d$ and $L_a$. At inference time, albedo, shading, pose, and depth are estimated ($A, S, T, D$), and from those a reconstructed specular-free image ($AS$) and a specular mask ($M$) are also generated. Our contributions are highlighted in orange.
  • Figure 4: Visual results of estimated shading, albedo, and depth on $Data_{real}$. For visual clarity, we clip the depth at 0.8.
  • Figure 5: Results on (row 1) $Data_{real}$ and (row 2) $Data_{phantom}$ showing estimated reconstructed images $AS$ and specularity masks $M$ versus their counterparts ($I_{rem}, M_{trad}$) from daher2023temporal.
  • ...and 1 more figure
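
The captions of Figures 2 and 3 describe a reconstruction loss $L_r$ that compares $AS_{s\to t}$ with $I_{t,rem}$ under the combined auto-mask $\mu_1 \odot \mu_2$. The sketch below shows one way such a masked comparison could look. It is a simplification under stated assumptions: a plain per-pixel L1 error stands in for whatever photometric term the paper actually uses, the masks are taken to be binary maps of shape (H, W), and `masked_l1` is a hypothetical helper name.

```python
import numpy as np


def masked_l1(recon, target, mask1, mask2):
    """Mean L1 error over pixels kept by both masks (mask1 * mask2).

    recon, target: float arrays of shape (H, W, 3)
    mask1, mask2:  binary arrays of shape (H, W); 1 keeps a pixel, 0 drops it
    NOTE: plain L1 is an assumed stand-in for the paper's loss L_r.
    """
    mask = mask1 * mask2                               # element-wise product of masks
    per_pixel = np.abs(recon - target).mean(axis=-1)   # average over colour channels
    kept = mask.sum()
    if kept == 0:
        return 0.0                                     # nothing to supervise
    return float((per_pixel * mask).sum() / kept)


# Toy example: stand-ins for AS_{s->t} and I_{t,rem} (values are hypothetical).
recon = np.zeros((2, 2, 3))
target = np.ones((2, 2, 3))
target[0, 0] = 100.0            # outlier pixel that the mask should exclude
mu1 = np.ones((2, 2))
mu2 = np.ones((2, 2))
mu2[0, 0] = 0.0                 # mask out the outlier

loss = masked_l1(recon, target, mu1, mu2)
```

Because the masks are multiplied, a pixel contributes to the loss only if both masks keep it, so the outlier at position (0, 0) has no effect on the result here.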