MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images

Zhiwei Wang, Ying Zhou, Shiquan He, Ting Li, Fan Huang, Qiang Ding, Xinxia Feng, Mei Liu, Qiang Li

TL;DR

This work tackles brightness-induced failures in self-supervised monocular depth estimation for endoscopic imagery. It introduces MonoPCC, a photometric-invariant cycle constraint that warps the target image to the source view and back again. The target is therefore compared with an image derived from itself, which makes the constraint insensitive to brightness changes between frames. Two mechanisms make this cycle warping effective without a learned brightness-calibration model: a Structure Transplant Module (STM), which uses FFT-based phase transfer to carry source-image structure into the intermediate warped image, and an Exponential Moving Average (EMA) strategy that stabilizes training. A perceptual loss (L_pcp) on warped feature maps further reinforces feature alignment. Across four endoscopic datasets and KITTI, MonoPCC achieves state-of-the-art depth and pose performance, shows strong robustness to brightness fluctuations, and yields improved 3D reconstructions; ablations confirm the critical roles of STM, EMA, and L_pcp.
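The phase-transfer idea behind STM can be illustrated with NumPy's FFT. This is a minimal sketch, not the authors' implementation. It assumes that the entire phase spectrum of the source image $I_{s}$ replaces that of the warped image $I_{t \rightarrow s}$, channel by channel, while the amplitude spectrum of $I_{t \rightarrow s}$ (which largely carries global brightness) is kept. The function name `structure_transplant` is hypothetical.

```python
import numpy as np


def structure_transplant(warped: np.ndarray, source: np.ndarray) -> np.ndarray:
    """Combine the amplitude spectrum of `warped` with the phase spectrum of `source`.

    Both inputs are real arrays of identical shape. The FFT runs over the last two
    axes, so (H, W) images and (C, H, W) channel stacks are both supported.
    """
    spec_warped = np.fft.fft2(warped)   # amplitude is kept from the warped image
    spec_source = np.fft.fft2(source)   # phase is taken from the source image
    amplitude = np.abs(spec_warped)
    phase = np.angle(spec_source)
    mixed = amplitude * np.exp(1j * phase)
    # For real inputs the mixed spectrum is Hermitian-symmetric, so the inverse
    # transform is real up to rounding error; drop the negligible imaginary part.
    return np.real(np.fft.ifft2(mixed))


rng = np.random.default_rng(0)
I_s = rng.random((3, 16, 16))           # toy source image, (C, H, W)
I_t2s = 1.5 * I_s                       # warped image with the same structure, but brighter
out = structure_transplant(I_t2s, I_s)
print(np.allclose(out, I_t2s))          # True: structure from I_s, brightness from I_t2s
```

In real use the warped image would differ from the source in both brightness and structure. The output then keeps the warped image's amplitude spectrum while its phase, which encodes edges and layout, comes from the source frame.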

Abstract

The photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto the target view using the estimated depth and pose, and then minimizing the difference between the warped and target images. However, the endoscope's built-in light source causes significant brightness fluctuations between frames, which makes the photometric constraint unreliable. Previous efforts only mitigate this problem by relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC, which addresses the brightness inconsistency at its root by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop of two opposite warping paths: a forward path from target to source and a backward path from source back to target. The target image is thus finally compared with an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the phase-frequency of the source image into the intermediate warped image to avoid structure loss. It also stabilizes training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. Comprehensive experiments on four endoscopic datasets demonstrate that MonoPCC is highly robust to brightness inconsistency and outperforms other state-of-the-art methods, reducing the absolute relative error by at least 7.27%, 9.38%, 9.90%, and 3.17% on the four datasets, respectively.
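The brightness argument can be checked on a toy example. The sketch rests on strong simplifying assumptions and is not MonoPCC's warping: the camera motion is a known horizontal shift of `d` pixels standing in for the depth-and-pose warp, the brightness change is a single global gain, and `shift_warp` and `masked_l1` are hypothetical helpers. The standard constraint compares $I_{s \rightarrow t}$ with $I_{t}$ and is penalized by the gain even though the geometry is exact. The cycle constraint compares $I_{t \rightarrow s \rightarrow t}$ with $I_{t}$ and is not penalized.

```python
import numpy as np


def shift_warp(img, d):
    """Inverse warp by a horizontal shift: out[:, x] = img[:, x - d].

    Returns the warped image and a mask of pixels whose source column lies
    inside the image (out-of-view pixels are zero-filled and masked out).
    """
    h, w = img.shape
    out = np.zeros_like(img)
    mask = np.zeros((h, w), dtype=bool)
    for x in range(w):
        if 0 <= x - d < w:
            out[:, x] = img[:, x - d]
            mask[:, x] = True
    return out, mask


def masked_l1(a, b, mask):
    """Mean absolute difference over valid pixels only."""
    return float(np.abs(a - b)[mask].mean())


rng = np.random.default_rng(0)
W, d, gain = 32, 4, 1.3              # width, true shift (px), brightness gain
scene = rng.random((8, W + d))       # toy scene wider than either view
I_t = scene[:, :W]                   # target view
I_s = gain * scene[:, d:d + W]       # source view: camera moved by d, and brighter

# Standard constraint: warp the source onto the target view.
I_s2t, m = shift_warp(I_s, d)
standard = masked_l1(I_s2t, I_t, m)

# Cycle constraint: target -> source view (forward) -> target view (backward).
I_t2s, m_fwd = shift_warp(I_t, -d)
I_t2s2t, m_bwd = shift_warp(I_t2s, d)
m_fwd_back, _ = shift_warp(m_fwd.astype(float), d)   # carry forward validity back
cycle = masked_l1(I_t2s2t, I_t, m_bwd & (m_fwd_back > 0))

print(f"standard={standard:.3f}  cycle={cycle:.3f}")
```

The standard loss equals `(gain - 1)` times the mean target intensity on the overlap, while the cycle loss is exactly zero. In this toy, however, the backward shift exactly inverts the forward one, so the cycle loss is zero for any shift. With estimated depth and pose the two warps need not cancel. Treat the example as a check of brightness invariance only, not of whether the cycle constrains depth and pose.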

Paper Structure (28 sections, 9 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: (a)-(b) are the source $I_{s}$ and target $I_{t}$ frames. (c) is the warped image from the source to target. (d) is the cycle-warped image along the target-source-target path for reliable photometric constraint. Box contour colors distinguish different brightness patterns.
  • Figure 2: The training pipeline of MonoPCC, which consists of forward and backward cascaded warping paths bridged by two enabling techniques, i.e., structure transplant module (STM) and exponential moving average (EMA). The training has two phases, i.e., warm-up to initialize the network weights for reasonable forward warping, and follow-up to resist the brightness changes. Different box contour colors code different brightness patterns. $\copyright$ means concatenation.
  • Figure 3: Details of STM, which uses the phase-frequency of the source image $I_{s}$ to replace that of the warped image $I_{t \rightarrow s}$ to avoid loss of image detail.
  • Figure 4: The auxiliary perception constraint by backward warping the encoding feature maps instead of raw images.
  • Figure 5: The Abs Rel error maps of comparison methods on SCARED and SimCol3D, with close-up details highlighted. The regions of interest (ROIs) are outlined with red dashed lines, and the OpenCV Jet colormap is used for visualization.
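The EMA strategy named in Figure 2 can be sketched as an exponential moving average of network weights. This is a hedged sketch under several assumptions not taken from the paper: the forward-warping path is assumed to read from an EMA copy of the weights that is updated after every optimizer step; parameters are represented as a dict of NumPy arrays; and the decay of 0.9 is purely illustrative. `ema_update` is a hypothetical name.

```python
import numpy as np


def ema_update(ema_params, params, decay=0.9):
    """In-place EMA of weights: ema <- decay * ema + (1 - decay) * current."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Online weights that oscillate between +1 and -1, as a stand-in for noisy updates.
online = {"w": np.ones(4)}
ema = {"w": np.zeros(4)}
history = []
for step in range(50):
    online["w"] = np.full(4, 1.0 if step % 2 == 0 else -1.0)
    ema_update(ema, online)
    history.append(float(ema["w"][0]))

print(max(abs(v) for v in history))   # EMA stays within 0.1 while the online weight swings by 2
```

A larger decay makes the averaged weights, and hence any warping computed from them, change more slowly; the trade-off is that they lag further behind the online network.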