
LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

TL;DR

Low-light image enhancement (LLIE) for endoscopy is a practical challenge in minimally invasive surgery (MIS), where both image quality and computational budget are constrained. LighTDiff introduces a diffusion framework that operates at varying resolutions, built on a lightweight LighT backbone, a Temporal Light Unit (TLU) that ties denoising time steps to image features, and a Chroma Balancer (CB) for color fidelity, all trained with a SmoothL1 loss. It achieves competitive PSNR/SSIM with far fewer parameters and faster inference, outperforming multiple diffusion-based and GAN baselines on EndoVis17/18 and a real-world LLIE dataset, and it improves downstream segmentation. This enables real-time, high-quality LLIE on consumer hardware, with potential benefits for visualization, augmentation, and navigation in surgical settings.

Abstract

The growing use of endoscopy in surgery faces challenges such as inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement (LLIE) in the medical field. However, DDPMs are computationally demanding and slow, which limits their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shaped model architecture that captures global structural information from low-resolution images and gradually recovers details in subsequent denoising steps. We further prune the model to significantly reduce its size while retaining performance. Because discarding certain downsampling operations to save parameters leads to unstable and inefficient convergence during training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies across the denoising steps and improving denoising outcomes. Moreover, we observed potential spectral shifts when recovering images with the diffusion model, and we introduce a Chroma Balancer (CB) to mitigate this issue. LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency.
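
The TL;DR names SmoothL1 as the training loss. As a rough illustration only, the sketch below computes the standard SmoothL1 definition (quadratic for small errors, linear for large ones) over flat lists of pixel values. The function name `smooth_l1`, the threshold `beta=1.0`, and the mean reduction are assumptions for illustration; the source does not state which settings the authors used.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean SmoothL1 loss over two equal-length sequences of floats.

    Hypothetical helper: beta and the mean reduction are assumed,
    not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if not pred:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            # Quadratic region: behaves like a scaled L2 loss near zero.
            total += 0.5 * d * d / beta
        else:
            # Linear region: behaves like L1, so outliers are penalized less.
            total += d - 0.5 * beta
    return total / len(pred)


if __name__ == "__main__":
    # One small error (0.5) and one large error (2.0).
    print(smooth_l1([0.0, 2.0], [0.5, 0.0]))
```

The linear tail is what makes this loss less sensitive to a few badly reconstructed pixels than a plain squared error would be.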

Paper Structure (13 sections, 5 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Image comparison for normal lighting (a), low lighting (b), and LighTDiff reconstruction (c); Model comparison in performance and efficiency (d), and comparison of training costs (i.e., training hours for the same number of total iterations) (e).
  • Figure 2: Overview of the proposed LighTDiff. Panel (a) illustrates the entire process: the original image $y_0$ undergoes noise diffusion to generate $y_t$, and the model learns to reconstruct the original image from different time steps. The denoised output $\tilde{y}_0$ is further adapted by the Chroma Balancer (CB) to approach a natural distribution, resulting in $\tilde{y}'_0$. Panel (b) shows the LighTDiff architecture. Panel (c) illustrates the Temporal Light Block, with the details of the Temporal Light Unit (TLU) in Panel (d). The network structure of CB is given in Panel (e).
  • Figure 3: Qualitative results for LighTDiff compared with SOTA approaches on EndoVis17 (Allan et al., 2019). The first row shows the enhanced images for different LLIE baselines, and the second row shows the reconstruction error heat maps, where blue to red indicates small to large error. Zoom in to see the details.
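
The "noise diffusion" step in the Figure 2 caption, which turns $y_0$ into $y_t$, can be sketched with the standard DDPM closed form $y_t = \sqrt{\bar{\alpha}_t}\, y_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. This is a minimal sketch of generic DDPM noising, not the authors' implementation. The linear schedule, its endpoints (`1e-4` to `2e-2`), the zero-based step index, and all function names are assumptions, since the page does not give LighTDiff's schedule.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances (assumed schedule, common DDPM default)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s up to and including t."""
    alpha_bars = []
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(y0, t, alpha_bars, rng):
    """Draw y_t from y_0 in one shot; returns (y_t, noise).

    y0 is a flat list of pixel values; t is a zero-based step index.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in y0]
    y_t = [math.sqrt(a) * v + math.sqrt(1.0 - a) * n for v, n in zip(y0, noise)]
    return y_t, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    y0 = [0.1, 0.5, 0.9]  # toy "image" with three pixels
    y_t, eps = q_sample(y0, 500, alpha_bars, rng)
    print(y_t)
```

Because $y_t$ is a fixed linear mix of $y_0$ and the sampled noise, a network that predicts either one from $y_t$ determines the other, which is why the model can be trained to reconstruct the original image from any time step.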