DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement

Handing Xu, Zhenguo Nie, Tairan Peng, Huimin Pan, Xin-Jun Liu

TL;DR

This work tackles the challenge of real-time endoscopic video enhancement by introducing a degradation-guided framework that explicitly models and propagates degradation representations across frames. It combines a degradation-aware module (DAM), a degradation-guided enhancement module (DGEM), and a degradation representation propagation module (DRPM) with cycle-consistency training to achieve high-quality restoration while maintaining real-time performance. Key contributions include a two-stage training strategy that leverages artificial degradations for pretraining and real unpaired data for adaptation, and a temporal degradation propagation mechanism that reduces computation without sacrificing coherence. The approach demonstrates strong performance on both synthetic degradations (SCARED) and real surgical data (SES), highlighting the practicality of degradation-aware modeling for clinical endoscopic video enhancement.

Abstract

Endoscopic surgery relies on intraoperative video, making image quality a decisive factor for surgical safety and efficacy. Yet endoscopic videos are often degraded by uneven illumination, tissue scattering, occlusions, and motion blur, which obscure critical anatomical details and complicate surgical manipulation. Although deep learning-based methods have shown promise in image enhancement, most existing approaches remain too computationally demanding for real-time surgical use. To address this challenge, we propose a degradation-aware framework for endoscopic video enhancement, which enables real-time, high-quality enhancement by propagating degradation representations across frames. In our framework, degradation representations are first extracted from images using contrastive learning. We then introduce a fusion mechanism that modulates image features with these representations to guide a single-frame enhancement model, which is trained with a cycle-consistency constraint between degraded and restored images to improve robustness and generalization. Experiments demonstrate that our framework achieves a superior balance between performance and efficiency compared with several state-of-the-art methods. These results highlight the effectiveness of degradation-aware modeling for real-time endoscopic video enhancement. Overall, our method suggests that implicitly learning and propagating degradation representations offers a practical pathway for clinical application.

Paper Structure

This paper contains 26 sections, 6 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Framework of the real-time video enhancement. Each frame of the input video sequence undergoes degradation representation modeling and enhancement generation. Specifically, for frames whose index is a multiple of $T_\Delta$, the system invokes the degradation-aware module (DAM) to perform a full estimation, yielding high-precision degradation features. For frames that are not multiples of $T_\Delta$, the degradation representation is rapidly propagated through the degradation representation propagation module (DRPM) along the temporal dimension and subsequently fed into the single frame enhancement model to enhance the current frame. All the modules are detailed in the following sections.
  • Figure 2: The diagram of the DAM. (a) The DAM training workflow. (b) The structure of the DAM.
  • Figure 3: Degradation-guided enhancement module (DGEM). (a) The basic architecture of the DGEM, which includes four parts: degradation representation compression, shallow feature extraction, feature modulation, and reconstruction. (b) The degradation compression block, which implements both channel attention and spatial attention mechanisms. (c) The degradation-aware Swin Transformer block, which injects the degradation representation $d_c$ into the value component and modulates the input feature based on multi-head self-attention.
  • Figure 4: Diagram of the cyclical consistency. (a) The dataflow of the whole model. $\mathcal{L} \rightarrow \mathcal{H}$: achieved by the DAM and the DGEM with an additional output $d_c$. $\mathcal{H} \rightarrow \mathcal{L}$: achieved based on PDMs and the regression of $d_p$. (b) Calculation of the cyclical consistency loss. Solid lines indicate the dataflow when generating reconstructed synthetic images, and dashed lines indicate the cyclical consistency loss calculation at both ends.
  • Figure 5: The architecture of the DRPM.
  • ...and 3 more figures
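
The frame scheduling described in the Figure 1 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `enhance_stream`, the callables `dam`, `drpm`, and `enhance`, and their signatures are hypothetical stand-ins for the DAM, the DRPM, and the single-frame enhancement model. In particular, the sketch assumes that the DRPM takes the previous degradation representation together with the current frame, which the caption does not specify, and that frame indices start at 0.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def enhance_stream(
    frames: Iterable[Any],
    t_delta: int,
    dam: Callable[[Any], Any],
    drpm: Callable[[Any, Any], Any],
    enhance: Callable[[Any, Any], Any],
) -> Iterator[Tuple[int, str, Any]]:
    """Yield (frame index, representation source, enhanced frame).

    All callables are hypothetical placeholders:
      dam(frame)            -> full degradation estimate (key frames)
      drpm(prev_rep, frame) -> propagated estimate (assumed signature)
      enhance(frame, rep)   -> enhanced frame guided by the representation
    """
    if t_delta < 1:
        raise ValueError("t_delta must be a positive integer")

    prev_rep = None
    for idx, frame in enumerate(frames):
        if idx % t_delta == 0:
            # Key frame: run the full degradation-aware estimation.
            rep = dam(frame)
            source = "DAM"
        else:
            # In-between frame: propagate the previous representation
            # instead of re-estimating it from scratch.
            rep = drpm(prev_rep, frame)
            source = "DRPM"
        prev_rep = rep
        yield idx, source, enhance(frame, rep)
```

With `t_delta = 3`, frames 0, 3, 6, and so on go through the full DAM estimate, and every other frame reuses a propagated representation. A larger interval lowers the average per-frame cost because the DAM runs less often, at the price of longer propagation chains between full estimates.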