FD-Vision Mamba for Endoscopic Exposure Correction

Zhuoran Zheng, Jun Zhang

TL;DR

This work tackles exposure correction in endoscopic imaging by introducing FDVM-Net, a frequency-domain network that reconstructs images from phase $P$ and amplitude $A$ through a dual-path architecture built from Convolution-augmented State Space Model (C-SSM) blocks and frequency-domain cross-attention. The method downscales the internal representations fed to the SSM to keep computation efficient, processes phase and amplitude in separate branches, and fuses them before an inverse Fourier transform yields the corrected image. The network is trained with an $L_1$ loss. Experiments on the synthetic E-kvasir dataset and on real images show that FDVM-Net achieves state-of-the-art PSNR/SSIM with a practical speed-accuracy trade-off, and that it generalizes to arbitrary resolutions. The findings suggest FDVM-Net as a viable backbone for medical image enhancement, with potential extensions to other restoration tasks. Code is available online.

Abstract

In endoscopic imaging, the recorded images are prone to exposure abnormalities, so maintaining high-quality images is important for helping healthcare professionals make decisions. To address this issue, we design a frequency-domain based network, called FD-Vision Mamba (FDVM-Net), which achieves high-quality image exposure correction by reconstructing the frequency domain of endoscopic images. Specifically, inspired by State Space Sequence Models (SSMs), we develop a C-SSM block that integrates the local feature extraction ability of the convolutional layer with the ability of the SSM to capture long-range dependencies. A two-path network is built using C-SSM as the basic functional cell, and the two paths handle the phase and amplitude information of the image, respectively. Finally, FDVM-Net reconstructs a degraded endoscopic image to obtain a high-quality, clear image. Extensive experimental results demonstrate that our method achieves state-of-the-art results in terms of speed and accuracy, and it is noteworthy that our method can enhance endoscopic images of arbitrary resolution. The code is available at https://github.com/zzr-idam/FDVM-Net.
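
To make the amplitude/phase view concrete, the following is a minimal NumPy sketch, not the authors' implementation. It splits an image into amplitude and phase with a 2D FFT, recombines them with the inverse transform, and scores the result with an L1 loss. The helper names and the toy degradation (a global darkening by a factor of 0.4) are assumptions chosen for illustration; real exposure errors are spatially varying and would also affect the phase.

```python
import numpy as np

def split_amplitude_phase(image):
    """Return (amplitude, phase) of the per-channel 2D Fourier transform."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)

def merge_amplitude_phase(amplitude, phase):
    """Rebuild a real-valued image from amplitude and phase."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spectrum, axes=(0, 1)).real

def l1_loss(pred, target):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(pred - target)))

rng = np.random.default_rng(0)
clean = rng.random((8, 8, 3))   # stand-in for a well-exposed image
under = 0.4 * clean             # hypothetical under-exposure (global darkening)

amp_clean, pha_clean = split_amplitude_phase(clean)
amp_under, pha_under = split_amplitude_phase(under)

# Splitting and merging without changes recovers the input.
roundtrip = merge_amplitude_phase(amp_clean, pha_clean)

# A global scaling only changes the amplitude, so pairing the degraded
# phase with the clean amplitude restores this toy image.
restored = merge_amplitude_phase(amp_clean, pha_under)

print("L1(under, clean)    =", l1_loss(under, clean))
print("L1(restored, clean) =", l1_loss(restored, clean))
```

In this toy case correcting the amplitude alone is sufficient. The two-path design described above goes further: it learns corrections for both components and fuses them before the inverse transform.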

Paper Structure (9 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Overview of the FDVM-Net architecture. FDVM-Net is a two-path network with several C-SSM blocks connected in series along each path. Each FDVM-Net block combines convolution, an SSM, cross-attention, and a shortcut connection into a basic cell. The upper branch processes the amplitude and the lower branch processes the phase. Finally, the feature map is inverse Fourier transformed to yield a clear endoscopic image.
  • Figure 2: Exposure correction comparison on the E-kvasir dataset. Our method outperforms other state-of-the-art techniques (LIME [guo2016lime], HDRNET [gharbi2017deep], LECCM [nsampi2021learning], SwinIR [liang2021swinir], NAFNet [chen2022simple], and EndoIMLE [wang2022endoscopic]), demonstrating enhanced visual quality and detail restoration.
  • Figure 3: Comparison of exposure correction across individual modules of our method.
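
The basic cell described in the Figure 1 caption (convolution, SSM, and shortcut) can be sketched in a few lines of NumPy. This is an illustrative simplification under stated assumptions, not the paper's C-SSM block. It operates on a 1D token sequence rather than 2D feature maps, uses a plain time-invariant linear state space recurrence instead of Mamba's selective scan, and omits the cross-attention between branches. All function names, shapes, and parameter values are hypothetical.

```python
import numpy as np

def local_conv(x, kernel):
    """Same-padded 1D convolution along the sequence, applied per channel."""
    channels = [np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])]
    return np.stack(channels, axis=1)

def ssm_scan(u, A, B, C):
    """Linear state space recurrence: h_k = A h_{k-1} + B u_k, y_k = C h_k."""
    h = np.zeros(A.shape[0])
    outputs = []
    for u_k in u:
        h = A @ h + B @ u_k      # state carries information from all earlier tokens
        outputs.append(C @ h)
    return np.stack(outputs)

def c_ssm_block(x, kernel, A, B, C):
    """Convolution (local context), then SSM (long-range context), plus a shortcut."""
    return x + ssm_scan(local_conv(x, kernel), A, B, C)

rng = np.random.default_rng(0)
seq_len, dim, state = 16, 4, 8            # hypothetical sizes
x = rng.standard_normal((seq_len, dim))   # flattened feature tokens
kernel = np.array([0.25, 0.5, 0.25])      # small smoothing kernel
A = 0.9 * np.eye(state)                   # stable state transition
B = rng.standard_normal((state, dim))
C = rng.standard_normal((dim, state))

y = c_ssm_block(x, kernel, A, B, C)

# Perturbing the first token changes the last output, which a 3-tap
# convolution alone could not do.
x_shifted = x.copy()
x_shifted[0] += 1.0
y_shifted = c_ssm_block(x_shifted, kernel, A, B, C)
print("change at last token:", np.max(np.abs(y_shifted[-1] - y[-1])))
```

The convolution only mixes neighbouring tokens, while the recurrent state lets a change at the start of the sequence reach the end. This is the complementary pairing of local and long-range modelling that the C-SSM block is described as providing.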