MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation

Xinyu Liu, Guolei Sun, Cheng Wang, Yixuan Yuan, Ender Konukoglu

TL;DR

MedVSR reconstructs high-quality medical videos from low-resolution (LR) sequences degraded by camera shake, noise, and abrupt frame transitions. It introduces Cross State-Space Propagation (CSSP), which uses distant frames as control signals within state-space blocks, and Inner State-Space Reconstruction (ISSR), which combines long-range spatial feature learning with large-kernel short-range aggregation. Across four medical datasets (endoscopy and cataract surgery), MedVSR outperforms existing video super-resolution (VSR) methods in reconstruction quality at a lower computational cost. The approach yields sharper textures and better preservation of delicate tissue details, supporting improved diagnostic reliability and tool visualization in clinical workflows.

Abstract

High-resolution (HR) medical videos are vital for accurate diagnosis, yet are hard to acquire due to hardware limitations and physiological constraints. Clinically, the collected low-resolution (LR) medical videos present unique challenges for video super-resolution (VSR) models, including camera shake, noise, and abrupt frame transitions, which result in significant optical flow errors and alignment difficulties. Additionally, tissues and organs exhibit continuous and nuanced structures, but current VSR models are prone to introducing artifacts and distorted features that can mislead doctors. To this end, we propose MedVSR, a tailored framework for medical VSR. It first employs Cross State-Space Propagation (CSSP) to address the imprecise alignment by projecting distant frames as control matrices within state-space models, enabling the selective propagation of consistent and informative features to neighboring frames for effective alignment. Moreover, we design an Inner State-Space Reconstruction (ISSR) module that enhances tissue structures and reduces artifacts with joint long-range spatial feature learning and large-kernel short-range information aggregation. Experiments across four datasets in diverse medical scenarios, including endoscopy and cataract surgeries, show that MedVSR significantly outperforms existing VSR models in reconstruction performance and efficiency. Code released at https://github.com/CUHK-AIM-Group/MedVSR.
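
The abstract describes CSSP as projecting distant frames into control matrices of a state-space model. The following is a minimal toy sketch of that general idea, not the authors' implementation: a diagonal state-space scan in which the per-token input and output matrices are computed from a second (control) sequence. The function name `cross_ssm_scan`, the projections `W_B` and `W_C`, the fixed decay `a`, and the choice of which frame supplies the input versus the control are all assumptions made for illustration.

```python
import numpy as np

def cross_ssm_scan(x, ctrl, W_B, W_C, a):
    """Toy diagonal state-space scan with externally supplied control.

    x:    (L, D) tokens being scanned (assumed here: a neighboring frame)
    ctrl: (L, D) tokens supplying control (assumed here: a distant frame)
    W_B, W_C: (D, N) hypothetical projections giving per-token B_k and C_k
    a:    (N,) fixed decay in (0, 1), standing in for the discretized
          state matrix of a real selective SSM
    """
    L, D = x.shape
    N = a.shape[0]
    B = ctrl @ W_B                 # (L, N) input matrices from the control frame
    C = ctrl @ W_C                 # (L, N) output matrices from the control frame
    h = np.zeros((D, N))           # one N-dimensional state per channel
    y = np.empty((L, D))
    for k in range(L):
        # h_k = a * h_{k-1} + x_k B_k^T  (decay old state, write new input)
        h = a * h + np.outer(x[k], B[k])
        # y_k = h_k C_k  (read out the state with the control-derived C_k)
        y[k] = h @ C[k]
    return y
```

In this sketch, a control token that projects to a zero `B_k` writes nothing into the state, which is one simple way to picture "selective" propagation: the control sequence decides how much of each input token is carried forward.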

Paper Structure

This paper contains 15 sections, 9 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: (a) Example of sharp transitions and jitter in medical videos, especially between distant frames (e.g., frame $t-2 \rightarrow t$), which pose significant challenges for existing alignment methods. (b) Averaged error between forward and inverse backward optical flows on different datasets; lower error values denote better estimation stability. Medical videos tend to have significantly larger errors, making alignment more challenging.
  • Figure 2: Examples of texture removal and shape distortion produced by an existing VSR method [chan2021basicvsr], whose outputs do not accurately reflect the ground truth (GT) and can mislead doctors. MedVSR reconstructs real features and produces results consistent with the GT. Zoom in for details.
  • Figure 3: Illustration of the proposed MedVSR framework. For clarity, the main stream shows the $j$-th propagation branch. MedVSR includes two core operations: CSSP, which captures consistent features to enhance propagation, and ISSR, which reconstructs smooth frames with long-range spatial feature learning and short-range information aggregation.
  • Figure 4: Illustration of CSSP. It propagates distant-frame features to neighboring frames via the SSM in the cross state-space block and aligns them with the deformable convolution block.
  • Figure 5: Qualitative comparisons on HyperKvasir [Borgli2020HyperKvasir], LDPolyp [ma2021ldpolypvideo], and EndoVis18 [allan2020endovis18]. MedVSR reduces artifacts, provides detailed results, and reconstructs the textures accurately. For more qualitative comparisons, please refer to the supplementary material. Zoom in for details.
  • ...and 2 more figures
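
Figure 1(b) reports an error between forward and inverse backward optical flows. The caption does not give the exact formula, so the sketch below shows the common forward-backward consistency check as one plausible reading: follow the forward flow from each pixel, look up the backward flow at the landing point, and measure how far the two are from cancelling. The function name `fb_consistency_error`, the `(dx, dy)` channel order, and the nearest-pixel lookup (instead of bilinear sampling) are assumptions made to keep the example short.

```python
import numpy as np

def fb_consistency_error(flow_fw, flow_bw):
    """Mean forward-backward flow inconsistency over all pixels.

    flow_fw: (H, W, 2) flow from frame t to frame t+1, stored as (dx, dy)
    flow_bw: (H, W, 2) flow from frame t+1 back to frame t, stored as (dx, dy)
    """
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Landing position of each pixel in frame t+1, rounded to the nearest
    # pixel and clamped to the image bounds.
    xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    bw_at_target = flow_bw[yt, xt]          # (H, W, 2)
    # Perfectly consistent flows cancel: F(p) + B(p + F(p)) = 0.
    residual = flow_fw + bw_at_target
    return float(np.linalg.norm(residual, axis=-1).mean())
```

Under this definition, a pair of flows that exactly undo each other scores zero, and larger values indicate less stable flow estimates, matching the caption's "lower is better" reading.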