
Automatic Image Unfolding and Stitching Framework for Esophageal Lining Video Based on Density-Weighted Feature Matching

Muyang Li, Juming Xiong, Ruining Deng, Tianyuan Yao, Regina N Tyree, Girish Hiremath, Yuankai Huo

TL;DR

This paper presents an automatic image unfolding and stitching framework for esophageal videos captured during endoscopy. It collects matches from several feature matching algorithms (LoFTR, SIFT, and ORB) into a feature filtering pool and applies a Density-Weighted Homography Optimization (DWHO) algorithm to improve stitching accuracy.

Abstract

Endoscopy is a crucial tool for examining the gastrointestinal tract, but its effectiveness is often limited by a narrow field of view and the dynamic nature of the internal environment. This is especially true in the esophagus, where complex and repetitive patterns make image stitching challenging. This paper introduces a novel automatic image unfolding and stitching framework tailored for esophageal videos captured during endoscopy. The method combines feature matching algorithms, including LoFTR, SIFT, and ORB, to create a feature filtering pool and employs a Density-Weighted Homography Optimization (DWHO) algorithm to enhance stitching accuracy. By merging consecutive frames, the framework generates a detailed panoramic view of the esophagus, enabling thorough and accurate visual analysis. Experimental results show that the framework achieves low Root Mean Square Error (RMSE) and high Structural Similarity Index (SSIM) across extensive video sequences, demonstrating its potential for clinical use and for improving the quality and continuity of endoscopic visual data.
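
As a rough illustration of the pooling and density-weighting idea described in the abstract, the sketch below merges matches from several matchers and computes a density-weighted displacement between two frames. It is not the authors' DWHO algorithm: the estimate is simplified to a pure translation instead of a homography, and the function names, the confidence threshold, the neighbourhood radius, the choice to give matches in denser regions more weight, and the sample coordinates are all assumptions made for illustration.

```python
from math import hypot


def pool_matches(*match_sets, min_confidence=0.5):
    """Merge matches from several matchers into one pool.

    Each match is (x1, y1, x2, y2, confidence). Low-confidence matches and
    duplicates (same rounded coordinates) are dropped. Hypothetical helper.
    """
    seen = set()
    pooled = []
    for matches in match_sets:
        for x1, y1, x2, y2, conf in matches:
            key = (round(x1), round(y1), round(x2), round(y2))
            if conf >= min_confidence and key not in seen:
                seen.add(key)
                pooled.append((x1, y1, x2, y2))
    return pooled


def density_weights(pooled, radius=20.0):
    """Weight each match by how many pooled matches lie within `radius`
    pixels of it in the first frame (itself included), normalised to sum to 1.
    Assumption: denser regions are treated as more trustworthy.
    """
    counts = [
        sum(1 for xb, yb, _, _ in pooled if hypot(xa - xb, ya - yb) <= radius)
        for xa, ya, _, _ in pooled
    ]
    total = sum(counts)
    return [c / total for c in counts]


def weighted_displacement(pooled, weights):
    """Density-weighted mean of the per-match displacement vectors."""
    dx = sum(w * (x2 - x1) for w, (x1, _, x2, _) in zip(weights, pooled))
    dy = sum(w * (y2 - y1) for w, (_, y1, _, y2) in zip(weights, pooled))
    return dx, dy


# Made-up matches standing in for the output of three matchers.
sift_like = [(0, 0, 10, 0, 0.9), (5, 5, 15, 5, 0.8)]
orb_like = [(0, 0, 10, 0, 0.7), (3, 4, 13, 4, 0.9)]            # first one is a duplicate
loftr_like = [(200, 200, 250, 250, 0.95), (100, 100, 0, 0, 0.2)]  # last one is low confidence

pooled = pool_matches(sift_like, orb_like, loftr_like)
weights = density_weights(pooled)
dx, dy = weighted_displacement(pooled, weights)
```

In this toy input, three mutually close matches agree on a shift of 10 pixels in x, while one isolated match disagrees. The isolated match receives a smaller weight, so the weighted estimate sits closer to the cluster than a plain average would.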

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Pipeline for esophageal image unfolding, matching, and stitching. The process includes three main steps: 1) transforming circular video views into unfolded images; 2) pooling and filtering feature points from deep-learning and traditional methods; 3) estimating horizontal and vertical displacements in overlapping regions to create a stitched representation of the esophageal inner surface.
  • Figure 2: Workflow of the depth center localization and unfolding process.
  • Figure 3: Quantitative and Qualitative Comparison of Image Matching Methods. Top: Box plots comparing Structural Similarity Index (SSIM) and Root Mean Square Error (RMSE) across different methods (SIFT, ORB, LoFTR, DWHO). Bottom: Visualized stitched results for each method, highlighting the performance in generating coherent and accurate image reconstructions.
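
The captions above mention two steps that are easy to illustrate in isolation: unfolding a circular view around a center point (Figures 1 and 2) and scoring a result with RMSE (Figure 3). The following is a minimal sketch under stated assumptions, not the paper's implementation: the center is taken as already known rather than located from depth, sampling is nearest-neighbour, rows of the output correspond to radius and columns to angle, and the names `unfold` and `rmse` as well as the synthetic frame are hypothetical.

```python
from math import cos, sin, pi, hypot


def unfold(frame, center, r_min, r_max, n_angles):
    """Sample the ring r_min <= r < r_max around `center` into a grid.

    `frame` is a list of rows of grayscale values and `center` is (cx, cy).
    Output rows index radius, output columns index angle. Samples that fall
    outside the frame are set to 0.
    """
    cx, cy = center
    height, width = len(frame), len(frame[0])
    unfolded = []
    for r in range(r_min, r_max):
        row = []
        for k in range(n_angles):
            theta = 2 * pi * k / n_angles
            x = int(round(cx + r * cos(theta)))
            y = int(round(cy + r * sin(theta)))
            inside = 0 <= x < width and 0 <= y < height
            row.append(frame[y][x] if inside else 0)
        unfolded.append(row)
    return unfolded


def rmse(a, b):
    """Root mean square error between two equally sized 2-D grids."""
    squared = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return (sum(squared) / len(squared)) ** 0.5


# Synthetic 21x21 frame whose pixel value is the rounded distance to (10, 10),
# so concentric rings in the frame should become constant rows after unfolding.
frame = [[int(round(hypot(x - 10, y - 10))) for x in range(21)] for y in range(21)]
strip = unfold(frame, (10, 10), 2, 6, 4)
self_error = rmse(strip, strip)
```

A real pipeline would likely use interpolated sampling and a finer angular grid. The four-angle grid here only keeps the toy example easy to check by hand.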