
Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs

Julia Werner, Christoph Gerum, Moritz Reiber, Jörg Nick, Oliver Bringmann

TL;DR

This work tackles the problem of precise GI-tract localization in Video Capsule Endoscopy under strict energy constraints. It introduces a lightweight pipeline that combines a compact CNN classifier (MobileNetV3-Small) with a four-state HMM, using CNN outputs as emissions and Viterbi decoding to recover the most likely sequence of GI sections. The approach achieves a high accuracy of 98.04% on the RI Gastroenterology dataset with only about 1 million parameters, outperforming the CNN alone and enabling significant energy savings by delaying transmission until the capsule enters the small intestine. Practically, this enables on-device self-localization, potential increases in frame rate/resolution within the small intestine, and extended battery life for capsule systems, making precise localization feasible on low-power hardware.
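The TL;DR describes the core decoding step: the CNN outputs act as HMM emissions, and Viterbi decoding recovers the most likely sequence of GI sections. Below is a minimal pure-Python sketch that rests on several assumptions. Each softmax vector is used directly as the emission likelihood. The transition matrix is a hypothetical left-to-right chain (esophagus → stomach → small intestine → colon, with no backward moves). All probability values are invented for illustration. The paper's actual emission model and HMM parameters may differ.

```python
import math
from typing import List, Sequence

# Class indices as in Figure 3: esophagus (0), stomach (1), small intestine (2), colon (3).
# Hypothetical left-to-right HMM: stay with p=0.9 or advance one section with p=0.1;
# the colon is absorbing. Per-frame "stay" values for real videos would be much closer to 1.
TRANS = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0, 1.0],
]
START = [1.0, 0.0, 0.0, 0.0]  # assumption: recordings start in the esophagus


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def viterbi(emissions: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]],
            start: Sequence[float]) -> List[int]:
    """Most likely state path, treating each CNN softmax vector as the emission likelihood."""
    n = len(start)
    # delta[j]: best log-score of any state path ending in state j at the current frame
    delta = [_log(start[j]) + _log(emissions[0][j]) for j in range(n)]
    backptrs: List[List[int]] = []
    for frame in emissions[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # best predecessor k of state j under the transition model
            k = max(range(n), key=lambda i: delta[i] + _log(trans[i][j]))
            ptr.append(k)
            new_delta.append(delta[k] + _log(trans[k][j]) + _log(frame[j]))
        delta = new_delta
        backptrs.append(ptr)
    # backtrack from the best final state
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Invented softmax outputs for 10 frames; frame 4 is a spurious "colon" prediction.
cnn_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.80, 0.15, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.30, 0.05, 0.60],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.10, 0.80, 0.05],
    [0.02, 0.08, 0.85, 0.05],
    [0.02, 0.03, 0.10, 0.85],
    [0.02, 0.03, 0.05, 0.90],
]

cnn_only = [max(range(4), key=lambda j: p[j]) for p in cnn_probs]
hmm_path = viterbi(cnn_probs, TRANS, START)
print(cnn_only)  # [0, 0, 1, 1, 3, 1, 2, 2, 3, 3]
print(hmm_path)  # [0, 0, 1, 1, 1, 1, 2, 2, 3, 3]
```

The sketch works in log space to avoid numerical underflow on long recordings. Because the assumed chain allows no jump from the stomach straight to the colon, the isolated colon prediction at frame 4 cannot appear on the best path and is relabeled as stomach.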

Abstract

This paper presents a method to efficiently classify the gastrointestinal section shown in images from Video Capsule Endoscopy (VCE) studies by combining a Convolutional Neural Network (CNN) for classification with the time-series analysis of a Hidden Markov Model (HMM). It is demonstrated that the subsequent time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of 98.04% on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters, and thus provides a method suitable for low-power devices.
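The abstract targets low-power devices, and the TL;DR attributes the energy savings to delaying transmission until the capsule enters the small intestine. The sketch below shows how decoded section labels could drive such a gate. The policy itself is hypothetical and is not taken from the paper. Under this policy, the radio switches on after `confirm` consecutive frames are decoded as small intestine or a later section, and it then stays on. Frames seen during the confirmation period are simply dropped here, whereas a real system might buffer them.

```python
from typing import Iterable, List

SMALL_INTESTINE = 2  # class index as in Figure 3


def transmission_mask(decoded: Iterable[int], confirm: int = 1) -> List[bool]:
    """Hypothetical gating policy: keep the radio off until `confirm` consecutive
    frames are decoded as small intestine (2) or colon (3), then latch it on."""
    mask: List[bool] = []
    run, on = 0, False
    for state in decoded:
        if not on:
            # count consecutive frames at or beyond the small intestine
            run = run + 1 if state >= SMALL_INTESTINE else 0
            on = run >= confirm
        mask.append(on)
    return mask


decoded = [0, 0, 1, 1, 1, 1, 2, 2, 3, 3]  # e.g. an HMM-smoothed label sequence
mask = transmission_mask(decoded)
print(mask.count(False))  # 6 frames are never transmitted
strict = transmission_mask(decoded, confirm=2)
print(strict.count(False))  # 7
```

A larger `confirm` guards against a single spurious small-intestine label switching the radio on too early. The cost is a few extra untransmitted frames.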

Paper Structure

This paper contains 7 sections, 2 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Illustration of the presented approach (GI tract images from charoen2022rhode).
  • Figure 2: Delays and accuracies for different window sizes sliding over the log-likelihood matrix of the Viterbi decoding.
  • Figure 3: Confusion matrices of the CNN output (a) and the CNN+HMM combination (b) (classes: esophagus (0), stomach (1), small intestine (2) and colon (3)).
  • Figure 5: Accuracies of the CNN compared to the combined CNN+HMM approach. The two studies marked in red give worse results when the combination is used; more details can be found in the figures labeled fig:s193 and fig:s182.
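Figure 2 relates window size to delay and accuracy when a window slides over the log-likelihood matrix of the Viterbi decoding. The sketch below implements one plausible reading of that setup, fixed-lag Viterbi decoding. This is an assumption, not the paper's confirmed procedure. At each new frame t, the decoder backtracks through the last `window - 1` back-pointers and commits only the label of frame t - window + 1. The decision delay is therefore `window - 1` frames. The function names, the toy emissions, the ground-truth labels, and the left-to-right transition values (stay 0.9, advance 0.1) are all hypothetical.

```python
import math
from typing import List, Sequence

TRANS = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.9, 0.1, 0.0],
         [0.0, 0.0, 0.9, 0.1], [0.0, 0.0, 0.0, 1.0]]  # hypothetical left-to-right chain
START = [1.0, 0.0, 0.0, 0.0]  # assumption: recordings start in the esophagus
cnn_probs = [  # invented softmax outputs; frame 4 is a spurious "colon" prediction
    [0.90, 0.05, 0.03, 0.02], [0.80, 0.15, 0.03, 0.02], [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05], [0.05, 0.30, 0.05, 0.60], [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.10, 0.80, 0.05], [0.02, 0.08, 0.85, 0.05], [0.02, 0.03, 0.10, 0.85],
    [0.02, 0.03, 0.05, 0.90],
]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _backtrack(delta: List[float], backptrs: List[List[int]], steps: int) -> List[int]:
    """States of the last steps + 1 frames on the current best path."""
    state = max(range(len(delta)), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs[len(backptrs) - steps:]):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def fixed_lag_decode(emissions: Sequence[Sequence[float]],
                     trans: Sequence[Sequence[float]],
                     start: Sequence[float],
                     window: int) -> List[int]:
    """Online Viterbi that commits frame t - window + 1 once frame t has arrived."""
    n = len(start)
    delta: List[float] = []
    backptrs: List[List[int]] = []
    decided: List[int] = []
    for t, frame in enumerate(emissions):
        if t == 0:
            delta = [_log(start[j]) + _log(frame[j]) for j in range(n)]
        else:
            ptr, new_delta = [], []
            for j in range(n):
                k = max(range(n), key=lambda i: delta[i] + _log(trans[i][j]))
                ptr.append(k)
                new_delta.append(delta[k] + _log(trans[k][j]) + _log(frame[j]))
            delta = new_delta
            backptrs.append(ptr)
        if t >= window - 1:
            # only the oldest frame in the window is committed
            decided.append(_backtrack(delta, backptrs, window - 1)[0])
    # end of recording: the final best path fixes the frames still pending
    pending = len(emissions) - len(decided)
    if pending > 0:
        decided.extend(_backtrack(delta, backptrs, pending - 1))
    return decided


truth = [0, 0, 1, 1, 1, 1, 2, 2, 3, 3]  # hypothetical ground-truth labels
for w in (1, 2, 4, len(cnn_probs)):
    labels = fixed_lag_decode(cnn_probs, TRANS, START, w)
    acc = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    print(f"window={w} delay={w - 1} frames accuracy={acc:.2f}")
```

When the window covers the whole recording, the output equals the offline Viterbi path. Smaller windows commit labels sooner, but they can use less later evidence to correct a frame, which is the delay/accuracy trade-off that Figure 2 explores.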