Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs

Julia Werner; Christoph Gerum; Moritz Reiber; Jörg Nick; Oliver Bringmann

TL;DR

This work tackles the problem of precise GI-tract localization in Video Capsule Endoscopy under strict energy constraints. It introduces a lightweight pipeline that combines a compact CNN classifier (MobileNetV3-Small) with a four-state HMM, using CNN outputs as emissions and Viterbi decoding to recover the most likely sequence of GI sections. The approach achieves a high accuracy of 98.04% on the RI Gastroenterology dataset with only about 1 million parameters, outperforming the CNN alone and enabling significant energy savings by delaying transmission until the capsule enters the small intestine. Practically, this enables on-device self-localization, potential increases in frame rate/resolution within the small intestine, and extended battery life for capsule systems, making precise localization feasible on low-power hardware.
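The decoding step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the transition probabilities, the start distribution, and the toy CNN outputs below are hypothetical values chosen for illustration, and using the log of the softmax output directly as the emission score is an assumption. The only structure taken from the text is a four-state chain (esophagus, stomach, small intestine, colon) decoded with Viterbi.

```python
import numpy as np


def viterbi(log_emis, log_trans, log_init):
    """Return the most likely state path.

    log_emis:  (T, S) per-frame log emission scores (here: log CNN softmax).
    log_trans: (S, S) log transition matrix, entry [i, j] = log P(j | i).
    log_init:  (S,)   log start distribution.
    """
    T, S = log_emis.shape
    score = log_init + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # cand[i, j]: score of the best path that is in i at t-1 and moves to j.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emis[t]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# Assumed left-to-right chain: stay in the current section or advance by one.
# States: 0 esophagus, 1 stomach, 2 small intestine, 3 colon.
P_STAY = 0.9  # hypothetical value, not taken from the paper
trans = np.zeros((4, 4))
for s in range(3):
    trans[s, s] = P_STAY
    trans[s, s + 1] = 1.0 - P_STAY
trans[3, 3] = 1.0  # the colon is the final section
init = np.array([1.0, 0.0, 0.0, 0.0])  # assumed: every study starts in the esophagus

# Hypothetical CNN softmax outputs for 8 frames; frame 3 is misclassified as colon.
cnn_probs = np.array([
    [0.85, 0.05, 0.05, 0.05],
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.30, 0.05, 0.60],  # CNN error: true section is the stomach
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.05, 0.85],
    [0.05, 0.05, 0.05, 0.85],
])

with np.errstate(divide="ignore"):  # log(0) = -inf encodes forbidden moves
    log_trans = np.log(trans)
    log_init = np.log(init)

cnn_argmax = cnn_probs.argmax(axis=1)
decoded = viterbi(np.log(cnn_probs), log_trans, log_init)

print("CNN argmax:", cnn_argmax.tolist())
print("CNN + HMM: ", decoded.tolist())
```

In this toy run the frame-wise argmax jumps to the colon for a single frame, while the decoded path keeps that frame in the stomach, because the assumed chain forbids skipping sections or moving backwards.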

Abstract

This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by combining a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). It is demonstrated that the subsequent time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of $98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters, and thus provides a method suitable for low-power devices.

Paper Structure (7 sections, 2 equations, 4 figures, 2 tables)

Introduction
Related work
Methodology
Inference
HMM and Viterbi decoding
Results and Discussion
Conclusion

Figures (4)

Figure 1: Illustration of the presented approach (GI tract images from charoen2022rhode).
Figure 2: Delays and accuracies for different window sizes sliding over the log-likelihood matrix of the Viterbi decoding.
Figure 3: Confusion matrices of the CNN output (a) and the CNN+HMM combination (b) (classes: esophagus ($0$), stomach ($1$), small intestine ($2$) and colon ($3$)).
Figure 5: Accuracies of the CNN compared to the combinatorial approach CNN+HMM. Marked in red are two studies with worse results if the combination is used; more details can be found in Figure \ref{fig:s193} and Figure \ref{fig:s182}.
