Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs
Julia Werner, Christoph Gerum, Moritz Reiber, Jörg Nick, Oliver Bringmann
TL;DR
This work addresses precise localization within the GI tract in Video Capsule Endoscopy under strict energy constraints. It introduces a lightweight pipeline that combines a compact CNN classifier (MobileNetV3-Small) with a four-state HMM, using the CNN outputs as emissions and Viterbi decoding to recover the most likely sequence of GI sections. The approach achieves 98.04% accuracy on the RI Gastroenterology dataset with only about 1 million parameters, outperforming the CNN alone. It also enables significant energy savings, because transmission can be delayed until the capsule enters the small intestine. In practice, this allows on-device self-localization, a potentially higher frame rate or resolution within the small intestine, and extended battery life for capsule systems, making precise localization feasible on low-power hardware.
Abstract
This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by combining a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). We demonstrate that the subsequent time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of $98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters, and thus provides a method suitable for low-power devices.
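To illustrate how time-series analysis can correct isolated CNN errors, the following is a minimal sketch of Viterbi decoding over per-frame classifier outputs. It is not the authors' implementation. The section names, the left-to-right transition structure, the `stay` probability, the direct use of class probabilities as emission scores, and the example values in `cnn_probs` are all assumptions made for illustration.

```python
import math

# Hypothetical section names; the source only states that the HMM has four states.
STATES = ["esophagus", "stomach", "small_intestine", "colon"]


def left_to_right_transitions(n_states, stay):
    """Assumed transition matrix: remain in a section or advance to the next one."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            trans[i][i] = 1.0  # last section is absorbing
        else:
            trans[i][i] = stay
            trans[i][i + 1] = 1.0 - stay
    return trans


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(emissions, trans, start):
    """Return the most likely state index sequence.

    emissions[t][s] is the emission score of state s at frame t
    (here, assumed to be the CNN class probability for that frame).
    """
    n = len(start)
    score = [_log(start[s]) + _log(emissions[0][s]) for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at frame t + 1
    for frame in emissions[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans[best][s]) + _log(frame[s]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up CNN outputs for six frames; frame 2 is wrongly classified as colon.
cnn_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.10, 0.05, 0.80],
    [0.05, 0.80, 0.10, 0.05],
    [0.05, 0.10, 0.80, 0.05],
    [0.02, 0.03, 0.90, 0.05],
]
raw = [max(range(4), key=lambda s: frame[s]) for frame in cnn_probs]
decoded = viterbi(cnn_probs, left_to_right_transitions(4, stay=0.9), [1.0, 0.0, 0.0, 0.0])
print([STATES[i] for i in raw])
print([STATES[i] for i in decoded])
```

In this toy sequence, the per-frame argmax jumps to the colon for a single frame, while the decoded path keeps that frame in the stomach because the assumed transition structure does not allow moving backwards along the tract. In a setup like the one described, the first frame decoded as small intestine could serve as the trigger to start transmission.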
