Expanding Training Data for Endoscopic Phenotyping of Eosinophilic Esophagitis
Juming Xiong, Hou Xiong, Quan Liu, Ruining Deng, Regina N Tyree, Girish Hiremath, Yuankai Huo
TL;DR
This work tackles data scarcity in AI-assisted EoE endoscopy by expanding the training set from 435 to 7050 images, mined from online sources and textbooks, to support data-efficient classification of EoE phenotypes. It uses a Data-efficient Image Transformer (DeiT), a Vision Transformer variant trained with a distillation token, together with Gradient Attention Rollout, to produce accurate and interpretable multi-label classification across six EoE and five non-EoE classes, drawing on both site-specific pediatric data and web-derived images. Key contributions include a diverse dataset augmented with publicly available images, an end-to-end transformer-based pipeline, and attention-based visualizations that align with clinically relevant features (e.g., edema, exudates, rings), resulting in improved diagnostic metrics and robustness. The approach has the potential to reduce invasive biopsies by providing scalable, generalizable endoscopic phenotyping across varied clinical contexts.
Abstract
Eosinophilic esophagitis (EoE) is a chronic esophageal disorder marked by eosinophil-dominated inflammation. Diagnosing EoE usually involves endoscopic inspection of the esophageal mucosa and esophageal biopsies for histologic confirmation. AI-assisted endoscopic imaging, guided by the EREFS system, has recently emerged as a potential way to reduce reliance on invasive histological assessment. Significant challenges persist, however, because little data is available for training AI models, a common issue even when developing AI for more prevalent diseases. This study seeks to improve the performance of deep learning-based EoE phenotype classification by augmenting our training data with a diverse set of images from online platforms, public datasets, and electronic textbooks, increasing our dataset from 435 to 7050 images. We used the Data-efficient Image Transformer (DeiT) for image classification and incorporated attention map visualizations to improve interpretability. The findings show that the expanded dataset and model enhancements improved diagnostic accuracy and robustness and supported a more comprehensive analysis, with the potential to improve patient outcomes.
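
To make the idea behind the attention map visualizations concrete, here is a minimal sketch of plain attention rollout in pure Python. It is an illustration under stated assumptions, not the authors' implementation: it uses the simpler, non-gradient-weighted form of rollout, it assumes the attention matrices have already been averaged over heads, and it assumes the class token and distillation token occupy the first two token positions. The function names `attention_rollout` and `cls_relevance` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def attention_rollout(layers):
    """Combine per-layer attention into one token-to-token relevance matrix.

    layers: head-averaged attention matrices (tokens x tokens), ordered from
    the first transformer block to the last. Each row should sum to 1.
    """
    n = len(layers[0])
    # Start from the identity: before any block, each token "is" itself.
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Mix in the identity to account for the residual connection.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        # Renormalise so every row is again a distribution over tokens.
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Propagate relevance through this block.
        rollout = matmul(mixed, rollout)
    return rollout


def cls_relevance(layers, num_special_tokens=2):
    """Relevance of each image patch to the class token (assumed at index 0).

    num_special_tokens=2 assumes a class token and a distillation token
    precede the patch tokens; adjust if the token layout differs.
    """
    return attention_rollout(layers)[0][num_special_tokens:]


if __name__ == "__main__":
    # Toy example: 4 tokens (class, distillation, 2 patches), 2 blocks.
    uniform = [[0.25] * 4 for _ in range(4)]
    print(cls_relevance([uniform, uniform]))
```

In practice the patch relevance vector would be reshaped to the patch grid and upsampled over the endoscopic image as a heat map. A gradient-weighted variant additionally scales each attention matrix by the gradient of the target class score before the rollout step.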
