A comprehensive multimodal dataset and benchmark for ulcerative colitis scoring in endoscopy

Noha Ghatwary; Jiangbei Yue; Ahmed Elgendy; Hanna Nagdy; Ahmed Galal; Hayam Fathy; Hussein El-Amin; Venkataraman Subramanian; Noor Mohammed; Gilberto Ochoa-Ruiz; Sharib Ali

Abstract

Ulcerative colitis (UC) is a chronic mucosal inflammatory condition that places patients at increased risk of colorectal cancer. Colonoscopic surveillance remains the gold standard for assessing disease activity, and reporting typically relies on standardised endoscopic scoring metrics. The most widely used is the Mayo Endoscopic Score (MES), with some centres also adopting the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). Both are descriptive assessments of mucosal inflammation (MES: 0 to 3; UCEIS: 0 to 8), where higher values indicate more severe disease. However, computational methods for automatically predicting these scores remain limited, largely due to the lack of publicly available expert-annotated datasets and the absence of robust benchmarking. There is also a significant research gap in generating clinically meaningful descriptions of UC images, despite image captioning being a well-established computer vision task. Variability in endoscopic systems and procedural workflows across centres further highlights the need for multi-centre datasets to ensure algorithmic robustness and generalisability. In this work, we introduce a curated multi-centre, multi-resolution dataset that includes expert-validated MES and UCEIS labels, alongside detailed clinical descriptions. To our knowledge, this is the first comprehensive dataset that combines dual scoring metrics for classification tasks with expert-generated captions describing mucosal appearance and clinically accepted reasoning for image captioning. This resource opens new opportunities for developing clinically meaningful multimodal algorithms. We also provide benchmarks using convolutional neural networks, vision transformers, hybrid models, and widely used multimodal vision-language captioning algorithms.

Paper Structure (7 sections, 4 figures, 4 tables)

Sub-set I:
Sub-set II:
Sub-set III:
Cosine similarity
BLEU (Bilingual Evaluation Understudy)
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
METEOR (Metric for Evaluation of Translation with Explicit ORdering)

Figures (4)

Figure 1: Distribution of MES and UCEIS score categories across the three subsets of the full dataset. The inner ring shows the MES distribution and the outer ring shows the UCEIS distribution, each with corresponding percentages. The total number of images per subset is shown in the centre of the chart.
Figure 2: Representative samples for each MES score across the three subsets. Rows show the MES scores (MES-0, MES-1, MES-2 and MES-3), while columns show the three subsets in order.
Figure 3: Overview of the annotation pipeline, illustrating the features selected by a gastroenterologist and the generation of the descriptive caption.
Figure 4: Sample images from the dataset with captions, showing different MES grades with varied properties.
