ColonScopeX: Leveraging Explainable Expert Systems with Multimodal Data for Improved Early Diagnosis of Colorectal Cancer

Natalia Sikora, Robert L. Manschke, Alethea M. Tang, Peter Dunstan, Dean A. Harris, Su Yang

TL;DR

Colorectal cancer remains a leading cause of cancer mortality, with early detection dramatically improving survival. This paper introduces ColonScopeX, a multimodal explainable AI framework that fuses Raman serum spectra with detailed patient metadata to detect polyps and early-stage CRC, delivering clinician-friendly text outputs via an explainable AI pipeline. Across early, joint, and late fusion architectures, the system achieves strong performance for CRC detection (notably near-perfect discrimination in CRC-trained models) while maintaining interpretable explanations through SHAP and LIME; polyp detection also shows robust results. While offering a noninvasive screening alternative with potential for population-level impact, the work candidly discusses limitations such as dataset bias, generalisability, and the need for broader clinical validation and ethical considerations.

Abstract

Colorectal cancer (CRC) ranks as the second leading cause of cancer-related deaths and the third most prevalent malignant tumour worldwide. Early detection of CRC remains problematic due to its non-specific and often embarrassing symptoms, which patients frequently overlook or hesitate to report to clinicians. Crucially, the stage at which CRC is diagnosed significantly impacts survivability, with a survival rate of 80-95% for Stage I and a stark decline to 10% for Stage IV. Unfortunately, in the UK, only 14.4% of cases are diagnosed at the earliest stage (Stage I). In this study, we propose ColonScopeX, a machine learning framework that uses explainable AI (XAI) methodologies to enhance the early detection of CRC and pre-cancerous lesions. Our approach employs a multimodal model that integrates signals from blood sample measurements, processed using the Savitzky-Golay algorithm for fingerprint smoothing, alongside comprehensive patient metadata, including medication history, comorbidities, age, weight, and BMI. By leveraging XAI techniques, we aim to render the model's decision-making process transparent and interpretable, thereby fostering greater trust and understanding in its predictions. The proposed framework could be used as a triage tool or as a screening tool for the general population. This research highlights the potential of combining diverse patient data sources and explainable machine learning to tackle critical challenges in medical diagnostics.
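
The Savitzky-Golay smoothing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 5-point window and a quadratic local fit, for which the classic convolution weights are (-3, 12, 17, 12, -3)/35. The window length, polynomial order, function name, and toy spectrum below are illustrative assumptions, not the settings used by the authors.

```python
# Classic Savitzky-Golay weights for a 5-point window with a quadratic
# (or cubic) local polynomial. Window and order are assumptions made for
# this sketch, not parameters reported in the paper.
SG_5_QUADRATIC = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol_smooth_5(intensities):
    """Smooth a 1-D spectrum; the two points at each edge are left unchanged."""
    half = len(SG_5_QUADRATIC) // 2
    smoothed = list(intensities)
    for i in range(half, len(intensities) - half):
        window = intensities[i - half : i + half + 1]
        # Weighted sum over the window equals the fitted polynomial's
        # value at the window centre.
        smoothed[i] = sum(w * x for w, x in zip(SG_5_QUADRATIC, window))
    return smoothed


# Hypothetical toy "spectrum" following y = x^2 (not real Raman data).
toy_spectrum = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
smoothed_spectrum = savgol_smooth_5(toy_spectrum)
```

Because the filter fits a local quadratic, it reproduces the quadratic toy signal up to floating-point rounding while attenuating point-to-point noise in real data. This shape-preserving property is the usual reason for preferring it to a plain moving average on spectra with narrow peaks.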

Paper Structure

This paper contains 27 sections, 5 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Outline of the proposed clinical expert system. Steps 1 and 2: the spectra are preprocessed and the exclusion criteria are applied based on the metadata. Step 3: the metadata of interest is supplied to the fusion models. Step 3a: spectral values alone are passed through the RF model. Step 4: important features are extracted from both Step 3 and Step 3a. Step 5: a text output is supplied to the clinician.
  • Figure 2: Performance metrics for the model trained on the dataset containing controls and polyps: a) AUC of the balanced model on the polyp-control dataset, b) validation accuracy per fold, c) validation AUC-ROC per fold. Performance metrics for the model trained on the dataset containing controls and CRCs: d) AUC of the balanced model on the polyp-control dataset, e) validation accuracy per fold, f) validation AUC-ROC per fold.
  • Figure 3: SHAP and LIME values across the top-performing models. Panels a) and b) present SHAP and LIME feature importance for a patient with CRC. Panel c) shows the top SHAP values for the model trained on controls and CRC patients; panel d) shows the top SHAP values for the model trained on controls and polyps.
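
The difference between the early and late fusion architectures named in the TL;DR, and fed by Step 3 of Figure 1, can be sketched as follows. This is a toy illustration under stated assumptions: the logistic scorer, the weights, and every function name are hypothetical stand-ins and do not reproduce the models in the paper. Joint fusion, which combines learned intermediate representations, is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(features, weights, bias=0.0):
    """Hypothetical stand-in classifier: a logistic score over the features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def early_fusion(spectral, metadata, weights, bias=0.0):
    """Concatenate both modalities first, then score them with one model."""
    return linear_score(list(spectral) + list(metadata), weights, bias)


def late_fusion(spectral, metadata, spectral_weights, metadata_weights):
    """Score each modality separately, then average the two probabilities."""
    p_spectral = linear_score(spectral, spectral_weights)
    p_metadata = linear_score(metadata, metadata_weights)
    return 0.5 * (p_spectral + p_metadata)


# Made-up inputs: two smoothed spectral intensities and one scaled metadata value.
p_early = early_fusion([0.2, 0.7], [0.5], weights=[1.0, -0.5, 0.3])
p_late = late_fusion([0.2, 0.7], [0.5], [1.0, -0.5], [0.3])
```

Early fusion lets a single model learn interactions between spectral and metadata features. Late fusion keeps the per-modality models independent, which makes it easier to attribute a prediction to one data source.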