SpecReX: Explainable AI for Raman Spectroscopy

Nathan Blake, David A. Kelly, Akchunya Chanchal, Sarah Kapllani-Mucaj, Geraint Thomas, Hana Chockler

TL;DR

The paper tackles explainability in deep-learning Raman spectroscopy diagnostics by introducing SpecReX, a spectrum-specific XAI tool grounded in actual causality. SpecReX builds a responsibility map by iteratively mutating spectral regions and querying a model, and validates explanations against synthetic data with known ground-truth signals, comparing to SHAP baselines. On three simulated datasets, SpecReX localizes to the discriminative spectral features across single, double, and complex peak scenarios, producing bounded, interpretable $[0,1]$ responsibility scores. The results support the potential of SpecReX to guide biomolecular interpretation and regulatory-friendly explanations, while highlighting the need for in vitro/ex vivo validation before clinical deployment.

Abstract

Raman spectroscopy is becoming more common in medical diagnostics, and deep learning models are increasingly used to leverage its full potential. However, the opaque nature of such models, the sensitivity of medical diagnosis, and regulatory requirements together create a need for explainable AI tools. We introduce SpecReX, an explainability tool specifically adapted to Raman spectra. SpecReX uses the theory of actual causality to rank causal responsibility in a spectrum, quantified by iteratively refining mutated versions of the spectrum and testing whether each mutant retains the original classification. The explanations provided by SpecReX take the form of a responsibility map, highlighting the spectral regions most responsible for the model's correct classification. To assess the validity of SpecReX, we create increasingly complex simulated spectra, in which a "ground truth" signal is seeded, to train a classifier. We then obtain SpecReX explanations and compare the results with another explainability tool. By using simulated spectra, we establish that SpecReX localizes to the known differences between classes under a number of conditions. This provides a foundation on which we can find the spectral features that differentiate disease classes, and is an important first step in proving the validity of SpecReX.

Paper Structure

This paper contains 11 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: A typical biological Raman spectrum. The x-axis shows the wavenumber (Raman) shift. The y-axis shows the number of photons, also called the intensity; this is often normalized and so reported in arbitrary units (A.U.). The colored regions approximately correspond to various biochemical features of interest to a clinical problem. Figure adapted from delrue2022potential.
  • Figure 2: Schematic representation of SpecReX. First, SpecReX creates a set of mutants by generating random coordinates from the start to the end of the input wavenumber region, which are considered to be the initial splitting positions of the spectrum. The value from the previous split coordinate (start of the spectrum in the case of the first coordinate) is retained, while the rest of the spectrum is linearly interpolated. The model is then called for all the generated mutants. From the set of correctly classified mutants, the mutant with the smallest retained region is further explored, by recursively repeating the described procedure until the maximum search tree depth is reached, none of the mutants are classified correctly, or all generated mutants are too small. At each level of the search tree, to further isolate important regions, the start and end regions are set to the retained regions of the previous level, whilst the rest of the spectrum is inherited from the parent.
  • Figure 3: Mean spectra for the single peak dataset. The discriminating peaks for the focus classes are at 250 $\mathrm{cm}^{-1}$ and 750 $\mathrm{cm}^{-1}$, defining classes 0 and 1 respectively. The shaded regions highlight the discriminating features between classes.
  • Figure 4: Mean spectra for the double peak dataset. The peaks at 250 $\mathrm{cm}^{-1}$ and 750 $\mathrm{cm}^{-1}$ are the discriminating peaks for class 0, while the peaks at 150 $\mathrm{cm}^{-1}$ and 750 $\mathrm{cm}^{-1}$ are the discriminating peaks for class 1. The shaded regions highlight the discriminating features between classes.
  • Figure 5: Mean spectra for the complex peak dataset. Each class has 18 Raman bands of varying breadth. The peaks at 370 $\mathrm{cm}^{-1}$ and 1100 $\mathrm{cm}^{-1}$ are the discriminating peaks for class 0 and class 1 respectively. The shaded regions highlight the discriminating features between classes.
  • ...and 2 more figures
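
The mutate-and-query loop described in the Figure 2 caption can be illustrated with a small sketch. This is not the SpecReX implementation: the function names (`make_mutant`, `refine`), the parameters (`n_splits`, `min_width`, `max_depth`), the threshold classifier, and the choice to replace masked regions with a straight line between the spectrum's end points are all assumptions made for illustration. The toy spectrum loosely follows the single peak dataset (class 0 defined by a peak at 250 $\mathrm{cm}^{-1}$), and the sketch covers only the search for a small retained region, not the computation of responsibility scores.

```python
import numpy as np

def make_mutant(spectrum, lo, hi):
    """Keep spectrum[lo:hi]; replace the rest with a straight line
    between the first and last intensity (assumed masking scheme)."""
    mutant = np.linspace(spectrum[0], spectrum[-1], spectrum.size)
    mutant[lo:hi] = spectrum[lo:hi]
    return mutant

def refine(spectrum, classify, target, rng, n_splits=4, min_width=8, max_depth=10):
    """Greedily shrink the retained region while the label is preserved."""
    lo, hi = 0, spectrum.size
    for _ in range(max_depth):
        if hi - lo < 2 * min_width:
            break  # region too small to split further
        # Random split positions strictly inside the current region.
        cuts = np.sort(rng.choice(np.arange(lo + 1, hi), size=n_splits - 1, replace=False))
        edges = [lo, *cuts, hi]
        # Keep segments that are wide enough and still classified as the target.
        passing = [
            (a, b)
            for a, b in zip(edges[:-1], edges[1:])
            if b - a >= min_width and classify(make_mutant(spectrum, a, b)) == target
        ]
        if not passing:
            break  # no mutant retains the original classification
        # Explore the passing mutant with the smallest retained region.
        lo, hi = min(passing, key=lambda seg: seg[1] - seg[0])
    return int(lo), int(hi)

# Toy noise-free spectrum: index i stands for wavenumber i, Gaussian peak at 250.
x = np.arange(1000)
spectrum = np.exp(-0.5 * ((x - 250) / 10.0) ** 2)

def classify(s):
    """Hypothetical stand-in for a trained model: class 0 if the
    intensity at 250 clearly exceeds the intensity at 750, else class 1."""
    return 0 if s[250] - s[750] > 0.5 else 1

rng = np.random.default_rng(0)
lo, hi = refine(spectrum, classify, target=0, rng=rng)
print(f"retained region: [{lo}, {hi})")
```

Because a region is only accepted when its mutant keeps the original label, the returned interval always contains the seeded peak for this toy classifier, although its width depends on the random splits.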