Engineering Spatial and Molecular Features from Cellular Niches to Inform Predictions of Inflammatory Bowel Disease

Myles Joshua Toledo Tan, Maria Kapetanaki, Panayiotis V. Benos

TL;DR

This work addresses the challenge of differentiating Crohn's disease and ulcerative colitis within inflammatory bowel disease by integrating spatial transcriptomics with explainable machine learning. The authors map cell types in CosMx ST data using cell2location, decompose tissue into four cellular niches via non-negative matrix factorization, and engineer 44 features spanning niche composition, neighborhood enrichment, and niche–gene signals. An MLP classifier achieves $0.774 \pm 0.161$ accuracy for HC/UC/CD and $0.916 \pm 0.118$ for HC vs IBD, with explainability analyses showing that spatial disruption underlies general inflammation while niche–gene signatures distinguish UC from CD. The approach yields biologically interpretable predictors, proposes testable mechanistic hypotheses, and serves as a proof-of-concept for turning spatial maps into actionable diagnostic insights, albeit with limitations related to sample size and panel breadth.

Abstract

Differentiating between the two main subtypes of Inflammatory Bowel Disease (IBD), Crohn's disease (CD) and ulcerative colitis (UC), is a persistent clinical challenge due to overlapping presentations. This study introduces a novel computational framework that employs spatial transcriptomics (ST) to create an explainable machine learning model for IBD classification. We analyzed ST data from the colonic mucosa of healthy controls (HC), UC, and CD patients. Using Non-negative Matrix Factorization (NMF), we first identified four recurring cellular niches, representing distinct functional microenvironments within the tissue. From these niches, we systematically engineered 44 features capturing three key aspects of tissue pathology: niche composition, neighborhood enrichment, and niche-gene signals. A multilayer perceptron (MLP) classifier trained on these features achieved an accuracy of $0.774 \pm 0.161$ for the more challenging three-class problem (HC, UC, and CD) and $0.916 \pm 0.118$ in the two-class problem of distinguishing IBD from healthy tissue. Crucially, model explainability analysis revealed that disruptions in the spatial organization of niches were the strongest predictors of general inflammation, while the classification between UC and CD relied on specific niche-gene expression signatures. This work provides a robust, proof-of-concept pipeline that transforms descriptive spatial data into an accurate and explainable predictive tool, offering not only a potential new diagnostic paradigm but also deeper insights into the distinct biological mechanisms that drive IBD subtypes.

Paper Structure

This paper contains 29 sections, 4 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Illustration of a cell–cell neighborhood centered on a focal cell (Niche 1, purple). The focal cell has diameter $d$, so any cell within a radius of $2d$ from the center of the focal cell is part of its neighborhood. In this example, the neighborhood includes one additional Niche 1 cell, three Niche 2 cells (blue), two Niche 3 cells (pink), and one Niche 4 cell (orange). The dashed green circle represents the neighborhood boundary. Created in https://BioRender.com.
  • Figure 2: Architecture of the multilayer perceptron (MLP) used for classification. The network consists of an input layer with 44 features, four hidden layers with 40, 20, 10, and 5 neurons respectively, and an output layer with three neurons (corresponding to HC, UC, and CD) using the softmax activation function.
  • Figure 3: Visualization of cellular niches and their cell type composition in FOV UC a_8. (a) Cellular niches identified by NMF as colored points overlaid on the histomorphology; (b) The five most abundant cell types within Niche 3.
  • Figure 4: Niche enrichment comparisons across (a) HC, (b) UC, (c) CD.
  • Figure 5: Confusion matrices for the (a) three-class, and (b) two-class problems.
  • ...and 3 more figures
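
Two details in the captions above are concrete enough to sketch in code: the $2d$ neighborhood rule of Figure 1 and the layer sizes of Figure 2. The following is a minimal illustration, not the authors' implementation. The function names, the `(x, y, niche)` tuple layout, and the toy coordinates are all hypothetical. Treating the boundary as inclusive, excluding the focal cell from its own counts, and assuming fully connected layers with bias terms are likewise assumptions that the captions do not confirm.

```python
import math
from collections import Counter


def neighborhood_counts(cells, focal_index, diameter):
    """Count niche labels of cells within 2 * diameter of the focal cell.

    cells is a list of (x, y, niche_label) tuples (hypothetical layout).
    The focal cell is excluded, and the boundary is treated as inclusive
    (both are assumptions made for this sketch).
    """
    fx, fy, _ = cells[focal_index]
    radius = 2 * diameter
    counts = Counter()
    for i, (x, y, niche) in enumerate(cells):
        if i == focal_index:
            continue
        if math.hypot(x - fx, y - fy) <= radius:
            counts[niche] += 1
    return counts


def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network (assumed layout)."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Toy coordinates chosen to reproduce the composition described in Figure 1.
d = 10.0
cells = [
    (0.0, 0.0, 1),      # focal cell (Niche 1)
    (5.0, 0.0, 1),      # one additional Niche 1 cell
    (0.0, 8.0, 2),      # three Niche 2 cells
    (-8.0, 0.0, 2),
    (10.0, 10.0, 2),
    (0.0, -15.0, 3),    # two Niche 3 cells
    (-12.0, -12.0, 3),
    (19.0, 0.0, 4),     # one Niche 4 cell
    (30.0, 0.0, 2),     # lies outside the 2d radius, so it is not counted
]
counts = neighborhood_counts(cells, focal_index=0, diameter=d)
print(dict(counts))

# Layer sizes from the Figure 2 caption: 44 inputs, four hidden layers, 3 outputs.
print(dense_parameter_count([44, 40, 20, 10, 5, 3]))
```

Under these assumptions, the network in Figure 2 would have 2,903 trainable parameters. That is small relative to many deep models, which fits a setting where the caveats mention limited sample size.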