MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics

Kaichen Xu, Qilong Wu, Yan Lu, Yinan Zheng, Wenlin Li, Xingjie Tang, Jun Wang, Xiaobo Sun

TL;DR

This work addresses the difficulty of detecting anomalous tissue regions (ATRs) when histology alone is insufficient, using spatial transcriptomics (ST) as a complementary molecular modality. It introduces MEATRD, a multimodal ATR detector that combines histology images and ST data through a three-stage pipeline built around the masked graph dual-attention transformer (MGDAT) and ending in a one-class classifier trained on latent multimodal reconstruction errors. The approach outperforms state-of-the-art anomaly detection methods across eight real ST datasets, including challenging cases where ATRs deviate only slightly from normal tissue in appearance, and it provides a theoretical analysis of the informational properties of multimodal bottleneck encoding. The results show the practical value of integrating molecular context with imaging for tissue anomaly detection and suggest applicability to other multimodal anomaly-detection tasks.

Abstract

The detection of anomalous tissue regions (ATRs) within affected tissues is crucial in clinical diagnosis and pathological studies. Conventional automated ATR detection methods, primarily based on histology images alone, falter in cases where ATRs and normal tissues have subtle visual differences. The recent spatial transcriptomics (ST) technology profiles gene expressions across tissue regions, offering a molecular perspective for detecting ATRs. However, there is a dearth of ATR detection methods that effectively harness complementary information from both histology images and ST. To address this gap, we propose MEATRD, a novel ATR detection method that integrates histology image and ST data. MEATRD is trained to reconstruct image patches and gene expression profiles of normal tissue spots (inliers) from their multimodal embeddings, followed by learning a one-class classification AD model based on latent multimodal reconstruction errors. This strategy harmonizes the strengths of reconstruction-based and one-class classification approaches. At the heart of MEATRD is an innovative masked graph dual-attention transformer (MGDAT) network, which not only facilitates cross-modality and cross-node information sharing but also addresses the model over-generalization issue commonly seen in reconstruction-based AD methods. Additionally, we demonstrate that modality-specific, task-relevant information is collated and condensed in multimodal bottleneck encoding generated in MGDAT, marking the first theoretical analysis of the informational properties of multimodal bottleneck encoding. Extensive evaluations across eight real ST datasets reveal MEATRD's superior performance in ATR detection, surpassing various state-of-the-art AD methods. Remarkably, MEATRD also proves adept at discerning ATRs that only show slight visual deviations from normal tissues.
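
The two-step strategy described above (reconstruct inliers, then score spots with a one-class model built on reconstruction errors) can be illustrated with a minimal sketch. This is not the authors' implementation: a PCA-style linear autoencoder stands in for the MGDAT network, a standardized distance to the centroid of inlier errors stands in for the one-class classifier, and all data are synthetic. Every name below (`fit_linear_autoencoder`, `recon_error`, `make_modality`, `anomaly_score`) is hypothetical. The synthetic anomalies are constructed to look normal in the image modality and deviate only in gene expression, mimicking the case the abstract highlights.

```python
import numpy as np

def fit_linear_autoencoder(x, k):
    """Fit a PCA-style linear encoder/decoder on inlier data only (stand-in for MGDAT)."""
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions of the centered inlier data.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def recon_error(x, mean, comps):
    """Per-sample norm of the residual left after encoding and decoding."""
    centered = x - mean
    residual = centered - centered @ comps.T @ comps
    return np.linalg.norm(residual, axis=1)

def make_modality(rng, n, dim, basis, noise):
    """Synthetic features: a low-rank signal plus isotropic Gaussian noise."""
    latent = rng.normal(size=(n, basis.shape[0]))
    return latent @ basis + noise * rng.normal(size=(n, dim))

rng = np.random.default_rng(0)
img_basis = rng.normal(size=(3, 16))    # assumed image-feature subspace
gene_basis = rng.normal(size=(3, 32))   # assumed gene-expression subspace

# Training set contains inliers (normal spots) only.
train_img = make_modality(rng, 200, 16, img_basis, 0.1)
train_gene = make_modality(rng, 200, 32, gene_basis, 0.1)

# Test inliers, and anomalies that look normal in the image modality.
test_in_img = make_modality(rng, 50, 16, img_basis, 0.1)
test_in_gene = make_modality(rng, 50, 32, gene_basis, 0.1)
test_an_img = make_modality(rng, 50, 16, img_basis, 0.1)
test_an_gene = make_modality(rng, 50, 32, gene_basis, 1.0)  # off the inlier subspace

img_model = fit_linear_autoencoder(train_img, 3)
gene_model = fit_linear_autoencoder(train_gene, 3)

def error_features(img, gene):
    """Stack the per-modality reconstruction errors into a 2-D feature per spot."""
    return np.column_stack([recon_error(img, *img_model),
                            recon_error(gene, *gene_model)])

# One-class step: describe the inlier error distribution by its center and scale.
train_feat = error_features(train_img, train_gene)
center = train_feat.mean(axis=0)
scale = train_feat.std(axis=0) + 1e-8

def anomaly_score(img, gene):
    """Standardized distance from the inlier error centroid; larger means more anomalous."""
    z = (error_features(img, gene) - center) / scale
    return np.linalg.norm(z, axis=1)

inlier_scores = anomaly_score(test_in_img, test_in_gene)
anomaly_scores = anomaly_score(test_an_img, test_an_gene)
print("mean inlier score:", inlier_scores.mean())
print("mean anomaly score:", anomaly_scores.mean())
```

In this toy setting the image-only error carries no signal for the anomalies, so the separation comes entirely from the gene-expression error. That is the intuition behind combining the two modalities, although the actual method computes errors in a learned latent space rather than in input space.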

Paper Structure

This paper contains 34 sections, 2 theorems, 74 equations, 5 figures, 9 tables.

Key Result

Proposition D.1

Inclusiveness of complementary task-relevant information. The objective functions in Assumption 1.2 are optimized when the bottleneck encoding $z_3$ encompasses all task-relevant information specific to $v_1$ and $v_2$:

Figures (5)

  • Figure 1: Detecting ATRs with histology images and ST data. ATRs include both tumor core and edge regions, as delineated by red and blue outlines in the histology image, respectively. The tumor edge region visually resembles the adjacent normal tissues. In the spatial map of the ST dataset, the ATRs encompass both red and blank spots, with blank spots indicating locations of missing gene expression data.
  • Figure 2: The workflow of MEATRD.
  • Figure 3: Visualized detection results of tumor edge regions that visually resemble the adjacent normal tissues in the 10x-hBC-I1 dataset. The first row, from left to right, displays the original histology image, the image annotated with ground truth region labels, the image highlighting the tumor edge region (in red) and the adjacent healthy region (in blue), and the image annotated with ATRs identified by DOMINANT. The second row presents images annotated with ATRs identified by their respective methods. The performance of each method is also quantified using mean precision and recall scores over five independent runs. These metrics, along with their standard deviations, are displayed to the right of each method's panel.
  • Figure 4: Information diagrams of the two data modalities $v_1$ and $v_2$. The bottleneck encoding, generated by the MGDAT block, embodies the minimally sufficient representation for modality-specific, task-relevant information (i.e., $b_1+b_2$).
  • Figure 5: Violin plots illustrating the distributions of latent multimodal reconstruction errors for inliers (blue) and anomalies (yellow) with ("Full") and without ("w/o TNM") the implementation of target node masking (TNM).

Theorems & Definitions (10)

  • Definition D.1
  • Definition D.2
  • Definition D.3
  • Definition D.4
  • Definition D.5
  • proof
  • Proposition D.1
  • proof
  • Proposition D.2
  • proof