MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention

Tianyi Wang, Jianan Fan, Dingxin Zhang, Dongnan Liu, Yong Xia, Heng Huang, Weidong Cai

TL;DR

MIRROR is a multi-modal representation learning framework designed to foster both modality alignment and modality retention. Evaluations on TCGA cohorts show that it constructs comprehensive oncological feature representations that benefit cancer diagnosis.

Abstract

Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. Multi-modal self-supervised learning has demonstrated remarkable potential in learning pathological representations by integrating diverse data sources. Conventional multi-modal integration methods primarily emphasize modality alignment and pay insufficient attention to retaining modality-specific structures. However, unlike conventional scenarios where multi-modal inputs share highly overlapping features, histopathology and transcriptomics exhibit pronounced heterogeneity, offering orthogonal yet complementary insights. Histopathology provides morphological and spatial context, elucidating tissue architecture and cellular topology, whereas transcriptomics delineates molecular signatures through gene expression patterns. This inherent disparity makes it challenging to align the two modalities while maintaining modality-specific fidelity. To address this challenge, we present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. MIRROR employs dedicated encoders to extract comprehensive features for each modality, complemented by a modality alignment module that integrates phenotype patterns with molecular profiles. Furthermore, a modality retention module safeguards the unique attributes of each modality, while a style clustering module mitigates redundancy and enhances disease-relevant information by modeling and aligning consistent pathological signatures within a clustering space. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance, demonstrating its effectiveness in constructing comprehensive oncological feature representations and benefiting cancer diagnosis.

Paper Structure

This paper contains 41 sections, 2 theorems, 48 equations, 7 figures, 10 tables.

Key Result

Proposition 1

Let the two modality-specific posteriors be defined as: with diagonal covariance in $\mathbb R^{D}$. The style loss: has a unique global minimum: attained if and only if:

Figures (7)

  • Figure 1: MIRROR compared with conventional multi-modal integration methods. Conventional methods primarily capture modality-shared information, pay limited attention to modality-specific intrinsic structures, and indiscriminately learn both disease-relevant and irrelevant information, which leads to high redundancy. MIRROR is instead designed to balance modality alignment and retention. By selectively preserving disease-relevant features, it mitigates redundancy and improves the model's efficiency and representational capability.
  • Figure 2: Overview of MIRROR. WSIs are first partitioned into patches, which are processed through a pre-trained patch encoder to extract patch-level feature representations. These features are subsequently aggregated by the slide encoder, which encapsulates slide-level characteristics into a [CLS] token while projecting patch embeddings into the shared pathological latent space. Transcriptomics data are preprocessed using RFE and manual selection to identify highly disease-related genes. The refined transcriptomic features are then embedded into a compact representation and mapped into the shared latent space via an RNA encoder. An alignment module for each modality aligns representations across modalities, guided by the alignment loss ($L_{align}$). Meanwhile, modality-specific retention modules use perturbed inputs from both the encoded patch features and the transcriptomics features to capture modality-specific intrinsic structures, contributing to the retention loss ($L_{retention}$). Finally, both slide and transcriptomics representations are processed through a style clustering module that learns their pathological styles and compares them against learnable cluster centers, with the clustering loss ($L_{cluster}$) used to align consistent pathological styles within the cluster space.
  • Figure 3: Transcriptomics data distributions in the TCGA-NSCLC dataset. The top panel displays a heatmap of transcriptomic data for the two subtypes in TCGA-NSCLC: TCGA-LUAD on the left and TCGA-LUSC on the right. After preprocessing, the data exhibit substantial variability and clear subtype distinction, providing a robust foundation for representation learning. The bottom bar plot highlights the 10 most variable genes in the TCGA-NSCLC dataset, identified with reference to the COSMIC database, which supports the biological interpretability of the selected genes.
  • Figure 4: Visualization of slide encoder attention weights on TCGA-BRCA, TCGA-NSCLC, TCGA-RCC and TCGA-COADREAD. Regions exhibiting higher attention scores predominantly correspond to malignant, tumor-bearing tissue, whereas areas with lower scores typically indicate normal regions.
  • Figure 5: Visualization of histopathology and transcriptomics features encoded by MIRROR on the TCGA-NSCLC dataset, compared to those obtained using TANGLE. Red markers represent samples from TCGA-LUAD, while blue markers represent samples from TCGA-LUSC. Circles denote RNA features, while triangles denote WSI features. MIRROR yields more distinct and better-aligned feature distributions.
  • ...and 2 more figures
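
The Figure 2 caption describes an alignment loss ($L_{align}$) that pulls paired slide and transcriptomics representations together in the shared latent space. The exact form of that loss is not given here, so the sketch below assumes a symmetric InfoNCE-style contrastive objective, which is a common choice for cross-modal alignment and may differ from what MIRROR uses. The names `alignment_loss`, `cross_entropy_diag`, `l2_normalize`, and the `temperature` value are hypothetical and chosen for illustration only. Plain Python is used for readability; a real implementation would use a tensor library.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def alignment_loss(slide_emb, rna_emb, temperature=0.1):
    """Assumed symmetric InfoNCE loss over paired embeddings.

    slide_emb[i] and rna_emb[i] are treated as a matched pair from the
    same patient; every other combination in the batch is a negative.
    """
    s = [l2_normalize(v) for v in slide_emb]
    r = [l2_normalize(v) for v in rna_emb]
    # Cosine similarity between every slide and every RNA embedding.
    logits = [
        [sum(a * b for a, b in zip(si, rj)) / temperature for rj in r]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]  # RNA-to-slide direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch of two patients with 2-D embeddings (made-up numbers).
slide = [[1.0, 0.0], [0.0, 1.0]]
rna_matched = [[0.9, 0.1], [0.1, 0.9]]  # each RNA vector near its own slide
rna_swapped = [[0.1, 0.9], [0.9, 0.1]]  # pairs deliberately mismatched

matched = alignment_loss(slide, rna_matched)
swapped = alignment_loss(slide, rna_swapped)
print(matched < swapped)
```

Under this assumed objective, correctly paired embeddings give a lower loss than mismatched ones. The retention and clustering terms ($L_{retention}$, $L_{cluster}$) are separate objectives and are not modeled in this sketch.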

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof