Uncovering the Genetic Basis of Glioblastoma Heterogeneity through Multimodal Analysis of Whole Slide Images and RNA Sequencing Data

Ahmad Berjaoui, Louis Roussel, Eduardo Hugo Sanchez, Elizabeth Cohen-Jonathan Moyal

TL;DR

This study tackles glioblastoma heterogeneity by integrating whole-slide images (WSIs) and RNA-seq data through a novel multimodal deep learning framework. RNA-seq encodings are constructed via directed PPI-based gene clustering and masked autoencoding with a contrastive objective, while WSI representations are learned from 256×256 patches using ViT features and a triplet loss. A cross-attention-based fusion module and a multimodal contrastive objective align the RNA and imaging modalities, enabling RNA retrieval from WSIs and revealing genetic profiles linked to distinct tumor patterns; Grad-CAM highlights key GBM-related genes and microenvironment regulators. The work identifies both known and novel genetic targets, demonstrates strong cross-modal generalization, and suggests potential avenues for personalized GBM therapies and biomarker discovery, using the TCGA and STEMRI datasets.
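To make the alignment and retrieval steps concrete, the following is a minimal sketch of a symmetric, InfoNCE-style contrastive loss and of nearest-neighbor RNA retrieval from a WSI embedding. It is an illustration under assumptions, not the authors' implementation: the exact loss form, the `temperature` value, and all function names here are hypothetical, and the embeddings are plain Python lists rather than outputs of the trained encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(wsi_emb, rna_emb):
    """Cosine similarity between every WSI embedding and every RNA embedding."""
    wsi = [normalize(v) for v in wsi_emb]
    rna = [normalize(v) for v in rna_emb]
    return [[sum(a * b for a, b in zip(w, r)) for r in rna] for w in wsi]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(wsi_emb, rna_emb, temperature=0.1):
    """Symmetric loss: pair i of WSI/RNA embeddings is the positive pair.

    The temperature is an assumed hyperparameter for this sketch.
    """
    sim = similarity_matrix(wsi_emb, rna_emb)
    logits = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

def retrieve_rna(wsi_vec, rna_emb):
    """Return the index of the RNA embedding most similar to one WSI embedding."""
    sims = similarity_matrix([wsi_vec], rna_emb)[0]
    return max(range(len(sims)), key=sims.__getitem__)
```

In this sketch, correctly paired embeddings give a lower loss than mismatched ones, and retrieval reduces to an argmax over cosine similarities in the shared space.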

Abstract

Glioblastoma is a highly aggressive form of brain cancer characterized by rapid progression and poor prognosis. Despite advances in treatment, the underlying genetic mechanisms driving this aggressiveness remain poorly understood. In this study, we employed multimodal deep learning approaches to investigate glioblastoma heterogeneity using joint image/RNA-seq analysis. Our results reveal novel genes associated with glioblastoma. By leveraging a combination of whole-slide images and RNA-seq, as well as introducing novel methods to encode RNA-seq data, we identified specific genetic profiles that may explain different patterns of glioblastoma progression. These findings provide new insights into the genetic mechanisms underlying glioblastoma heterogeneity and highlight potential targets for therapeutic intervention. Code and data downloading instructions are available at: https://github.com/ma3oun/gbheterogeneity.

Paper Structure

This paper contains 18 sections, 9 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: RNA-seq encoding. The original vector is reorganized according to directed graph clustering from the PPI knowledge graph. Sub-vectors are projected to a common embedding size. A token is randomly masked. The decoder reconstructs the RNA-seq vector using the remaining tokens and the cls token.
  • Figure 2: (Left) Example of a WSI of a mouse brain slice. (Right) Zoom on the upper-right part of the left brain slice. Red square patches have a majority of tumor cells whereas green square patches have a majority of non-tumor cells.
  • Figure 3: Multimodal model. A pairwise contrastive loss aligns RNA and WSI representations. A cross-attention module produces a joint representation. The joint representation feeds a classification head that matches data from the same tumor cell lineage, and an RNA decoder reconstructs the original RNA vector.
  • Figure 4: RNA-seq retrieval using a WSI patch.
  • Figure 5: 2D t-SNE projections of RNA-seq representations of primary tumor locations, combining public TCGA data (circles) and STEMRI data (stars). The figure clearly showcases a strong similarity between public brain RNA-seq test data and STEMRI data.
  • ...and 2 more figures
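
The RNA-seq encoding steps described in the Figure 1 caption (regroup genes by cluster, project each sub-vector to a common embedding size, mask one token at random) can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster assignments are made up, random weights stand in for learned projection layers, the masked token is replaced by a zero vector, and the cls token and decoder are omitted. All function names are hypothetical.

```python
import random

def group_by_cluster(expression, clusters):
    """Reorder a flat expression vector into one sub-vector per gene cluster."""
    return [[expression[i] for i in gene_ids] for gene_ids in clusters]

def make_projection(in_dim, embed_dim, rng):
    """Random weights standing in for a learned linear layer (embed_dim x in_dim)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(embed_dim)]

def project(sub_vector, weights):
    """Map a variable-length sub-vector to a fixed-size token."""
    return [sum(w * x for w, x in zip(row, sub_vector)) for row in weights]

def tokenize_and_mask(expression, clusters, embed_dim, rng):
    """Build one token per cluster, then mask a single randomly chosen token."""
    sub_vectors = group_by_cluster(expression, clusters)
    tokens = [
        project(sv, make_projection(len(sv), embed_dim, rng)) for sv in sub_vectors
    ]
    masked_index = rng.randrange(len(tokens))
    # Assumption: a zero vector marks the masked position; a learned mask
    # embedding, or dropping the token entirely, are equally plausible choices.
    mask_token = [0.0] * embed_dim
    visible = [mask_token if i == masked_index else t for i, t in enumerate(tokens)]
    return visible, masked_index
```

For example, with `clusters = [[0, 2], [1, 3, 4]]`, a five-gene expression vector becomes two tokens of equal size even though the clusters contain different numbers of genes, which is what lets a transformer-style encoder consume them.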