Block Graph Neural Networks for tumor heterogeneity prediction

Marianne Abémgnigni Njifon, Tobias Weber, Viktor Bezborodov, Tyll Krueger, Dominic Schuhmacher

TL;DR

This work tackles tumor heterogeneity prediction by generating artificial tumor patches via a spatial birth–death simulation and labeling each patch as high or low heterogeneity by thresholding its normalized entropy. It introduces Block Graph Neural Networks (BGNN) that operate on graphs built from patch-local cell data, using handcrafted node features that encode spatial structure and birth/death activity. The BGNN architecture, composed of node embedding, attention-based message passing, and global aggregation blocks, reaches 89.67% test accuracy on synthetic data, with birth/death density features providing the strongest signals. The study highlights the potential of combining physics-based data generation with graph neural models to support AI-assisted tumor grading and suggests extensions to real-world datasets and alternative cut schemes.

Abstract

Accurate tumor classification is essential for selecting effective treatments, but current methods have limitations. Standard tumor grading, which categorizes tumors based on cell differentiation, is not recommended as a stand-alone procedure, as some well-differentiated tumors can be malignant. Tumor heterogeneity assessment via single-cell sequencing offers profound insights but can be costly and may still require significant manual intervention. Many existing statistical machine learning methods for tumor data still require complex pre-processing of MRI and histopathological data. In this paper, we propose to build on a mathematical model that simulates tumor evolution (Ożański, 2017) and to generate artificial datasets for tumor classification. Tumor heterogeneity is estimated using normalized entropy, with a threshold to classify tumors as having high or low heterogeneity. Our contributions are threefold: (1) the cut and graph generation processes from the artificial data, (2) the design of tumor features, and (3) the construction of Block Graph Neural Networks (BGNN), a Graph Neural Network-based approach to predict tumor heterogeneity. The experimental results show that the combination of the proposed features and models yields $89.67\%$ accuracy on the test split of the artificially generated data. In particular, in alignment with the emerging trends in AI-assisted grading and spatial transcriptomics, our results suggest that enriching traditional grading methods with birth (e.g., Ki-67 proliferation index) and death markers can improve heterogeneity prediction and enhance tumor classification.
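The labeling step described in the abstract can be illustrated with a small sketch. It computes a normalized Shannon entropy over the cell types found in a patch and thresholds it into a binary label. The exact normalization and threshold used in the paper are not given in this summary, so the choices below are assumptions: the entropy is divided by the logarithm of the number of types (`n_types`, defaulting to the number of distinct types present), and the threshold value `0.5` and the function names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(cell_types, n_types=None):
    """Shannon entropy of the cell-type distribution, scaled to [0, 1].

    Assumption: normalization by log(n_types); if n_types is None, the
    number of distinct types present in the patch is used.
    """
    counts = Counter(cell_types)
    total = len(cell_types)
    k = n_types if n_types is not None else len(counts)
    if total == 0 or k <= 1:
        return 0.0  # a single type (or an empty patch) has no heterogeneity
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


def heterogeneity_label(cell_types, threshold=0.5, n_types=None):
    """Return 1 ("high") if normalized entropy exceeds the threshold, else 0 ("low").

    The threshold of 0.5 is a placeholder, not the value used in the paper.
    """
    return 1 if normalized_entropy(cell_types, n_types) > threshold else 0
```

For example, a patch whose cells all share one type gets label 0, while a patch split evenly between two types has normalized entropy 1 and gets label 1.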

Paper Structure

This paper contains 18 sections, 23 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Data generation process. From left to right: (1) A tumor evolution is simulated given a set of global and intrinsic parameters (Ozanski_2020); (2) Thin layers called tumor cuts are later prepared from the synthetic tumor; (3) Tumor patches are generated from tumor cuts by randomly selecting positions and collecting all the cells within a designated neighborhood; (4) Graph data are constructed from selected patches, while the remaining patches are discarded.
  • Figure 2: Class label distribution for patches as a function of mutation probability. Class "0" corresponds to "low" heterogeneity patches, while class "1" corresponds to "high" heterogeneity patches. We observe that the heterogeneity level of patches increases as the mutation probability increases. On the one hand, when distinguishing between benign and malignant tumors, the increase in heterogeneity level indicates a higher probability of cancer. On the other hand, this heterogeneity can equally help distinguish between different degrees of aggressiveness in malignant tumors.
  • Figure 3: BGNN model. Above each layer, the shape of the layer's output is given inside the pink brackets, when considering an input graph of shape $\left(N, d_0\right)$. The 3 blocks of the BGNN are outlined in orange.
  • Figure 4: Some statistics related to the simulated training data: (a) The histogram shows a prevalence of high-entropy patches after applying the selection criteria. A balanced dataset is obtained as described in the training section. (b) The birth and death events are not functions of the mutation probability per se. However, higher mutation rates may create more variability among individuals, which is an important aspect of evolutionary dynamics. (c) An increased normalized entropy indicates an increased mutation probability and therefore fewer death and birth events. As a result, fewer points are available to generate graphs, which leads to a reduced number of nodes and edges.
  • Figure 5: Performance of BGNN as a function of node features. The various IDs indicate the node features used during training. ID 0: Local intensity, ID 1: Density, ID 2: Local birth, ID 3: Local death, ID 4: Cell volume, ID 5: Birth binary encoding, ID 6: Death binary encoding.
  • ...and 3 more figures
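
Steps (3) and (4) of the data generation process in Figure 1 can be sketched as follows: pick random positions in a tumor cut, collect all cells within a neighborhood of each position, keep only some patches, and turn each kept patch into a graph. This is a minimal illustration under stated assumptions, not the authors' pipeline. The selection criterion (a minimum cell count, `min_cells`), the edge rule (connect cells closer than `edge_radius`), and all function and parameter names are hypothetical, since this summary does not specify them.

```python
import math
import random


def extract_patch(cells, center, radius):
    """Collect all cells (coordinate tuples) within `radius` of `center`."""
    return [c for c in cells if math.dist(c, center) <= radius]


def sample_patches(cells, n_patches, radius, min_cells, seed=0):
    """Draw patch centers at random cell positions; discard sparse patches.

    Assumption: a patch is "selected" when it holds at least `min_cells` cells.
    """
    rng = random.Random(seed)
    patches = []
    for _ in range(n_patches):
        center = rng.choice(cells)
        patch = extract_patch(cells, center, radius)
        if len(patch) >= min_cells:
            patches.append(patch)
    return patches


def build_graph(points, edge_radius):
    """Undirected edge list (i, j), i < j, linking points within `edge_radius`.

    Assumption: a radius graph; the paper may use a different edge rule.
    """
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= edge_radius:
                edges.append((i, j))
    return edges
```

The pairwise loop in `build_graph` is quadratic in the number of cells, which is acceptable for small patches; a spatial index would be the usual replacement for larger ones.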