Why Attention Graphs Are All We Need: Pioneering Hierarchical Classification of Hematologic Cell Populations with LeukoGraph

Fatemeh Nassajian Mojarrad, Lorenzo Bini, Thomas Matthes, Stéphane Marchand-Maillet

TL;DR

LeukoGraph tackles hierarchical cell classification in flow cytometry by implementing a two-module neural architecture: an Early Module H based on Graph Attention Networks and a Max Constraint Module (MCM) that enforces hierarchy coherence. Training uses a max-constraint loss (MCLoss) to exploit hierarchical structure while avoiding violations, enabling scalable inference on graphs with large node counts. Empirically, LeukoGraph achieves state-of-the-art hierarchical metrics and robust leaf-class performance across 30 patient samples, and offers interpretability through marker-feature importance. The approach promises practical impact by delivering fast, accurate, and interpretable hematologic cell classification that can support clinical workflows and large-scale datasets.
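The idea of a max-constraint layer can be sketched in a few lines. The sketch below is not the authors' implementation: it assumes the MCM sets each class score to the maximum of the early-module scores over that class and its descendants, and that MCLoss is a binary cross-entropy on those constrained scores in which the positive term only counts descendants that are themselves labelled positive (the coherent hierarchical multi-label formulation). The class names, the `PARENT` map, and all function names are hypothetical and mirror the abstract's example of four flat classes plus a fifth with two children.

```python
import numpy as np

# Hypothetical hierarchy: four flat classes and a fifth class "E"
# with two children "E1" and "E2". All names are illustrative.
CLASSES = ["A", "B", "C", "D", "E", "E1", "E2"]
PARENT = {"E1": "E", "E2": "E"}  # child -> parent


def descendant_mask(classes, parent):
    """mask[i, j] is True when class j is class i itself or one of its descendants."""
    idx = {c: k for k, c in enumerate(classes)}
    mask = np.eye(len(classes), dtype=bool)
    for c in classes:
        node = c
        while node in parent:  # walk up through every ancestor of c
            node = parent[node]
            mask[idx[node], idx[c]] = True
    return mask


def max_constraint(h, mask):
    """Lift scores h (n_cells, n_classes) so a class never scores below its descendants."""
    # Broadcast to (n_cells, n_classes, n_classes). Masked-out entries become 0,
    # which is safe because the scores are assumed to lie in [0, 1].
    return np.where(mask[None, :, :], h[:, None, :], 0.0).max(axis=2)


def mc_loss(h, y, mask, eps=1e-7):
    """Assumed MCLoss: cross-entropy on constrained scores, label-aware for positives."""
    constrained = max_constraint(h, mask)
    # Positive term: max over descendants B of y_B * h_B, so a parent is only
    # credited through children that are actually labelled positive.
    positive = np.clip(max_constraint(h * y, mask), eps, 1.0)
    negative = np.clip(1.0 - constrained, eps, 1.0)
    return float(-(y * np.log(positive) + (1.0 - y) * np.log(negative)).mean())


mask = descendant_mask(CLASSES, PARENT)
scores = np.array([[0.1, 0.2, 0.1, 0.1, 0.3, 0.9, 0.2]])  # early-module outputs
labels = np.array([[0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]])  # cell is E1, hence also E
constrained = max_constraint(scores, mask)  # score of "E" is lifted from 0.3 to 0.9
loss = mc_loss(scores, labels, mask)
```

Under this assumption, thresholding the constrained scores cannot predict a child without its parent, since the parent's score is never lower than the child's.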

Abstract

In hematologic samples such as peripheral blood or bone marrow, cell classification, that is, delineating diverse populations into a hierarchical structure, presents profound challenges. This study presents LeukoGraph, a recently developed framework designed explicitly for this purpose, which employs graph attention networks (GATs) to handle the complexities of hierarchical classification (HC). LeukoGraph is a pioneering effort, marking the application of graph neural networks (GNNs) for hierarchical inference on graphs, accommodating up to one million nodes and millions of edges, all derived from flow cytometry data. LeukoGraph addresses a classification setting in which, for example, four different cell populations undergo flat categorization, while a fifth diverges into two distinct child branches, exemplifying the hierarchical structure inherent in complex datasets. The technique is more general than this example. LeukoGraph achieves an F-score of 98%, significantly outperforming prevailing state-of-the-art methods. Beyond its theoretical contribution, LeukoGraph shows high precision in predicting both flat and hierarchical cell types across flow cytometry datasets from 30 distinct patients. This precision is further underscored by LeukoGraph's ability to maintain a correct label ratio despite the challenges posed by hierarchical classification.

Paper Structure

This paper contains 13 sections, 2 theorems, 5 equations, 7 figures, and 6 tables.

Key Result

Theorem 1

Let $\mathbf{x}\in \mathbb{R}^m$ be a data point. Let $\mathcal{C}=\{A_1,\cdots, A_C\}$ be a set of hierarchically structured classes and let $\mathcal{H}$ be an early module with outputs $\mathcal{H}_{A_1},\cdots, \mathcal{H}_{A_C}$ ($\mathcal{H}_{A_c}\in[0,1]~\forall c$) given the input $\mathbf{x}$

Figures (7)

  • Figure 1: Depiction of the HC model.
  • Figure 2: (a) Computing the normalized attention coefficients $\gamma_{ij}$. (b) Multi-head attention of node 1 on its neighborhood. Arrows show concatenation or averaging of attention
  • Figure 3: MCLoss for training and test sets.
  • Figure 4: Results for patient 11
  • Figure 5: Results for patient 12
  • ...and 2 more figures
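
The normalized attention coefficients $\gamma_{ij}$ named in the Figure 2 caption can be illustrated with a small single-head sketch. This page does not reproduce the paper's equations, so the code assumes the standard GAT formulation: a shared linear map `W`, an attention vector `a` applied to the concatenated pair of transformed features, a LeakyReLU, and a softmax over each node's neighborhood. The function name, argument names, and the LeakyReLU slope of 0.2 are assumptions, and the adjacency matrix is assumed to contain self-loops so every node has at least one neighbor.

```python
import numpy as np


def attention_coefficients(x, W, a, adj, slope=0.2):
    """Single-head GAT-style coefficients gamma[i, j] for node i attending to node j.

    x:   (n, d) node features (e.g. marker intensities per cell)
    W:   (d, d_out) shared linear map
    a:   (2 * d_out,) attention vector
    adj: (n, n) boolean adjacency with self-loops (assumed)
    """
    z = x @ W
    d_out = z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for node j.
    e = (z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :]
    e = np.where(e > 0, e, slope * e)        # LeakyReLU
    e = np.where(adj, e, -np.inf)            # attend to neighbors only
    e = e - e.max(axis=1, keepdims=True)     # stabilize the softmax
    w = np.exp(e)                            # non-neighbors become exactly 0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1
```

Multi-head attention, as in panel (b), would run several such heads with separate `W` and `a` and then concatenate or average the resulting node representations.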

Theorems & Definitions (2)

  • Theorem 1
  • Proposition 1