Deep Learning Approaches for Blood Disease Diagnosis Across Hematopoietic Lineages

Gabriel Bo; Justin Gu; Christopher Sun

TL;DR

This work builds a foundation model for blood disease diagnosis by embedding high-dimensional gene expression (over 20,000 gene features) from hematopoietic cells into a $256$-dimensional latent space via a fully connected autoencoder trained on multipotent progenitors. The latent representations are then fed to fully connected networks, Transformer self-attention models, and graph convolutional networks to classify diseases, and a classifier trained on progenitors is evaluated zero-shot on downstream cell types such as monocytes and lymphocytes. The approach achieves multi-class accuracy above 95% on progenitors and a zero-shot F1-score greater than $0.7$ on the binary classification task, though transfer to lymphocytes remains challenging. Overall, the study demonstrates the cross-lineage utility of a single foundation model for hematopoietic disease diagnosis and identifies more robust lymphocyte embeddings as the main direction for future work.

Abstract

We present a foundation modeling framework that leverages deep learning to uncover latent genetic signatures across the hematopoietic hierarchy. Our approach trains a fully connected autoencoder on multipotent progenitor cells, reducing over 20,000 gene features to a 256-dimensional latent space that captures predictive information for both progenitor and downstream differentiated cells such as monocytes and lymphocytes. We validate the quality of these embeddings by training feed-forward, transformer, and graph convolutional architectures for blood disease diagnosis tasks. We also explore zero-shot prediction using a progenitor disease state classification model to classify downstream cell conditions. Our models achieve greater than 95% accuracy for multi-class classification, and in the zero-shot setting, we achieve greater than 0.7 F1-score on the binary classification task. Future work should improve embeddings further to increase robustness on lymphocyte classification specifically.
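To make the dimensions and the reported metric concrete, here is a minimal NumPy sketch of an autoencoder-style bottleneck and a binary F1 computation. It is an illustration under stated assumptions, not the authors' implementation: the single-layer encoder and decoder, the ReLU activation, the random weights, the rounded gene count of 20,000, and the names `encode`, `decode`, `reconstruction_mse`, and `binary_f1` are all hypothetical. Only the 256-dimensional latent size and the use of F1 for the binary task come from the abstract.

```python
import numpy as np

N_GENES = 20_000   # the abstract says "over 20,000"; the exact count is not given
LATENT_DIM = 256   # latent size stated in the abstract

rng = np.random.default_rng(0)

# Hypothetical single-layer weights; the real network depth is not stated.
W_enc = rng.normal(0.0, 0.01, size=(N_GENES, LATENT_DIM))
b_enc = np.zeros(LATENT_DIM)
W_dec = rng.normal(0.0, 0.01, size=(LATENT_DIM, N_GENES))
b_dec = np.zeros(N_GENES)


def encode(x):
    """Map expression profiles (batch, N_GENES) to latent codes (batch, LATENT_DIM)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU is an assumed activation


def decode(z):
    """Map latent codes back to gene space (batch, N_GENES)."""
    return z @ W_dec + b_dec


def reconstruction_mse(x):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((decode(encode(x)) - x) ** 2))


def binary_f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels; 0.0 if undefined."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


# Synthetic stand-in for expression data; not real measurements.
x = rng.random((4, N_GENES))
z = encode(x)
print(z.shape)                 # latent codes: one 256-vector per cell
print(reconstruction_mse(x))   # untrained weights, so this is only a sanity check
print(binary_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a setup like the one the abstract describes, the weights would be fit by minimizing the reconstruction error on progenitor cells, and the downstream classifiers would then take `encode(x)` as input instead of the raw gene features.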
