Karyotype AI for Precision Oncology

Zahra Shamsi, Isaac Reid, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

TL;DR

The paper addresses automated karyotyping directly from metaphase images to diagnose hematologic malignancies. It tackles data scarcity and preprocessing challenges with a novel strategy: pretraining on chromosome identity, then finetuning for aberration detection. It introduces an end-to-end Vision Transformer-based pipeline that uses OWL-ViT for chromosome detection, SAM for segmentation, and dedicated ViT modules for chromosome identity and for specific anomalies such as del(5q) and t(9;22), achieving high diagnostic performance and enabling zero-shot aberration detection. Key findings include a PR-AUC of $0.94$ for the targeted anomalies, inference in about $15$ seconds per metaphase image, and strong performance even on rare aberrations with limited data. The work suggests substantial clinical impact by enabling faster, scalable, and more accessible karyotyping, with potential to reveal subclonal architecture and reduce turnaround times in cancer diagnostics.

Abstract

We present a machine learning method capable of accurately detecting chromosome abnormalities that cause blood cancers directly from microscope images of the metaphase stage of cell division. The pipeline is built on a series of fine-tuned Vision Transformers. Current state of the art (and standard clinical practice) requires expensive, manual expert analysis, whereas our pipeline takes only 15 seconds per metaphase image. Using a novel pretraining-finetuning strategy to mitigate the challenge of data scarcity, we achieve a high precision-recall score of 94% AUC for the clinically significant del(5q) and t(9;22) anomalies. Our method also unlocks zero-shot detection of rare aberrations based on model latent embeddings. The ability to quickly, accurately, and scalably diagnose genetic abnormalities directly from metaphase images could transform karyotyping practice and improve patient outcomes. We will make code publicly available.

Paper Structure (19 sections, 10 figures, 3 tables)

This paper contains 19 sections, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Overall schematic. The model takes a microscope image of a stained cell undergoing metaphase as input. OWL-ViT (Minderer et al., 2022), finetuned on chromosome data, is used to predict bounding boxes. SAM (Kirillov et al., 2023) is used to segment each chromosome therein. Each segmented chromosome is fed into a chromosome identity ('chrmID') ViT, which predicts its number (1-22, X or Y). Candidate chromosome 5s are fed into a ViT that detects del(5q), and candidate chromosome 9s and 22s are fed into networks that detect t(9;22). This is straightforwardly extended to incorporate further structural aberrations. Multiple metaphase image predictions are aggregated to predict each patient's karyotype.
  • Figure 2: Chromosome identity prediction. (A) Accuracy of different models for chromosome identity prediction (1-22, X or Y), from pre-segmented, aligned and cropped chromosome images taken from respective karyograms. Performer with block-Toeplitz masking ('TopViT') performs best, followed by softmax attention ('ViT') then regular Performer ('XViT-P'). The CNN (Inception) is consistently worst. Black dots are hyperparameter instantiations. (B) Confusion matrix for the best chromosome identification model on a test set. (C) UMAP projection of the last intermediate layer (pre-logits) for the best-performing ViT model on a test set. Each point is coloured by its ground truth label, with predictions that disagree showing their label enlarged ten-fold and marked '$\times$'. (D) Classification accuracy (left $y$-axis) and percentage of remaining data (right $y$-axis) after removing high-entropy predictions, as a function of entropy cutoff ($x$-axis). For example, removing predictions with entropy $3.6 \times 10^{-6}$ or greater increases the accuracy to 99.9% by removing 5% of the data.
  • Figure 3: Aberration detection for del(5q) and t(9;22) anomalies. (A) UMAP projection of normal and abnormal chromosomes 5, 9 and 22, constructed using the previously described chromosome identification network. Healthy and anomalous examples are already fairly well separated, which motivates either initialising the anomaly detection ViT with these weights or using a de novo detection strategy. (B) Precision-recall area under curve (AUC-PR) and receiver operating characteristic area under curve (ROC-AUC), two measures of diagnostic test accuracy, plotted as a function of train dataset size for different anomalies. Green markers show performance when trained from scratch, whereas purple markers show performance when finetuned. The latter is consistently much better and substantially reduces the amount of data needed for good performance.
  • Figure 4: Model performance on rare aberrations and precision-recall curves when aggregating predictions across cells and specimens. 10-fold cross-validation performance for (A) AUC-PR and (B) ROC-AUC. Each boxplot corresponds to a distinct cross-validation set for each chromosome involved in an aberration, and is coloured by aberration. Individual points for folds and averages (black diamonds) are overlaid on each boxplot. (C) Precision-recall curves for t(9;11), t(11;19), del(5q), and t(9;22), at the individual chromosome image level (orange) or aggregated at the cell (purple) or specimen levels. For specimen level $\geq$ 1 abnormal (blue), the single highest-probability abnormal chromosome was used; for specimen level $\geq$ 2 abnormals (green), the second-highest-probability abnormal chromosome was used. Similarly, (D) shows precision-recall for de novo aberration detection based on the distance to the $N$-th nearest chromosome (here, the 50th) for t(9;11), t(11;19), del(5q), and t(9;22), respectively.
  • Figure 5: Patient-level PR curve. Precision-recall curve for patient-level detection of the del(5q) and t(9;22) chromosomal abnormalities, directly from $20$ microscope images of cells undergoing the metaphase stage of cell division.
  • ...and 5 more figures
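
The entropy-based filtering described in the Figure 2(D) caption can be illustrated with a minimal sketch. It assumes each prediction is a softmax probability vector and that entropy is measured in nats; the caption does not state the unit, and the function names, toy probabilities, and cutoff below are hypothetical rather than taken from the paper.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats, an assumption) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_by_entropy(prob_rows, labels, cutoff):
    """Drop predictions with entropy >= cutoff.

    Returns (accuracy on the kept predictions, fraction of predictions kept).
    """
    kept = [
        (row, label)
        for row, label in zip(prob_rows, labels)
        if prediction_entropy(row) < cutoff
    ]
    if not kept:
        return float("nan"), 0.0
    # A prediction is correct when the argmax class equals the label.
    correct = sum(
        1
        for row, label in kept
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(kept), len(kept) / len(labels)


# Toy three-class data (hypothetical): two confident correct predictions
# and two uncertain incorrect ones.
probs = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.40, 0.35, 0.25],
    [0.34, 0.33, 0.33],
]
labels = [0, 1, 1, 2]

acc_all, kept_all = filter_by_entropy(probs, labels, cutoff=float("inf"))
acc_cut, kept_cut = filter_by_entropy(probs, labels, cutoff=0.5)
```

Sweeping `cutoff` over a range of values and plotting both returned quantities would give curves of the kind the caption describes: accuracy rises as the cutoff tightens, at the cost of discarding more data.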
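
The two scoring rules mentioned in the Figure 4 caption, specimen-level aggregation by the k-th highest chromosome probability and de novo detection by distance to the $N$-th nearest chromosome, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Euclidean distance, the choice of reference embeddings, and all names and toy values are hypothetical.

```python
import math


def specimen_score(chromosome_probs, k=1):
    """k-th highest abnormality probability among a specimen's chromosome images.

    k=1 corresponds to the ">= 1 abnormal" rule, k=2 to ">= 2 abnormals".
    """
    if len(chromosome_probs) < k:
        raise ValueError("need at least k chromosome-level predictions")
    return sorted(chromosome_probs, reverse=True)[k - 1]


def nth_neighbor_distance(query, reference, n=50):
    """Distance from `query` to its n-th nearest reference embedding.

    Euclidean distance is an assumption; a larger value suggests the query
    chromosome is unlike the reference set.
    """
    if len(reference) < n:
        raise ValueError("need at least n reference embeddings")
    distances = sorted(math.dist(query, ref) for ref in reference)
    return distances[n - 1]


# Toy specimen (hypothetical probabilities from an aberration classifier).
probs = [0.05, 0.92, 0.10, 0.71]
score_one = specimen_score(probs, k=1)
score_two = specimen_score(probs, k=2)

# Toy 2-D embeddings (hypothetical); n=2 keeps the example small,
# whereas the caption uses the 50th neighbour.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
near = nth_neighbor_distance((0.5, 0.5), reference, n=2)
far = nth_neighbor_distance((5.0, 5.0), reference, n=2)
```

Requiring the second-highest probability to be high (`k=2`) is the stricter rule, since a single spurious high-probability chromosome can no longer flag the specimen on its own.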