Interpretable cancer cell detection with phonon microscopy using multi-task conditional neural networks for inter-batch calibration

Yijie Zheng, Rafael Fuentes-Dominguez, Matt Clark, George S. D. Gordon, Fernando Perez-Cota

TL;DR

Batch effects in time-resolved phonon-derived signals complicate cross-experiment cancer-cell classification. The authors present a multi-task conditional neural network that simultaneously calibrates inter-batch variation and classifies cells, using a conditional encoder, a variational encoder, dual classifiers, and a denoising decoder to produce a batch-corrected latent representation. On eight experimental batches, the method achieves a balanced precision of 89.22% and an average cross-validated precision of 89.07%, with classification taking about 0.5 seconds. Denoised signals reconstructed from the latent space yield physically interpretable features such as sound velocity, attenuation, and phase. The approach offers robust, explainable diagnostics from phonon microscopy and highlights phase and adhesion-related features as potential cancer markers, with implications for scalable, batch-robust clinical applications.

Abstract

Advances in artificial intelligence (AI) show great potential in revealing underlying information from phonon microscopy (high-frequency ultrasound) data to identify cancerous cells. However, this technology suffers from the 'batch effect' that comes from unavoidable technical variations between each experiment, creating confounding variables that the AI model may inadvertently learn. We therefore present a multi-task conditional neural network framework to simultaneously achieve inter-batch calibration, by removing confounding variables, and accurate cell classification of time-resolved phonon-derived signals. We validate our approach by training and validating on different experimental batches, achieving a balanced precision of 89.22% and an average cross-validated precision of 89.07% for classifying background, healthy and cancerous regions. Classification can be performed in 0.5 seconds with only simple prior batch information required for multiple batch corrections. Further, we extend our model to reconstruct denoised signals, enabling physical interpretation of salient features indicating disease state including sound velocity, sound attenuation and cell-adhesion to substrate.
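
The multi-task idea in the abstract (classify cells, suppress batch information, and reconstruct the signal from one shared latent space) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' architecture: the one-hot batch conditioning, the layer sizes, the loss weights `lam` and `beta`, and the adversarial sign on the batch-classifier term are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def init_params(n_in, n_batches, n_hidden=32, n_latent=8, n_classes=3):
    # Hypothetical layer sizes; small random weights for the demo.
    def w(rows, cols):
        return rng.standard_normal((rows, cols)) * 0.1
    return {
        "enc": w(n_in + n_batches, n_hidden),
        "mu": w(n_hidden, n_latent),
        "log_var": w(n_hidden, n_latent),
        "dec": w(n_latent, n_in),
        "cls_cell": w(n_latent, n_classes),
        "cls_batch": w(n_latent, n_batches),
    }

def multitask_loss(x, batch_id, cell_label, params, n_batches, lam=1.0, beta=1e-3):
    """One forward pass of a toy conditional VAE with two classifier heads."""
    cond = np.eye(n_batches)[batch_id]  # assumption: batch prior given as one-hot
    h = np.tanh(np.concatenate([x, cond], axis=1) @ params["enc"])
    mu, log_var = h @ params["mu"], h @ params["log_var"]
    # Reparameterisation: sample the latent code from N(mu, exp(log_var)).
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    recon = float(np.mean((x - z @ params["dec"]) ** 2))
    kl = float(-0.5 * np.mean(np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)))
    ce_cell = cross_entropy(softmax(z @ params["cls_cell"]), cell_label)
    ce_batch = cross_entropy(softmax(z @ params["cls_batch"]), batch_id)
    # Assumed encoder objective: good reconstruction and cell classification,
    # while making the batch classifier fail (hence the minus sign).
    total = recon + beta * kl + ce_cell - lam * ce_batch
    return {"total": total, "recon": recon, "kl": kl,
            "ce_cell": ce_cell, "ce_batch": ce_batch}

# Demo on random data: 16 signals of 64 samples, 8 batches, 3 classes.
n_batches = 8
x = rng.standard_normal((16, 64))
batch_id = rng.integers(0, n_batches, size=16)
cell_label = rng.integers(0, 3, size=16)
params = init_params(n_in=64, n_batches=n_batches)
losses = multitask_loss(x, batch_id, cell_label, params, n_batches)
```

In an adversarial setup of this kind, the batch classifier itself would be trained to minimise `ce_batch` while the encoder is trained on `total`, so a latent space that defeats the batch classifier carries little batch-specific information. Whether the paper uses exactly this scheme is not stated in the abstract.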

Paper Structure (14 sections, 5 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Multi-task conditional neural network. (a) Conditional encoder, trained with a reference dataset as a conditional reference operator. (b) Multi-task variational encoder, which maps the input data into a shared latent space with a variational distribution. Two classifiers are trained separately: Classifier 1 distinguishes normal cells, cancerous cells, and background, while Classifier 2 prevents the model from learning the batch IDs of the data. (c) Denoising decoder, which reconstructs the input signal from the shared latent space. (d) Cluster visualization of the shared latent space.
  • Figure 2: UMAP clustering plots (a) before and (b)-(f) after calibration with various calibration models, colored by the 8 batch IDs. Corresponding UMAP plots (g)-(l), colored by the 3 classes (normal, cancer, and background).
  • Figure 3: Inter-batch classification results of various classifiers, evaluated by (a) confusion matrix, (b) average precision distribution of each tested batch, (c) example cell imaging, (d) confusion matrix after 6-fold cross-validation, (e) sensitivity, and (f) specificity. Comparison of computational resources between the classifiers: (g) convergence time and (h) number of trainable parameters.
  • Figure 4: Explainability of the latent space for signal denoising. (a) Denoised signals reconstructed from latent spaces of dimension 128 and 1024. (b) Feature (frequency, attenuation, and phase) maps of normal and cancerous cell examples from the reconstructed and original signals. (c) Feature (frequency, attenuation, and phase) clusters from the reconstructed and original signals, where blue represents background, red represents cancer, and green represents normal.
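
The frequency, attenuation, and phase features named in Figure 4 can be illustrated on a synthetic trace. The sketch below assumes the time-resolved signal is a single damped sinusoid, s(t) = A·exp(-αt)·cos(2πft + φ), and fits it by linear prediction followed by a linear least-squares step. The function name `fit_damped_sinusoid`, the signal model, and the numeric values are assumptions for illustration; the source does not say how the authors extract these features.

```python
import numpy as np

def fit_damped_sinusoid(signal, dt):
    """Estimate frequency, attenuation, phase and amplitude of
    s(t) = A * exp(-alpha * t) * cos(2*pi*f*t + phi) sampled every dt seconds."""
    s = np.asarray(signal, dtype=float)

    # A single damped sinusoid obeys s[n] = a1*s[n-1] + a2*s[n-2];
    # solve for (a1, a2) in the least-squares sense.
    past = np.column_stack([s[1:-1], s[:-2]])
    a1, a2 = np.linalg.lstsq(past, s[2:], rcond=None)[0]

    # The roots of x^2 - a1*x - a2 are exp((-alpha +/- 2j*pi*f) * dt).
    roots = np.roots([1.0, -a1, -a2])
    pole = roots[np.argmax(roots.imag)]  # keep the positive-frequency root
    alpha = -np.log(np.abs(pole)) / dt
    freq = np.angle(pole) / (2 * np.pi * dt)

    # With f and alpha fixed, amplitude and phase follow from a linear fit:
    # A*cos(wt + phi) = (A cos phi) cos(wt) - (A sin phi) sin(wt).
    t = np.arange(len(s)) * dt
    decay = np.exp(-alpha * t)
    basis = np.column_stack([decay * np.cos(2 * np.pi * freq * t),
                             decay * np.sin(2 * np.pi * freq * t)])
    c_cos, c_sin = np.linalg.lstsq(basis, s, rcond=None)[0]

    return {
        "frequency_hz": float(freq),
        "attenuation_per_s": float(alpha),
        "phase_rad": float(np.arctan2(-c_sin, c_cos)),
        "amplitude": float(np.hypot(c_cos, c_sin)),
    }

# Synthetic, noise-free example with hypothetical values:
# 5 GHz oscillation, 1e9 1/s decay rate, 0.7 rad phase, 10 ps sampling.
dt = 1e-11
t = np.arange(400) * dt
trace = np.exp(-1e9 * t) * np.cos(2 * np.pi * 5e9 * t + 0.7)
est = fit_damped_sinusoid(trace, dt)
```

Linear-prediction fits of this kind are sensitive to noise, which is one reason a denoised reconstruction is a more convenient input for feature extraction than the raw trace.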