Interpretable cancer cell detection with phonon microscopy using multi-task conditional neural networks for inter-batch calibration
Yijie Zheng, Rafael Fuentes-Dominguez, Matt Clark, George S. D. Gordon, Fernando Perez-Cota
TL;DR
Batch effects in time-resolved phonon-derived signals complicate cross-experiment cancer-cell classification. The authors present a multi-task conditional neural network that simultaneously calibrates inter-batch variation and classifies cells, using a conditional encoder, a variational encoder, dual classifiers, and a denoising decoder to produce a batch-corrected latent representation. On eight experimental batches, the method achieves a balanced precision of 89.22% and an average cross-validated precision of 89.07%, with inference taking about 0.5 seconds, while latent-space denoising yields physically interpretable features such as sound velocity, attenuation, and phase. The approach offers robust, explainable diagnostics from phonon microscopy and highlights phase and adhesion-related features as potential cancer markers, with implications for scalable, batch-robust clinical applications.
Abstract
Advances in artificial intelligence (AI) show great potential for revealing underlying information in phonon microscopy (high-frequency ultrasound) data to identify cancerous cells. However, this technology suffers from the 'batch effect': unavoidable technical variations between experiments create confounding variables that the AI model may inadvertently learn. We therefore present a multi-task conditional neural network framework that simultaneously achieves inter-batch calibration, by removing confounding variables, and accurate cell classification of time-resolved phonon-derived signals. We validate our approach by training and validating on different experimental batches, achieving a balanced precision of 89.22% and an average cross-validated precision of 89.07% for classifying background, healthy and cancerous regions. Classification can be performed in 0.5 seconds, and only simple prior batch information is required to correct for multiple batches. Further, we extend our model to reconstruct denoised signals, enabling physical interpretation of salient features that indicate disease state, including sound velocity, sound attenuation and cell adhesion to the substrate.
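The abstract does not define "balanced precision". As a minimal sketch, assuming it means the unweighted mean of per-class precision over the three classes (background, healthy, cancerous), the metric could be computed as follows. The function name `macro_precision`, the integer class encoding, and the toy labels are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def macro_precision(y_true, y_pred, n_classes):
    """Unweighted mean of per-class precision (an ASSUMED reading of
    'balanced precision'; the source does not give the formula)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for c in range(n_classes):
        predicted_c = y_pred == c
        n_pred = predicted_c.sum()
        if n_pred == 0:
            continue  # class never predicted: precision undefined, skip it
        true_pos = np.logical_and(predicted_c, y_true == c).sum()
        per_class.append(true_pos / n_pred)
    return float(np.mean(per_class))

# Toy labels (hypothetical): 0 = background, 1 = healthy, 2 = cancerous
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
score = macro_precision(y_true, y_pred, n_classes=3)
print(f"macro-averaged precision: {score:.4f}")
```

Averaging per class rather than per sample keeps a large class, such as background, from dominating the score, which is one plausible reason to report a balanced figure for imbalanced regions.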
