Mapping of Lesion Images to Somatic Mutations
Rahul Mehta
TL;DR
The paper tackles the challenge of predicting a patient's somatic mutation profile from lesion images to aid targeted cancer therapies. It introduces LLOST, a dual variational autoencoder framework connected by a shared latent space conditioned on cancer type, with lesion data represented as modality-invariant point clouds and mutation data modeled via a Negative-Binomial (or Bernoulli) likelihood. A conditional invertible neural network links the domain-specific and shared latents through conditional normalizing flows, and the shared space is optimized with Maximum Mean Discrepancy to enable robust cross-domain mapping. Experimental results show that LLOST_B (Bernoulli likelihood) best captures mutation distributions, while LLOST_NB (Negative-Binomial likelihood) improves count-based metrics. The shared latent space enables meaningful mappings even with limited training data, suggesting clinical potential for early prognosis and treatment planning. The work highlights avenues for extending to additional genetic domains and integrating richer imaging features.
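To make the Maximum Mean Discrepancy term mentioned above more concrete, here is a minimal sketch of a biased squared-MMD estimate between two batches of latent vectors. It assumes a Gaussian (RBF) kernel with a fixed bandwidth; the kernel choice, the bandwidth, and the function names `rbf_kernel` and `mmd2` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b.

    The RBF kernel and its bandwidth are assumptions made for this sketch.
    """
    # Squared Euclidean distance between every row of a and every row of b.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical latent batches: two from the same distribution, one shifted.
    z_image = rng.normal(size=(200, 4))
    z_mutation = rng.normal(size=(200, 4))
    z_shifted = rng.normal(loc=3.0, size=(200, 4))
    print("same distribution:", mmd2(z_image, z_mutation))
    print("shifted distribution:", mmd2(z_image, z_shifted))
```

The estimate is close to zero when the two batches come from the same distribution and grows as they drift apart, which is why minimizing it can pull latents from the two domains toward a common distribution.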
Abstract
Medical imaging is a critical initial tool used by clinicians to determine a patient's cancer diagnosis, allowing for faster intervention and more reliable patient prognosis. At subsequent stages of patient diagnosis, genetic information is extracted to help select specific patient treatment options. As the efficacy of cancer treatment often relies on early diagnosis and treatment, we build a deep latent variable model to determine patients' somatic mutation profiles based on their corresponding medical images. We first introduce a point cloud representation of lesion images to allow for invariance to the imaging modality. We then propose LLOST, a model with dual variational autoencoders coupled together by a separate shared latent space that unifies features from the lesion point clouds and counts of distinct somatic mutations. Our model therefore consists of three latent spaces, each of which is learned with a conditional normalizing flow prior to account for the diverse distributions of each domain. We conduct qualitative and quantitative experiments on de-identified medical images from The Cancer Imaging Archive and the corresponding somatic mutations from the Pan-Cancer dataset of The Cancer Genome Atlas. We show the model's predictive performance on the counts of specific mutations as well as its ability to accurately predict the occurrence of mutations. In particular, we find shared patterns between the imaging and somatic mutation domains that reflect cancer type. We conclude with remarks on how to improve the model and on possible future avenues of research that include other genetic domains.
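As an illustration of the point cloud idea described in the abstract, the sketch below converts a binary lesion mask into a fixed-size, centered, and scale-normalized point cloud. It is a sketch under stated assumptions rather than the authors' pipeline: the availability of a segmentation mask, the uniform voxel sampling, the unit-sphere normalization, and the function name `mask_to_point_cloud` are all hypothetical.

```python
import numpy as np


def mask_to_point_cloud(mask, n_points=1024, seed=0):
    """Sample a normalized point cloud from a binary lesion mask.

    Assumptions for this sketch: `mask` is a boolean array (2D or 3D) marking
    lesion voxels, and intensity values are ignored so that only geometry,
    which does not depend on the imaging modality, is kept.
    """
    rng = np.random.default_rng(seed)
    # Coordinates of all lesion voxels, shape (num_voxels, ndim).
    coords = np.argwhere(mask).astype(np.float64)
    if coords.shape[0] == 0:
        raise ValueError("mask contains no lesion voxels")
    # Sample with replacement only when the lesion has fewer voxels than requested.
    replace = coords.shape[0] < n_points
    idx = rng.choice(coords.shape[0], size=n_points, replace=replace)
    points = coords[idx]
    # Center at the origin and scale so the farthest point lies on the unit sphere.
    points -= points.mean(axis=0)
    scale = np.linalg.norm(points, axis=1).max()
    if scale > 0:
        points /= scale
    return points


if __name__ == "__main__":
    # Hypothetical 3D mask with a box-shaped "lesion".
    mask = np.zeros((16, 16, 16), dtype=bool)
    mask[4:10, 5:12, 6:9] = True
    cloud = mask_to_point_cloud(mask, n_points=256)
    print(cloud.shape)
```

Discarding intensities and normalizing position and scale is one simple way to obtain a representation that looks the same whether the lesion was imaged with CT, MRI, or another modality.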
