Mapping of Lesion Images to Somatic Mutations

Rahul Mehta

TL;DR

The paper tackles the challenge of predicting a patient's somatic mutation profile from lesion images to aid targeted cancer therapies. It introduces LLOST, a framework of dual variational autoencoders connected by a shared latent space conditioned on cancer type. Lesion data are represented as modality-invariant point clouds, and mutation data are modeled with a Negative-Binomial (or Bernoulli) likelihood. A conditional invertible neural network links the domain-specific and shared latents through conditional normalizing flows, and the shared space is optimized with Maximum Mean Discrepancy (MMD) to enable robust cross-domain mapping. Experimental results show that LLOSTB best captures mutation distributions while LLOSTNB improves count-based metrics. The shared latent space enables meaningful mappings even with limited training data, suggesting clinical potential for early prognosis and treatment planning. The work highlights avenues for extending the model to additional genetic domains and integrating richer imaging features.
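To make the MMD term concrete, here is a minimal NumPy sketch of a biased squared-MMD estimate between two batches of latent vectors. The RBF kernel, the bandwidth, and the names `rbf_kernel`, `mmd_squared`, `shared_from_images`, and `shared_from_mutations` are illustrative assumptions; the summary does not state which kernel or estimator the paper uses.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """RBF kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Toy stand-ins for shared-latent samples produced from each domain (hypothetical data).
rng = np.random.default_rng(0)
shared_from_images = rng.normal(0.0, 1.0, size=(256, 8))
shared_from_mutations = rng.normal(0.0, 1.0, size=(256, 8))
shifted = rng.normal(2.0, 1.0, size=(256, 8))

mmd_same = mmd_squared(shared_from_images, shared_from_mutations, bandwidth=2.0)
mmd_shifted = mmd_squared(shared_from_images, shifted, bandwidth=2.0)
print(f"matched distributions: {mmd_same:.4f}, shifted distributions: {mmd_shifted:.4f}")
```

The estimate is small when the two batches come from the same distribution and grows when they differ, which is the property that makes it usable as a distribution-matching penalty.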

Abstract

Medical imaging is a critical initial tool used by clinicians to determine a patient's cancer diagnosis, allowing for faster intervention and more reliable patient prognosis. At subsequent stages of patient diagnosis, genetic information is extracted to help select specific patient treatment options. As the efficacy of cancer treatment often relies on early diagnosis and treatment, we build a deep latent variable model to determine patients' somatic mutation profiles based on their corresponding medical images. We first introduce a point cloud representation of lesion images to allow for invariance to the imaging modality. We then propose LLOST, a model with dual variational autoencoders coupled together by a separate shared latent space that unifies features from the lesion point clouds and counts of distinct somatic mutations. Our model therefore consists of three latent spaces, each of which is learned with a conditional normalizing flow prior to account for the diverse distributions of each domain. We conduct qualitative and quantitative experiments on de-identified medical images from The Cancer Imaging Archive and the corresponding somatic mutations from the Pan-Cancer dataset of The Cancer Genome Atlas. We show the model's predictive performance on the counts of specific mutations as well as its ability to accurately predict the occurrence of mutations. In particular, we find shared patterns between the imaging and somatic mutation domains that reflect cancer type. We conclude with a remark on how to improve the model and possible future avenues of research to include other genetic domains.
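As a rough illustration of the point cloud idea, the sketch below converts a binary lesion mask into a fixed-size, normalized set of coordinates. It rests on assumptions not stated in this summary: that a segmentation mask is available, that only lesion geometry (not intensity) is kept, and that points are sampled uniformly and scaled to the unit ball. The function name `mask_to_point_cloud` and its parameters are hypothetical, and the paper's actual construction may differ.

```python
import numpy as np


def mask_to_point_cloud(mask, num_points=1024, seed=0):
    """Turn a binary 2D or 3D lesion mask into a centered, unit-scaled point cloud."""
    # Coordinates of all voxels (or pixels) that belong to the lesion.
    coords = np.argwhere(mask > 0).astype(np.float64)
    if coords.shape[0] == 0:
        raise ValueError("mask contains no lesion voxels")

    rng = np.random.default_rng(seed)
    # Sample with replacement only when the lesion has fewer voxels than requested.
    replace = coords.shape[0] < num_points
    idx = rng.choice(coords.shape[0], size=num_points, replace=replace)
    points = coords[idx]

    # Center on the centroid and scale so the farthest point has norm 1.
    points -= points.mean(axis=0)
    scale = np.linalg.norm(points, axis=1).max()
    if scale > 0:
        points /= scale
    return points


# Toy example: a spherical "lesion" of radius 8 inside a 32^3 volume.
grid = np.indices((32, 32, 32))
center = np.array([16, 16, 16]).reshape(3, 1, 1, 1)
toy_mask = ((grid - center) ** 2).sum(axis=0) <= 8**2

cloud = mask_to_point_cloud(toy_mask, num_points=512)
print(cloud.shape)
```

Because only coordinates survive, the output has the same form regardless of the scanner intensity scale, which is one plausible reading of the modality invariance claimed above.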

Paper Structure

This paper contains 22 sections, 8 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1:
  • Figure 3: Model architecture of LLOST. During training, the approximate posterior distribution of each domain-specific embedding is trained to match the true posterior, using a learnable prior conditioned on the shared latent space. The shared latent space is trained by matching the distribution of the domain, so that it maps shared embeddings to domain-specific embeddings. The model is trained bidirectionally to maximize the ELBO, which combines the reconstruction loss, the KL divergence of the conditional normalizing flow (NF), and the MMD loss of the shared latent space. For clarity, we drop the subscripts referring to the individual neural network parameters.
  • Figure 4: Log perplexity (lower is better) of CVAEp, CVAEpb, CVAEr, LLOSTr, and LLOSTB as a function of epochs.
  • Figure 5: Point prediction error of TML. The top plot shows the point estimate error in predicting the TML using LLOSTNB on the test samples. The bottom plot is a zoomed-in view of the top plot, showing only samples with a TML below 400. The x-axis is the expected TML. The y-axis is the difference between the predicted and expected TML.
  • Figure 6: A t-SNE of the shared latent space in the forward direction after a test batch of lesion point clouds. Best viewed digitally.
  • ...and 2 more figures
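
The count predictions in the captions above presumably come from the Negative-Binomial variant of the model. As a reference for what such a likelihood computes, the sketch below evaluates the Negative-Binomial log-probability of mutation counts under a mean and inverse-dispersion parameterization. This parameterization and the name `nb_log_pmf` are assumptions for illustration; the summary does not say how the paper parameterizes its decoder.

```python
import math

import numpy as np

# Vectorized log-gamma using only the standard library and NumPy.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def nb_log_pmf(counts, mean, inv_dispersion):
    """Log-probability of counts under NB with mean mu and inverse dispersion r.

    Under this parameterization the variance is mu + mu**2 / r.
    Requires mean > 0 and inv_dispersion > 0.
    """
    counts = np.asarray(counts, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64)
    r = np.asarray(inv_dispersion, dtype=np.float64)
    log_norm = _lgamma(counts + r) - _lgamma(r) - _lgamma(counts + 1.0)
    return (
        log_norm
        + r * np.log(r / (r + mean))
        + counts * np.log(mean / (r + mean))
    )


# Sanity check on a truncated support: probabilities should sum to about 1,
# and the first two moments should match the chosen parameters.
support = np.arange(0, 500)
probs = np.exp(nb_log_pmf(support, mean=3.0, inv_dispersion=2.0))
total_prob = probs.sum()
expected_count = (support * probs).sum()
variance = ((support - expected_count) ** 2 * probs).sum()

# Negative log-likelihood of a few hypothetical observed mutation counts.
observed = np.array([0, 1, 4, 12])
nll = -nb_log_pmf(observed, mean=3.0, inv_dispersion=2.0).sum()
```

The extra dispersion parameter lets the variance exceed the mean, which is the usual reason to prefer a Negative-Binomial over a Poisson model for overdispersed count data.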