Revealing Subtle Phenotypes in Small Microscopy Datasets Using Latent Diffusion Models

Anis Bourou, Biel Castaño Segade, Thomas Boye, Valérie Mezger, Auguste Genovesio

TL;DR

This work tackles the challenge of detecting subtle cellular phenotypes in small microscopy datasets. It introduces Phen-LDiff, a method that fine-tunes pre-trained Latent Diffusion Models (LDMs) for image-to-image translation between experimental conditions, using latent inversion to generate target-class images. By comparing fine-tuning strategies like LoRA and SVDiff, the study demonstrates improved generalization and reduced memorization, enabling reliable translation with as few as 100 images per class. The approach yields high-quality translations across multiple datasets and reveals both apparent and subtle phenotypic changes, offering a computationally efficient tool for phenotype detection with potential impact on biology and drug discovery.

Abstract

Identifying subtle phenotypic variations in cellular images is critical for advancing biological research and accelerating drug discovery. These variations are often masked by the inherent cellular heterogeneity, making it challenging to distinguish differences between experimental conditions. Recent advancements in deep generative models have demonstrated significant potential for revealing these nuanced phenotypes through image translation, opening new frontiers in cellular and molecular biology as well as the identification of novel biomarkers. Among these generative models, diffusion models stand out for their ability to produce high-quality, realistic images. However, training diffusion models typically requires large datasets and substantial computational resources, both of which can be limited in biological research. In this work, we propose a novel approach that leverages pre-trained latent diffusion models to uncover subtle phenotypic changes. We validate our approach qualitatively and quantitatively on several small datasets of microscopy images. Our findings reveal that our approach enables effective detection of phenotypic variations, capturing both visually apparent and imperceptible differences. Ultimately, our results highlight the promising potential of this approach for phenotype detection, especially in contexts constrained by limited data and computational capacity.

Paper Structure

This paper contains 18 sections, 5 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Top (a): Real images from the LRRK2 dataset, displaying wild-type images in the first row and images of mutated neurons in the second row. Bottom (b): Real images from the Golgi dataset, with untreated images in the first row and Nocodazole-treated images in the second row. In both (a) and (b), identifying and interpreting differences between the two classes by eye is highly challenging. However, it is essential for understanding the disease in (a) and assessing the treatment effects in (b).
  • Figure 2: We fine-tuned diffusion models on four different microscopy image datasets and performed translations from the source class to the target class. In (a), the translated images of untreated BBBC021 samples successfully replicated the effects of Latrunculin B treatment: a decrease in cell count and the disappearance of the cytoplasmic skeleton, likely due to the toxicity of the treatment. In (b), the effect of TNF treatment on cells, including its translocation effect, was well recapitulated by image translation. In (c), we translated images of wild-type cells to images of LRRK2 mutated cells and noticed a reduction in neuron density and complexity (red squares) and an increase of $\alpha$-synuclein (yellow squares), recapitulating known effects of the mutation. Finally, in (d), we observed the correct replication of the effect of Nocodazole treatment, which causes the scattering of the Golgi apparatus (red squares). Note that both pronounced ((a), (b)) and subtle ((c), (d)) phenotypic changes are well captured by our model. In every case, seeing the same cell before and after treatment allowed us to assess the effect of the perturbation. Real images of both conditions of the four datasets can be seen in Appendix A.1.
  • Figure 3: Phen-LDiff leverages fine-tuned LDMs to perform image-to-image translation, identifying phenotypic variations between the images of two conditions. First, a fine-tuned model is used to invert an image from the source class into a latent code, which is then used to generate an image in the target class.
  • Figure 4: Visualizing the generalization and memorization of fine-tuned diffusion models on subsets of different sizes from the BBBC021 dataset. Each plot shows two histograms. The blue histogram represents the cosine similarity between images generated using the same seed by two fine-tuned models trained on distinct, non-overlapping subsets of the same size. If the models have achieved generalization, the blue histogram should be concentrated near one, indicating that the two images generated by the models are very similar. The orange histogram represents the cosine similarity between a generated sample and its closest image from the training dataset. A well-generalized model would produce an orange histogram concentrated well below one, indicating that the generated images have low similarity to any specific training example.
  • Figure 5: Images generated by a diffusion model fine-tuned with LoRA on 100 images, shown for different biological datasets. The generated samples resemble the real ones.
  • ...and 3 more figures
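
The two similarity measures described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each image has already been converted to a nonzero vector (flattened pixels or features from some encoder; the caption does not say which), and the function names `paired_similarity` and `nearest_training_similarity` are hypothetical.

```python
import numpy as np


def _unit_rows(x):
    """Scale each row to unit L2 norm (rows are assumed to be nonzero)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def paired_similarity(gen_a, gen_b):
    """Blue histogram: cosine similarity between sample i of model A and
    sample i of model B, where both were generated from the same seed."""
    a, b = _unit_rows(gen_a), _unit_rows(gen_b)
    return np.sum(a * b, axis=1)


def nearest_training_similarity(gen, train):
    """Orange histogram: for each generated sample, the cosine similarity
    to its most similar training image."""
    sims = _unit_rows(gen) @ _unit_rows(train).T  # shape (n_gen, n_train)
    return sims.max(axis=1)
```

Under these assumptions, values of `paired_similarity` near one suggest that the two models agree despite being trained on disjoint subsets, while values of `nearest_training_similarity` near one suggest that generated samples are close copies of training images.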