MorphGen: Controllable and Morphologically Plausible Generative Cell-Imaging

Berker Demirel, Marco Fumero, Theofanis Karaletsos, Francesco Locatello

TL;DR

MorphGen addresses the challenge of generating high-resolution, biologically faithful six-channel Cell Painting images under perturbations across multiple cell types. It combines a latent-diffusion framework in SD-VAE latent space with organelle-aware channel processing and a REPA-style alignment to an OpenPhenom foundation model, enabling controllable synthesis and faithful morphology. Quantitative results show substantially improved FID/KID and preserved perturbation- and cell-type-specific signals, evidenced by downstream feature analyses (CellProfiler and OpenPhenom) and CATE alignment. The approach advances virtual instruments for high-content screening, offering scalable, multi-condition image generation suitable for in silico screening and data augmentation, while noting limitations in extrapolating to entirely novel conditions and suggesting future instance-based conditioning.

Abstract

Simulating in silico cellular responses to interventions is a promising direction to accelerate high-content image-based assays, critical for advancing drug discovery and gene editing. To support this, we introduce MorphGen, a state-of-the-art diffusion-based generative model for fluorescent microscopy that enables controllable generation across multiple cell types and perturbations. To capture biologically meaningful patterns consistent with known cellular morphologies, MorphGen is trained with an alignment loss to match its representations to the phenotypic embeddings of OpenPhenom, a state-of-the-art biological foundation model. Unlike prior approaches that compress multichannel stains into RGB images -- thus sacrificing organelle-specific detail -- MorphGen generates the complete set of fluorescent channels jointly, preserving per-organelle structures and enabling a fine-grained morphological analysis that is essential for biological interpretation. We demonstrate biological consistency with real images via CellProfiler features, and MorphGen attains an FID score over 35% lower than the prior state-of-the-art MorphoDiff, which only generates RGB images for a single cell type. Code is available at https://github.com/czi-ai/MorphGen.
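
The alignment loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the loss takes a common REPA-style form: the negative mean cosine similarity between projected denoiser hidden states and embeddings from a frozen encoder (OpenPhenom in the paper). The function name `alignment_loss`, the array shapes, and the single linear projection are illustrative assumptions, not taken from the MorphGen code, and random arrays stand in for real activations.

```python
import numpy as np


def alignment_loss(hidden, target, proj, eps=1e-8):
    """Negative mean cosine similarity between projected hidden states and targets.

    hidden: (n_tokens, d_model) denoiser hidden states (hypothetical shape)
    target: (n_tokens, d_embed) embeddings from a frozen encoder
    proj:   (d_model, d_embed) linear projection (assumed; often a small MLP)
    """
    projected = hidden @ proj
    # Normalize both sides so the dot product is a cosine similarity.
    projected = projected / (np.linalg.norm(projected, axis=-1, keepdims=True) + eps)
    target = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cosine = np.sum(projected * target, axis=-1)
    return float(-np.mean(cosine))


rng = np.random.default_rng(0)
hidden = rng.normal(size=(16, 32))        # synthetic stand-in for hidden states
proj = rng.normal(size=(32, 8))           # synthetic stand-in for the projection
aligned_target = hidden @ proj            # targets that match the projection exactly
random_target = rng.normal(size=(16, 8))  # unrelated targets

loss_aligned = alignment_loss(hidden, aligned_target, proj)
loss_random = alignment_loss(hidden, random_target, proj)
```

Under these assumptions the loss approaches -1 when the projected representations point in the same direction as the target embeddings and is higher for unrelated targets. In training, a term of this kind would typically be added to the diffusion objective with a weighting coefficient.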

Paper Structure

This paper contains 36 sections, 5 equations, 16 figures, 19 tables.

Figures (16)

  • Figure 1: Original (top row) and generated (bottom row) images for various cell type / perturbation ID pairs from the RxRx1 dataset (Sypetkowski et al., 2023). Unlike existing models, MorphGen generates crisp, high-dimensional images across different cell types and perturbations. Generated images are not cherry-picked; for visualization, we selected original images that are neighbors of the generated ones. See the appendix on qualitative results for additional examples.
  • Figure 2: Comparison of original and generated fluorescence images for each organelle in a control HEPG2 cell. Our model reconstructs the six distinct fluorescent channels using RxRx1-recommended colormaps, preserving morphology across subcellular structures. Generated images are not cherry-picked, and we selected original images that are neighbors of the generated ones for visualization.
  • Figure 3: PCA of CellProfiler features. Color denotes perturbation (1108, 1124, 1137, 1138); marker style denotes data type (circle: real, cross: generated). Generated samples align with real clusters while maintaining perturbation separation.
  • Figure 4: CellProfiler morphology analysis (HUVEC). Correlation matrices for the top-10 PCA-selected features in real and generated data, shown side-by-side with a shared scale, indicate that MorphGen preserves key morphological relationships.
  • Figure 5: PCA projections of OpenPhenom features from real and generated images. The left panel shows the joint distribution of the most frequent perturbations (including the control, p1138) for HUVEC cells, with points colored by perturbation. The right panel shows perturbation 1108 across different cell types. In both panels, marker shapes indicate whether a sample is real or generated.
  • ...and 11 more figures
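
The PCA comparisons described in the captions for Figures 3 and 5 can be sketched as follows. This is a minimal illustration on synthetic feature vectors, not the authors' analysis code. It assumes PCA is fitted on the pooled real and generated features so that both sets share one projection; the captions do not state how the projection was fitted. The helper names `fit_pca` and `project` are hypothetical.

```python
import numpy as np


def fit_pca(features, n_components=2):
    """Return the feature mean and the top principal axes, computed via SVD."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, components):
    """Project feature vectors onto the fitted principal axes."""
    return (features - mean) @ components.T


rng = np.random.default_rng(0)
# Synthetic stand-ins for per-image feature vectors (e.g. CellProfiler or
# OpenPhenom features); these are NOT real measurements.
real = rng.normal(size=(200, 20))
generated = rng.normal(size=(200, 20))

# Assumption: fit on the pooled data so both sets live in the same 2-D space.
mean, components = fit_pca(np.vstack([real, generated]))
real_2d = project(real, mean, components)
generated_2d = project(generated, mean, components)
```

Plotting `real_2d` and `generated_2d` with different marker styles, colored by perturbation label, would give a figure of the kind the captions describe.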