Elucidating the Design Space of Flow Matching for Cellular Microscopy

Charles Jones, Emmanuel Noutahi, Jason Hartford, Cian Eastwood

Abstract

Flow-matching generative models are increasingly used to simulate cell responses to biological perturbations. However, the design space for building such models is large and underexplored. We systematically analyse the design space of flow matching models for cell-microscopy images, finding that many popular techniques are unnecessary and can even hurt performance. We develop a simple, stable, and scalable recipe which we use to train our foundation model. We scale our model to two orders of magnitude larger than prior methods, achieving a two-fold FID and ten-fold KID improvement over prior methods. We then fine-tune our model with pre-trained molecular embeddings to achieve state-of-the-art performance simulating responses to unseen molecules. Code is available at https://github.com/valence-labs/microscopy-flow-matching

Paper Structure

This paper contains 57 sections, 1 theorem, 6 equations, 10 figures, 7 tables.

Key Result

Proposition 3.1

Under Assumptions 1--3, for every $x\in\mathcal{X}$, Consequently,

Figures (10)

  • Figure 1: We ablate, simplify, and scale flow matching for generative modelling of cellular microscopy, achieving state-of-the-art performance on: (top images) the RxRx1 genetic intervention benchmark; (middle images) the BBBC021 small molecule perturbation benchmark; and (bottom images) the BBBC021 unseen molecule virtual screening task. In the bottom plot, we display performance vs. total training compute on the RxRx1 benchmark.
  • Figure 2: Samples from our best performing model on RxRx1. Each column represents a unique gene perturbation and each row represents a different biological experiment. The model generates distinct phenotypes for each perturbation whilst capturing biological batch effects. See \ref{fig:rxrx1_samples_extra} for side-by-side comparisons of real and generated images.
  • Figure 3: Conditioning strategy. While (pre)training on seen perturbations (a), we use learnable linear embeddings of the perturbation label. When finetuning for unseen perturbations in \ref{sec:simulation} (b), we freeze the base model and perturbation encoder. We train a small adaptor to align the embeddings to the model's conditioning.
  • Figure 4: Generation vs. counterfactual mode sampling. Top row represents the flow trajectory for generation mode, starting with Gaussian noise and ending at a perturbed image. Bottom row represents the trajectory for counterfactual mode. This involves starting with a control image, solving the reverse probability flow ODE to compute its corresponding noise, then generating a perturbed image from the inferred noise in the usual way.
  • Figure 5: Counterfactual sampling on RxRx1. Top row represents real untreated control images. Middle row is the predicted counterfactual for the relevant perturbations (as labelled by the column headers). Bottom row represents the predicted individual treatment effect (difference of counterfactual to factual, averaged over channels). Our model is able to transform cell morphology whilst preserving features not relevant to the perturbation.
  • ...and 5 more figures
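
The two sampling modes in the Figure 4 caption can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_velocity` is a hypothetical stand-in for a trained velocity network, the condition names and shift values are invented for illustration, the solver is plain fixed-step Euler, and the convention that noise sits at t = 0 and data at t = 1 is an assumption.

```python
import numpy as np

def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a trained velocity network v(x, t, cond).

    Each condition moves samples by a constant shift (values are invented).
    """
    shifts = {"control": 0.5, "perturbed": 2.0}
    return np.full_like(x, shifts[cond])

def integrate(velocity, x, cond, t_start, t_end, n_steps=50):
    """Fixed-step Euler solve of dx/dt = velocity(x, t, cond).

    Passing t_start > t_end gives a negative step, i.e. the reverse ODE.
    """
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + dt * velocity(x, t, cond)
        t += dt
    return x

def generate(velocity, cond, shape, rng):
    """Generation mode: Gaussian noise at t=0 is flowed to a sample at t=1."""
    noise = rng.standard_normal(shape)
    return integrate(velocity, noise, cond, 0.0, 1.0)

def counterfactual(velocity, x_source, source_cond, target_cond):
    """Counterfactual mode: invert the image to noise, then regenerate."""
    # Reverse ODE under the source condition recovers the matching noise.
    noise = integrate(velocity, x_source, source_cond, 1.0, 0.0)
    # Forward ODE from that same noise under the target condition.
    return integrate(velocity, noise, target_cond, 0.0, 1.0)

rng = np.random.default_rng(0)
x_control = rng.standard_normal((4, 4))   # stand-in for a control image
x_cf = counterfactual(toy_velocity, x_control, "control", "perturbed")
effect = x_cf - x_control                 # predicted individual treatment effect
```

Because the inferred noise is shared between the factual and counterfactual trajectories, anything the condition does not change is carried over unchanged, which is the behaviour the Figure 5 caption describes. In this toy field the effect is simply the difference between the two shifts.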

Theorems & Definitions (2)

  • Proposition 3.1
  • proof