Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields

Rüveyda Yilmaz, Dennis Eschweiler, Johannes Stegmaier

TL;DR

BVDM outperforms state-of-the-art synthetic live cell microscopy video generation models, and a sufficiently large synthetic dataset is shown to improve the performance of cell segmentation and tracking models compared to using a limited amount of available real data.

Abstract

The segmentation and tracking of living cells play a vital role in the biomedical domain, particularly in cancer research, drug development, and developmental biology. These tasks are usually tedious and time-consuming and have traditionally been performed by biomedical experts. To automate them, deep-learning-based segmentation and tracking methods have recently been proposed. However, these methods require large-scale datasets, and their full potential is constrained by the scarcity of annotated data in the biomedical imaging domain. To address this limitation, we propose the Biomedical Video Diffusion Model (BVDM), which generates realistic-looking synthetic microscopy videos. Trained on only a single real video, BVDM can generate videos of arbitrary length with pixel-level annotations that can be used to train data-hungry models. It is composed of a denoising diffusion probabilistic model (DDPM), which generates high-fidelity synthetic cell microscopy images, and a flow prediction model (FPM), which predicts the non-rigid transformation between consecutive video frames. During inference, the DDPM first imposes realistic cell textures on synthetic cell masks that are generated from real data statistics. The FPM then predicts the flow field between consecutive masks, and this field is applied to the DDPM output of the previous time frame to create the next frame while keeping temporal consistency. BVDM outperforms state-of-the-art synthetic live cell microscopy video generation models. Furthermore, we demonstrate that a sufficiently large synthetic dataset improves the performance of cell segmentation and tracking models compared to using a limited amount of available real data.
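The inference procedure described above alternates between texturing a frame and warping it forward with a predicted flow field. The NumPy sketch below illustrates that loop under stated assumptions; it is not the authors' implementation. `warp_bilinear` backward-warps an image with a dense 2D displacement field (the kind of spatial transform applied by VoxelMorph-style registration networks), while `generate_video`, `texture_fn`, and `flow_fn` are hypothetical names standing in for the trained DDPM and flow prediction model, which are not reproduced here. The default step counts mirror values that appear in the figure captions and may not match the paper's final settings.

```python
import numpy as np


def warp_bilinear(image, flow):
    """Backward-warp a 2D image with a dense displacement field.

    flow has shape (2, H, W): flow[0] holds row (y) offsets and flow[1]
    column (x) offsets. Output pixel (y, x) samples the input at
    (y + flow[0, y, x], x + flow[1, y, x]) with bilinear interpolation;
    sample positions are clamped to the image border.
    """
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(yy + flow[0], 0, h - 1)
    sx = np.clip(xx + flow[1], 0, w - 1)
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = sy - y0  # fractional offset along y
    wx = sx - x0  # fractional offset along x
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def generate_video(masks, texture_fn, flow_fn, t_first=200, t_next=10):
    """Hypothetical BVDM-style inference loop (illustrative only).

    texture_fn(init_image, t_steps) stands in for the DDPM and
    flow_fn(prev_mask, next_mask) for the flow prediction model. flow_fn is
    assumed to return a backward field on the grid of next_mask, which is
    the convention warp_bilinear expects.
    """
    # First frame: texture the synthetic mask with a larger step count.
    frames = [texture_fn(masks[0].astype(float), t_first)]
    for prev_mask, next_mask in zip(masks[:-1], masks[1:]):
        flow = flow_fn(prev_mask, next_mask)
        # Carry the previous frame's texture along the predicted flow.
        warped = warp_bilinear(frames[-1], flow)
        # Refine the warped frame with only a few diffusion steps.
        frames.append(texture_fn(warped, t_next))
    return frames


# Toy usage: an identity "DDPM" and a constant one-pixel rightward shift as the "flow".
masks = [np.zeros((4, 4)) for _ in range(3)]
masks[0][1:3, 0:2] = 1.0


def shift_right(prev_mask, next_mask):
    # Sampling from x - 1 moves image content one pixel to the right.
    return np.stack([np.zeros((4, 4)), -np.ones((4, 4))])


frames = generate_video(masks, lambda img, t: img, shift_right)
```

Backward warping (sampling the previous frame at displaced positions) is used here because it defines a value for every output pixel and avoids the holes that forward splatting can leave.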

Paper Structure (4 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Overview of BVDM in training (a) and inference (b.1, b.2). (a) The DDPM and VoxelMorph are trained independently, on randomly sampled images $I_f$ and on pairs of consecutive masks $M_f$ and $M_{f+1}$, respectively. (b.1) During inference, the DDPM generates the texture for the first appearance of each cell. (b.2) For subsequent frames, the flow field predicted by VoxelMorph is applied to the output of the previous iteration, and the warped result is fed to the DDPM.
  • Figure 2: Qualitative comparison of texture consistency across frames for the real dataset [mavska2014benchmark] (a), BVDM (b), MitoGen [svoboda2016mitogen] (c), and CellCycleGAN [bahr2021cellcyclegan] (d).
  • Figure 3: Sample masks from four consecutive frames (a) and qualitative results for $T_{f=0}=100$ (b), $T_{f=0}=200$ (c), $T_{f=0}=400$ (d), and $T_{f=0}=600$ (e) with $T_{f\neq 0}=10$, demonstrating how the mismatch between the mask annotations and the generated images depends on the diffusion time step $T_{f=0}$.
  • Figure 4: Sample masks from four consecutive frames (a) and qualitative results for $T_{f\neq 0}=0$ (b), $T_{f\neq 0}=10$ (c), $T_{f\neq 0}=30$ (d), $T_{f\neq 0}=50$ (e), and $T_{f\neq 0}=200$ (f) with $T_{f=0}=200$, demonstrating how the mismatch between the mask annotations and the generated images, as well as the texture consistency, depends on the diffusion time step $T_{f\neq 0}$. For (f), no flow field is predicted between consecutive frames; instead, each frame is generated independently with 200 diffusion time steps.
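Figures 3 and 4 vary the diffusion time steps $T_{f=0}$ and $T_{f\neq 0}$. Assuming the DDPM starts from its input (a mask for the first frame, a warped frame afterwards) noised to step $T$ and then denoises it, an SDEdit-style reading that is an interpretation rather than a detail stated here, the closed-form DDPM forward process $x_T = \sqrt{\bar\alpha_T}\,x_0 + \sqrt{1-\bar\alpha_T}\,\epsilon$ with $\bar\alpha_T = \prod_{s=1}^{T}(1-\beta_s)$ shows how much of the input survives. The sketch below computes $\sqrt{\bar\alpha_T}$ for the step counts in the captions under a linear schedule with $\beta$ from $10^{-4}$ to $0.02$ over 1000 steps, the default of the original DDPM formulation; BVDM's actual schedule is not given in this section, and `linear_alpha_bar` and `noise_to_step` are illustrative names.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar with alpha_bar[t] = prod_{s=1..t} (1 - beta_s).

    alpha_bar[0] is defined as 1.0 (no noise added). The linear beta
    schedule is an assumption, not necessarily the one used by BVDM.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def noise_to_step(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x0, (1 - alpha_bar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


alpha_bar = linear_alpha_bar()

# Signal coefficient sqrt(alpha_bar_T) for the step counts used in Figures 3 and 4.
steps = [0, 10, 30, 50, 100, 200, 400, 600]
signal = {t: float(np.sqrt(alpha_bar[t])) for t in steps}
for t in steps:
    print(f"T={t:4d}  sqrt(alpha_bar)={signal[t]:.3f}")

# Noising a stand-in input (e.g. a mask or warped frame scaled to the model's range).
rng = np.random.default_rng(0)
x0 = np.ones((8, 8))
x10 = noise_to_step(x0, 10, alpha_bar, rng)
x200 = noise_to_step(x0, 200, alpha_bar, rng)
```

Under this assumed schedule the signal coefficient shrinks monotonically as $T$ grows, so a larger $T$ leaves the denoiser more freedom to depart from its input. This is one way to read the trade-off the captions describe between mask-image agreement and texture consistency.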