BioDiffusion: A Versatile Diffusion Model for Biomedical Signal Synthesis

Xiaomin Li, Mykhailo Sakevych, Gentry Atkinson, Vangelis Metsis

TL;DR

BioDiffusion addresses data scarcity, class imbalance, labeling complexity, and measurement noise in biomedical signals with a diffusion-based probabilistic model that supports unconditional, label-conditional, and signal-conditional generation. The method adapts a U-Net backbone to time-series data and synthesizes high-fidelity signals across multiple datasets, outperforming leading time-series generative models in both quality metrics and qualitative evaluations. It also supports denoising, imputation, upsampling, and subject-specific data generation, which suggests it can improve ML performance in diagnostics and patient monitoring. Overall, BioDiffusion is a versatile, scalable approach to augmenting biomedical signal analysis by improving data availability and quality.

Abstract

Machine learning tasks involving biomedical signals frequently contend with limited data availability, imbalanced datasets, labeling complexity, and measurement noise. These challenges often hinder the training of machine learning algorithms. To address them, we introduce BioDiffusion, a diffusion-based probabilistic model optimized for the synthesis of multivariate biomedical signals. BioDiffusion produces high-fidelity, non-stationary, multivariate signals for a range of tasks, including unconditional, label-conditional, and signal-conditional generation. Leveraging these synthesized signals helps address the aforementioned challenges. We assess the quality of the synthesized data both qualitatively and quantitatively, and the results underscore its capacity to improve accuracy in machine learning tasks involving biomedical signals. Furthermore, empirical comparison with current leading time-series generative models suggests that BioDiffusion outperforms them in biomedical signal generation quality.
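
To make "diffusion-based probabilistic model" concrete, the sketch below shows the forward (noising) step of a standard denoising diffusion formulation applied to a multivariate signal: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the cumulative product of (1 - beta_s). It is not taken from the paper. The linear beta schedule, the 1000 steps, the toy three-channel sinusoid, and the function names `linear_beta_schedule` and `forward_diffuse` are all illustrative assumptions; BioDiffusion's actual schedule and training setup may differ.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; not necessarily the one used in the paper."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form and return (x_t, noise)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
num_steps = 1000  # assumed number of diffusion steps
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

# Toy stand-in for a multivariate biomedical signal: 3 channels, 256 samples.
time = np.linspace(0.0, 1.0, 256)
x0 = np.stack([
    np.sin(2 * np.pi * 5 * time),
    np.cos(2 * np.pi * 3 * time),
    0.5 * np.sin(2 * np.pi * 8 * time),
])

x_early, noise_early = forward_diffuse(x0, 10, alpha_bar, rng)       # lightly noised
x_late, noise_late = forward_diffuse(x0, num_steps - 1, alpha_bar, rng)  # nearly pure noise
```

In a denoising diffusion setup, a network (here, the paper's U-Net adapted to signals) is typically trained to predict `noise` from `x_t` and `t`; label-conditional and signal-conditional generation would pass a class label or a reference signal to that network as an extra input.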

Paper Structure

This paper contains 25 sections, 5 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Unconditional Diffusion process
  • Figure 2: Label Conditional Diffusion process
  • Figure 3: Signal Conditional Diffusion process
  • Figure 4: Description of the U-Net architecture for signals with skip connections
  • Figure 5: Raw signals comparison. The left column shows real raw signals. The right column shows synthetic raw signals.
  • ...and 11 more figures