BioDiffusion: A Versatile Diffusion Model for Biomedical Signal Synthesis
Xiaomin Li, Mykhailo Sakevych, Gentry Atkinson, Vangelis Metsis
TL;DR
BioDiffusion addresses data scarcity, class imbalance, labeling complexity, and measurement noise in biomedical signals with a diffusion-based probabilistic model capable of unconditional, label-conditioned, and signal-conditioned generation. The method adapts a U-Net backbone to time-series data and synthesizes high-fidelity signals across multiple datasets, outperforming leading time-series generative models in both quality metrics and qualitative evaluations. It also supports denoising, imputation, upsampling, and subject-specific data generation, which points to its potential for improving ML performance in diagnostics and patient monitoring. Overall, BioDiffusion is a versatile approach for augmenting biomedical signal analysis by improving the availability and quality of training data.
Abstract
Machine learning tasks involving biomedical signals often face limited data availability, imbalanced datasets, complex labeling, and measurement noise, all of which hinder the training of machine learning algorithms. To address these challenges, we introduce BioDiffusion, a diffusion-based probabilistic model optimized for synthesizing multivariate biomedical signals. BioDiffusion produces high-fidelity, non-stationary, multivariate signals for a range of tasks, including unconditional, label-conditional, and signal-conditional generation. The synthesized signals can then be used to mitigate the challenges listed above. We assess the quality of the synthesized data both qualitatively and quantitatively, and show that it can improve accuracy in machine learning tasks on biomedical signals. Furthermore, empirical comparisons with current leading time-series generative models suggest that BioDiffusion outperforms them in biomedical signal generation quality.
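To make the term "diffusion-based probabilistic model" concrete, the following is a minimal sketch of the forward (noising) step of a generic denoising diffusion model applied to a multivariate signal. It is not taken from the BioDiffusion implementation: the linear noise schedule, the number of steps, the toy two-channel sinusoid, and the helper names `make_alpha_bar` and `forward_noise` are all assumptions chosen for illustration. A generative model of this family is trained to reverse this corruption, and conditioning on a label or a reference signal would enter through the denoising network, which is omitted here.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    x0 has shape (channels, length). Returns the noised signal and the
    Gaussian noise that was added (the usual regression target in training).
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


# Toy two-channel "signal" standing in for a multivariate biomedical recording.
rng = np.random.default_rng(0)
time = np.linspace(0.0, 1.0, 256)
x0 = np.stack([np.sin(2 * np.pi * 5 * time), np.cos(2 * np.pi * 3 * time)])

alpha_bar = make_alpha_bar()
xt_early, _ = forward_noise(x0, 10, alpha_bar, rng)   # lightly corrupted
xt_late, _ = forward_noise(x0, 999, alpha_bar, rng)   # close to pure noise
```

Under these assumptions, the early sample stays close to the clean signal while the late sample is dominated by noise; generation runs the learned reverse process starting from such noise.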
