Data Augmentation with Diffusion Models for Colon Polyp Localization on the Low Data Regime: How much real data is enough?
Adrian Tormos, Blanca Llauradó, Fernando Núñez, Axel Romero, Dario Garcia-Gasulla, Javier Béjar
TL;DR
This work tackles data scarcity in colon polyp localization by training a Latent Diffusion Model on heterogeneous colonoscopy datasets to generate annotated training samples at a target resolution of $640\times 640$, using $80\times 80$ latent Gaussian noise and $1000$ denoising steps. The generated data include both images and localization masks, enabling joint synthesis and downstream localization training; a YOLOv9-based detector is pretrained on synthetic data and fine-tuned with limited real data. Results show substantial gains in low-data settings (e.g., with as few as $50$–$100$ real images) across multiple evaluation datasets, with improvements of up to $2$–$4\times$ in some cases, while gains taper as real data increases. The findings suggest that diverse synthetic data from multiple heterogeneous datasets can robustly support cross-institutional polyp localization, and point to future work on dataset-aligned fine-tuning (e.g., LoRA) to further reduce data requirements.
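The resolution figures above imply a fixed ratio between pixel space and latent space. The sketch below works through that arithmetic and draws a starting latent of the stated size. It is only an illustration: the channel count (`LATENT_CHANNELS = 4`) and the helper names are assumptions that do not come from the paper, and no denoising network is involved.

```python
import random

IMAGE_SIZE = 640      # target image resolution stated in the text
LATENT_SIZE = 80      # latent noise resolution stated in the text
NUM_STEPS = 1000      # number of denoising steps stated in the text
LATENT_CHANNELS = 4   # ASSUMPTION: the text does not give the latent channel count


def downsampling_factor(image_size: int, latent_size: int) -> int:
    """Return the per-axis ratio between image and latent resolution."""
    if image_size % latent_size != 0:
        raise ValueError("image size must be a multiple of latent size")
    return image_size // latent_size


def sample_initial_latent(seed: int, channels: int = LATENT_CHANNELS,
                          size: int = LATENT_SIZE) -> list:
    """Draw standard Gaussian noise shaped (channels, size, size) as nested lists."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(size)]
             for _ in range(size)]
            for _ in range(channels)]


factor = downsampling_factor(IMAGE_SIZE, LATENT_SIZE)
positions_ratio = factor ** 2  # spatial positions per latent position
latent = sample_initial_latent(seed=0)

print(f"per-axis downsampling factor: {factor}")
print(f"pixel positions per latent position: {positions_ratio}")
print(f"latent shape: ({len(latent)}, {len(latent[0])}, {len(latent[0][0])})")
print(f"denoising steps applied to this latent: {NUM_STEPS}")
```

Under these assumptions, each latent position corresponds to an $8\times 8$ patch of the output image, so the diffusion process runs over $64$ times fewer spatial positions than it would in pixel space.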
Abstract
The scarcity of data in medical domains hinders the performance of deep learning models. Data augmentation techniques can alleviate that problem, but they usually rely on functional transformations of the data that do not guarantee that the original task is preserved. Approximating the data distribution with generative models mitigates that problem and yields new samples that resemble the original data. Denoising diffusion models are a promising deep learning technique that can learn good approximations of different kinds of data, such as images, time series, or tabular data. Automatic colonoscopy analysis, and specifically polyp localization in colonoscopy videos, is a task that can assist clinical diagnosis and treatment. Annotating video frames to train a deep learning model is time-consuming, and usually only small datasets can be obtained. Fine-tuning application models using a large dataset of generated data could be an alternative way to improve their performance. We conduct a set of experiments training different diffusion models that jointly generate colonoscopy images and localization annotations, using a combination of existing open datasets. The generated data is used in various transfer learning experiments on polyp localization in the low-data regime, with a model based on YOLOv9.
