MieDB-100k: A Comprehensive Dataset for Medical Image Editing

Yongfan Lai, Wen Qian, Bo Liu, Hongyan Li, Hao Luo, Fan Wang, Bohan Zhuang, Shenda Hong

TL;DR

MieDB-100k tackles the critical data bottleneck in medical image editing by delivering a large-scale, diverse dataset that unifies perception, modification, and transformation tasks under a text-guided editing paradigm. It introduces a data curation pipeline combining modality-specific expert models with rule-based synthesis and targeted manual checks to ensure clinical fidelity. The work defines a triplet structure ($I$, $P$, $O$) and four-stage workflows for modification, along with 17 transformation targets and post-processing to diversify prompts. Evaluations using verifiable metrics and VLM-based rubrics demonstrate that models fine-tuned on MieDB-100k, notably OmniGen2-MIE, achieve substantial improvements over open- and closed-source baselines and generalize to unseen tasks. Overall, MieDB-100k provides a robust foundation for advancing unified, clinically reliable medical image understanding and editing in multimodal models.
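
To make the triplet structure concrete, here is a minimal Python sketch of how one sample might be represented. It assumes ($I$, $P$, $O$) stands for input image, text prompt, and output image; the summary does not spell this out. The class names, field names, file paths, and example prompts are hypothetical illustrations, not the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    """The three task perspectives named in the paper."""
    PERCEPTION = "perception"
    MODIFICATION = "modification"
    TRANSFORMATION = "transformation"


@dataclass(frozen=True)
class EditTriplet:
    # Assumed reading of (I, P, O): input image, text prompt, output image.
    input_image: str   # hypothetical path to I
    prompt: str        # P, the text editing instruction
    output_image: str  # hypothetical path to O
    task: Task         # which perspective the sample belongs to


def group_by_task(samples):
    """Bucket triplets by task perspective, keeping empty buckets visible."""
    groups = {task: [] for task in Task}
    for sample in samples:
        groups[sample.task].append(sample)
    return groups


# Illustrative placeholder samples; none of these come from the dataset.
samples = [
    EditTriplet("in_0.png", "Highlight the liver.", "out_0.png", Task.PERCEPTION),
    EditTriplet("in_1.png", "Remove the lesion.", "out_1.png", Task.MODIFICATION),
    EditTriplet("in_2.png", "Add a lesion.", "out_2.png", Task.MODIFICATION),
]

groups = group_by_task(samples)
```

Grouping by perspective like this would let verifiable metrics or rubric-based scoring be applied per task family, which matches how the evaluation is described at a high level.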

Abstract

The scarcity of high-quality data remains a primary bottleneck in adapting multimodal generative models for medical image editing. Existing medical image editing datasets often suffer from limited diversity, neglect medical image understanding, and fail to balance quality with scalability. To address these gaps, we propose MieDB-100k, a large-scale, high-quality, and diverse dataset for text-guided medical image editing. It categorizes editing tasks into three perspectives, Perception, Modification, and Transformation, covering both understanding and generation abilities. We construct MieDB-100k via a data curation pipeline that leverages both modality-specific expert models and rule-based data synthesis methods, followed by rigorous manual inspection to ensure clinical fidelity. Extensive experiments demonstrate that models trained on MieDB-100k consistently outperform both open-source and proprietary models while exhibiting strong generalization ability. We anticipate that this dataset will serve as a cornerstone for future advancements in specialized medical image editing.

Paper Structure (40 sections, 3 equations, 15 figures, 6 tables)

Figures (15)

  • Figure 1: Overview of MieDB-100k. It categorizes medical image editing tasks into three perspectives, covering diverse medical modalities.
  • Figure 2: Modality distribution (a) and prompt word cloud (b).
  • Figure 3: Construction pipeline of MieDB-100k.
  • Figure 4: Qualitative editing result comparison.
  • Figure 5: Generalization test assessment. (a) and (b): Edited samples output by different models on the bone metastasis addition (a) and removal (b) tasks. Red bounding boxes are added post hoc to highlight the edited regions for visualization. (c): Quantitative assessments following the recipe of the Modification task evaluation.
  • ...and 10 more figures