Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing

Zhihui Chen, Mengling Feng

TL;DR

Med-Banana-50K provides a cross-modality, bidirectional medical image editing dataset built from real clinical sources to support text-guided edits. It combines a systematic instruction generation pipeline, state-of-the-art editing models, and an LLM-based judge with a multi-objective rubric, iterating up to five refinement rounds to ensure anatomical fidelity and imaging realism. The dataset contains 50,635 successful edits and 37,822 failed attempts across chest X-ray, brain MRI, and fundus images for 23 diseases, enabling supervised fine-tuning and preference learning. By releasing detailed metadata, conversations, and failure logs under open licenses, the work aims to advance reliable medical image editing, counterfactual analysis, and alignment research while acknowledging limitations in coverage and the need for expert validation.

Abstract

Medical image editing has emerged as a pivotal technology with broad applications in data augmentation, model interpretability, medical education, and treatment simulation. However, the lack of large-scale, high-quality, and openly accessible datasets tailored for medical contexts with strict anatomical and clinical constraints has significantly hindered progress in this domain. To bridge this gap, we introduce Med-Banana-50K, a comprehensive dataset of over 50k medically curated image edits spanning chest X-ray, brain MRI, and fundus photography across 23 diseases. Each sample supports bidirectional lesion editing (addition and removal) and is constructed using Gemini-2.5-Flash-Image based on real clinical images. A key differentiator of our dataset is the medically grounded quality control protocol: we employ an LLM-as-Judge evaluation framework with criteria such as instruction compliance, structural plausibility, image realism, and fidelity preservation, alongside iterative refinement over up to five rounds. Additionally, Med-Banana-50K includes around 37,000 failed editing attempts with full evaluation logs to support preference learning and alignment research. By offering a large-scale, medically rigorous, and fully documented resource, Med-Banana-50K establishes a critical foundation for developing and evaluating reliable medical image editing systems. Our dataset and code are publicly available at https://github.com/richardChenzhihui/med-banana-50k.
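The judge-and-refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `refine`, `preference_pairs`, `Attempt`, `stub_edit`, and `stub_judge` are hypothetical, the 0-to-1 score scale and `PASS_THRESHOLD` are assumptions, and the stubs stand in for the real editing model and LLM judge. Only the four criteria and the five-round cap come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_ROUNDS = 5          # the abstract states up to five refinement rounds
PASS_THRESHOLD = 0.8    # assumption: the real rubric and cut-off are not given here
CRITERIA = (            # criteria named in the abstract
    "instruction_compliance",
    "structural_plausibility",
    "image_realism",
    "fidelity_preservation",
)

@dataclass
class Attempt:
    round_index: int
    image: str                # placeholder for image data or a file path
    scores: Dict[str, float]
    passed: bool
    feedback: str

def refine(
    source: str,
    instruction: str,
    edit_fn: Callable[[str, str, List[Attempt]], str],
    judge_fn: Callable[[str, str, str], Tuple[Dict[str, float], str]],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[Optional[str], List[Attempt]]:
    """Edit, judge, and retry with history until the judge accepts or rounds run out."""
    history: List[Attempt] = []
    for round_index in range(1, max_rounds + 1):
        # The editor sees earlier attempts and judge feedback (history-aware refinement).
        candidate = edit_fn(source, instruction, history)
        scores, feedback = judge_fn(source, candidate, instruction)
        passed = all(scores[c] >= PASS_THRESHOLD for c in CRITERIA)
        history.append(Attempt(round_index, candidate, scores, passed, feedback))
        if passed:
            return candidate, history
    return None, history  # every round failed; the log is still kept

def preference_pairs(history: List[Attempt]) -> List[Tuple[str, str]]:
    """Pair each accepted edit with every rejected one as (chosen, rejected)."""
    accepted = [a for a in history if a.passed]
    rejected = [a for a in history if not a.passed]
    return [(acc.image, rej.image) for acc in accepted for rej in rejected]

# Stubs standing in for the editing model and the LLM judge (purely illustrative).
def stub_edit(source: str, instruction: str, history: List[Attempt]) -> str:
    return f"edit_round_{len(history) + 1}"

def stub_judge(source: str, candidate: str, instruction: str) -> Tuple[Dict[str, float], str]:
    ok = candidate.endswith("3")  # pretend the third attempt is the first acceptable one
    score = 1.0 if ok else 0.4
    return {c: score for c in CRITERIA}, "" if ok else "lesion boundary looks unrealistic"

result, log = refine("cxr_0001.png", "add a right-sided pleural effusion", stub_edit, stub_judge)
pairs = preference_pairs(log)
```

Keeping the rejected attempts alongside the accepted one is what makes the failure logs usable for preference learning: each (chosen, rejected) pair shares the same source image and instruction.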

Paper Structure

This paper contains 19 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Pipeline: instruction generation, single-step editing, LLM-as-Judge evaluation, and history-aware refinement under fidelity, negative-rule, and minimal-change constraints.
  • Figure 2: Representative edited results across modalities and tasks (add/remove).
  • Figure 3: Top: dataset composition (success counts) by modality and task. Middle: rounds-to-success histogram. Bottom: judge qualified rates (weighted).