A Foundational Generative Model for Breast Ultrasound Image Analysis

Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, Liwei Wang

TL;DR

BUSGen introduces a diffusion-based foundational model trained on more than 3.5 million (3,518,495) breast ultrasound images and 3,749 lesions to learn rich breast anatomy and pathology. Through few-shot adaptation with LoRA, it generates diverse, high-fidelity, privacy-protected data that enable downstream models (BUS-DMs) to outperform real-data baselines across screening, early diagnosis, and prognosis, including surpassing board-certified radiologists in distinguishing DCIS from benign lesions. The work demonstrates data-scaling effects in which generated data can match the performance of real data, and it highlights improvements in generalization and privacy through CPSampling and device-type augmentation. It also outlines a comprehensive methodology linking pretraining, adaptation, and diverse downstream tasks, offering a path toward scalable, privacy-conscious clinical AI in breast ultrasound and beyond.
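LoRA adaptation, as described in Figure 1b, freezes BUSGen's pretrained parameters and fine-tunes only low-rank adapters. As a rough illustration, the sketch below applies the standard LoRA update W' = W + (alpha / r) * B @ A to a single linear layer in plain Python. The class name LoRALinear, the zero initialisation of B, and the alpha / r scaling are common LoRA conventions assumed here; the source does not say which layers of BUSGen receive adapters or what rank is used.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical LoRA-wrapped linear layer: frozen W plus trainable (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = [row[:] for row in weight]  # pretrained W (d_out x d_in), never updated
        self.scale = alpha / rank
        # Trainable factors: A (rank x d_in) starts small and random, B (d_out x rank) starts
        # at zero, so the adapted layer initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """W + (alpha / r) * B @ A; the added update has rank at most r."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """x is n x d_in; returns x @ W_eff^T (n x d_out)."""
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def trainable_parameter_count(self):
        """Only A and B are trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With a 64 x 64 pretrained weight and rank 2, only 2 x 64 + 64 x 2 = 256 adapter parameters are trained while the 4,096 pretrained weights stay fixed, in line with the stated goal of preserving the information acquired during pretraining.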

Abstract

Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential for breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired extensive knowledge of breast structures, pathological features, and clinical variations. With few-shot adaptation, BUSGen can generate repositories of realistic and informative task-specific data, facilitating the development of models for a wide range of downstream tasks. Extensive experiments highlight BUSGen's exceptional adaptability: the resulting models significantly exceeded real-data-trained foundational models in breast cancer screening, diagnosis, and prognosis. In breast cancer early diagnosis, our approach outperformed all board-certified radiologists (n=9), achieving an average sensitivity improvement of 16.5% (P-value<0.0001). Additionally, we characterized the scaling effect of generated data, which proved as effective as collected real-world data for training diagnostic models. Further experiments demonstrated that our approach improved the generalization ability of downstream models. Importantly, BUSGen protects patient privacy by enabling fully de-identified data sharing, advancing the secure use of medical data. An online demo of BUSGen is available at https://aibus.bio.

Paper Structure

This paper contains 25 sections, 2 equations, 6 figures.

Figures (6)

  • Figure 1: Schematic overview of the BUSGen pretraining and adaptation framework. a, Over 3.5 million breast ultrasound images (5,907 examinations of 4,636 patients) and 3,749 lesions were collected. These data were annotated by clinical experts and used for a conditional generation task to pretrain BUSGen, enabling it to learn a rich data distribution and generate high-quality images through an iterative refinement process repeated T times. The pretraining task was conditioned on pathology labels, lesion boxes, and device type. b, The pretrained BUSGen can be adapted to various downstream tasks, generating unlimited, informative data resources that facilitate the development of downstream models. To preserve the rich information acquired during pretraining, we froze the pretrained parameters and fine-tuned low-rank adapters (LoRA). Compared with baseline models, the BUSGen-based downstream models (BUS-DMs) achieved superior performance on a wide range of tasks across breast cancer screening, diagnosis, and prognosis.
  • Figure 2: BUSGen can generate realistic data while protecting patient privacy. a, Results of the Visual Turing Test. Three radiologists were asked to distinguish "fake" images generated by BUSGen from real images. They were presented with a set of 100 images: 50 generated and 50 real. Approximately 50%–75% of the generated images were mistakenly identified as real by the radiologists. b, Distribution of cosine similarity scores in the feature space between generated samples and their nearest neighbors in the training data. This result indicates that BUSGen does not replicate its training data: the highest cosine similarity is 0.896. c, Visualization of the two pairs of generated and real images with the highest cosine similarity scores. As the plots show, these image pairs are not exact replicas.
  • Figure 3: BUSGen improves breast cancer screening tasks. a, Distribution of lesion sizes. The smallest 30% of lesions (orange), defined as lesions whose area is smaller than 0.068 relative to the image, can be hard for radiologists to detect during ultrasound scanning. b, Comparison of BUS-DM (red) and Baseline (light blue) in small lesion detection (n=16,896). We report Average Precision (AP) at an Intersection over Union (IoU) threshold of 0.5. BUS-DM achieved an AP_small of 0.702 (95% CI 0.681–0.720) and outperformed Baseline (AP_small: 0.657; 95% CI 0.637–0.679; P-value=0.0017). c, Comparison of BUS-DM (red) and Baseline (light blue) in lesion detection (n=28,150). BUS-DM achieved an AP of 0.934 (95% CI 0.930–0.938) and significantly outperformed Baseline (AP: 0.912; 95% CI 0.907–0.917; P-value<0.0001). d, Illustration of the opportunistic screening and classification tasks. Opportunistic screening is performed on a population without suspicious symptoms of breast cancer, namely palpable mass, nipple discharge, severe pain, and skin change (upper). Using deep learning models, we predict whether lesions detected by opportunistic screening are benign or malignant (lower). e, Comparison of BUS-DM (red) with Baseline-CLIP (blue) for benign-malignant classification of opportunistic screening-detected lesions. BUS-DM achieved a higher AUC of 0.913 (95% CI: 0.874–0.948) than Baseline-CLIP (AUC: 0.870; 95% CI: 0.823–0.912; P-value=0.0074). **P-value<0.01; ***P-value<0.001; ****P-value<0.0001.
  • Figure 4: BUSGen enhances breast ultrasound diagnosis. a, Breast cancer early diagnosis involves distinguishing DCIS (early-stage cancer) from benign lesions, which is considered difficult for radiologists using ultrasound images. b, Comparison of BUS-DM (red) with Baseline-CLIP (blue) in the early diagnosis task (benign-DCIS classification). BUS-DM achieved a higher AUC of 0.900 (95% CI: 0.860–0.939) than Baseline-CLIP (AUC: 0.846; 95% CI: 0.787–0.902; P-value=0.0002). c, Comparison of BUS-DM with board-certified radiologists (n=9; 11 years of experience on average) in breast cancer early diagnosis. The ROC curve of BUS-DM (red curve) and the diagnostic results of the radiologists (dots) show that BUS-DM outperformed the radiologists by a large margin. The dot colors (blue, green, and orange) represent radiologists' results calculated at different thresholds. d, Accuracy improvements of radiologists with the assistance of BUS-DM. We report radiologists' accuracy in breast cancer early diagnosis, both alone and after considering BUS-DM predictions. Accuracy is calculated using the BI-RADS 4A threshold. e, Data scaling curves of test loss (upper part of the left plot) and AUC (lower part of the left plot) for diagnostic models trained on different amounts of real collected data (dark purple) and BUSGen-generated data (light purple). The curves for real and generated data closely align at small data scales, and generated data continue to improve downstream performance as the number of generated samples increases. By scaling the generated data up to 1 million samples, we developed BUS-DM (AUC: 0.929; 95% CI 0.907–0.950), which achieved performance comparable to NYU-AI (trained on 288,767 real samples; AUC: 0.927; 95% CI 0.907–0.959) and outperformed Baseline-CLIP (AUC: 0.876; 95% CI 0.849–0.903; P-value=0.0006) on the BUSI test set (n=780). f, Comparison of BUS-DM (red) with Baseline-CLIP (blue) on the internal diagnosis test set for benign-malignant classification (n=579). BUS-DM achieved a higher AUC of 0.953 (95% CI: 0.935–0.967) than Baseline-CLIP (AUC: 0.925; 95% CI: 0.900–0.947; P-value=0.0006). g, Comparison of BUS-DM (red) with Baseline-CLIP (blue) on the external diagnosis test set for benign-malignant classification (n=227). BUS-DM achieved a higher AUC of 0.951 (95% CI: 0.921–0.975) than Baseline-CLIP (AUC: 0.913; 95% CI: 0.876–0.950; P-value=0.0007). Note that BUS-DM, trained only on generated data, generalized better than baseline models trained on real data. h, Comparison of BUS-DM with board-certified radiologists (n=9) on the diagnosis task (benign-malignant classification) using the external test set. The ROC curve of BUS-DM (red curve) and the diagnostic results of the radiologists (dots) show that BUS-DM outperformed the radiologists' average performance. ***P-value<0.001.
  • Figure 5: BUSGen facilitates breast cancer prognosis. a, Comparison of BUS-DM (red) with Baseline-CLIP (blue) in molecular subtype classification (TNBC vs. non-TNBC). BUS-DM achieved a higher AUC of 0.954 (95% CI: 0.932–0.983) than Baseline-CLIP (AUC: 0.723; 95% CI: 0.648–0.795; P-value=0.0046). b, Comparison of BUS-DM (red) with Baseline-CLIP (blue) in axillary lymph node (ALN) metastasis status classification (ALN-negative vs. ALN-positive). BUS-DM achieved a higher AUC of 0.895 (95% CI: 0.841–0.960) than Baseline-CLIP (AUC: 0.807; 95% CI: 0.723–0.890; P-value=0.0118). c, t-SNE plots of the classification features ([CLS]) of downstream models in molecular subtype prediction. The TNBC and non-TNBC clusters of BUS-DM [CLS] features are more concentrated than those of Baseline-CLIP. d, t-SNE plots of [CLS] features in ALN status classification. The ALN-negative and ALN-positive clusters of BUS-DM [CLS] features are more concentrated than those of Baseline-CLIP. e, Saliency map of molecular subtype prediction by BUS-DM, which highlights the upper-left part of the lesion margin when predicting TNBC. f, Saliency map of ALN status prediction by BUS-DM, which attends more to the glandular tissue surrounding the lesion when predicting ALN metastasis. **P-value<0.01.
  • ...and 1 more figure
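Figure 2b assesses memorisation by comparing each generated image with its nearest neighbor in the training data using cosine similarity in a feature space. A minimal sketch of that check is shown below, assuming feature vectors have already been extracted (the source does not name the feature extractor). The helper names and the 0.95 flagging threshold are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_training_match(generated, training):
    """For each generated vector, return (index of most similar training vector, similarity)."""
    matches = []
    for g in generated:
        sims = [cosine_similarity(g, t) for t in training]
        best = max(range(len(sims)), key=sims.__getitem__)  # first index wins on ties
        matches.append((best, sims[best]))
    return matches


def possible_copies(generated, training, threshold=0.95):
    """Indices of generated samples whose nearest training neighbor exceeds a hypothetical threshold."""
    return [i for i, (_, sim) in enumerate(nearest_training_match(generated, training))
            if sim > threshold]
```

Taking the maximum similarity over all generated samples gives the kind of summary quoted in Figure 2b (highest cosine similarity 0.896); what that number means depends on the feature space used to compute it.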
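Figure 3 reports Average Precision (AP) at an IoU threshold of 0.5 and defines small lesions by their area relative to the image. The sketch below shows one common way to compute these quantities, assuming axis-aligned boxes in (x1, y1, x2, y2) form, greedy highest-score-first matching, and all-point interpolated AP. The source does not state which AP protocol was used, and treating the 0.068 cutoff as box area divided by image area is also an assumption; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def is_small(box, image_w, image_h, cutoff=0.068):
    """Assumed small-lesion rule: box area / image area below the Figure 3a cutoff."""
    return (box[2] - box[0]) * (box[3] - box[1]) / (image_w * image_h) < cutoff


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """detections: list of (image_id, score, box); ground_truths: dict image_id -> list of boxes."""
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    hits = []
    # Each detection, in descending score order, claims the best unmatched box with IoU >= threshold.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if overlap >= best_iou and not matched[img][j]:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True
        hits.append(1 if best_j >= 0 else 0)
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # All-point interpolation: weight each recall step by the best precision at or beyond it.
    ap, prev_recall = 0.0, 0.0
    for i in range(len(precisions)):
        ap += (recalls[i] - prev_recall) * max(precisions[i:])
        prev_recall = recalls[i]
    return ap
```

Restricting `ground_truths` to lesions for which `is_small` holds (and handling unmatched detections on the remaining lesions consistently) would give a small-lesion AP analogous to AP_small, though the paper's exact filtering rule is not given.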
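Figures 3 to 5 report AUC values with 95% confidence intervals. The source does not state how the intervals or P-values were computed; the sketch below assumes a percentile bootstrap over test cases, which is one common choice, with AUC computed from its rank-based (Mann-Whitney) definition. The function names and defaults (1,000 resamples, seed 0) are illustrative.

```python
import random


def auc(labels, scores):
    """ROC AUC: probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling test cases with replacement."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one positive and one negative case")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

The pairwise loop in `auc` is quadratic in the number of cases, which is fine for test sets of a few hundred lesions such as those in Figure 4; a sort-based rank computation gives the same value in O(n log n).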