Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

TL;DR

TAILOR addresses the long-tail problem in breast ultrasound diagnosis by pairing a knowledge-driven diffusion generator (TAILOR-Gen), which synthesizes tailored data conditioned on pathology and domain knowledge, with an interpretable ensemble classifier (TAILOR-Diag) trained on that data. By incorporating basic (lesion area, device type) and pathology-specific (NCM, CAL, DCIS) cues, TAILOR-Gen produces diverse, well-balanced images for training a robust diagnostic model. In external evaluation, TAILOR-Diag achieves an AUC of 0.954 versus 0.909 for a real-data baseline and exceeds the average specificity of nine radiologists by 33.5% at the same sensitivity; on DCIS lesions, it outperforms all radiologists by a large margin. The approach shows strong generalization and interpretability, suggesting potential extensions to other diseases and imaging modalities.

Abstract

Data-driven deep learning models have shown great capability in assisting radiologists with breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address the long-standing challenge of improving diagnostic model performance on rare cases using long-tailed data. Specifically, we introduce a pipeline, TAILOR, that builds a knowledge-driven generative model to produce tailored synthetic data. The generative model, using 3,749 lesions as source data, can generate millions of breast-US images, especially for error-prone rare cases. The generated data can then be used to build a diagnostic model for accurate and interpretable diagnoses. In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity at the same sensitivity, and it improves the radiologists' performance by providing predictions with an interpretable decision-making process. Moreover, on ductal carcinoma in situ (DCIS), our diagnostic model outperforms all radiologists by a large margin, with only 34 DCIS lesions in the source data. We believe that TAILOR can potentially be extended to various diseases and imaging modalities.
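
To make "specificity at the same sensitivity" concrete, here is a minimal sketch and not the authors' evaluation code. It assumes the model outputs a malignancy score per lesion, that a lesion is called malignant when its score is at or above a threshold, and that the threshold is chosen to maximize specificity while still reaching a target sensitivity (for example, the average sensitivity of the readers). The function name `specificity_at_sensitivity` and the toy scores are hypothetical.

```python
from typing import Sequence


def specificity_at_sensitivity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_sensitivity: float,
) -> float:
    """Best specificity over thresholds whose sensitivity meets the target.

    labels: 1 = malignant, 0 = benign. A lesion is predicted malignant
    when its score is >= the threshold (an assumption of this sketch).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign lesion")

    best = 0.0
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(scores)):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            best = max(best, specificity)
    return best


# Toy data for illustration only; these are not results from the paper.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    print(specificity_at_sensitivity(toy_scores, toy_labels, 1.0))
```

Matching the sensitivity first and then comparing specificity puts a model and a group of readers on the same operating point, so a single number summarizes the difference in false-positive behavior.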

Paper Structure (17 sections, 4 equations, 6 figures)

Figures (6)

  • Figure 1: The challenge of long-tail distribution. The distribution of pathological subtypes is long-tailed in our training set, which has 1,387 biopsy-confirmed lesions. In benign lesions, the two most frequent subtypes together account for 49.7% of the lesions, with the remaining 13 subtypes comprising 50.3%. In malignant lesions, the most frequent subtype accounts for 81.8% of the lesions, while the remaining 15 subtypes comprise only 18.2%.
  • Figure 2: Overview of TAILOR and our study design. a, TAILOR pipeline vs. conventional pipeline. TAILOR utilizes knowledge-driven AI-generated data for accurate and interpretable diagnoses. b, Study design: the number of lesions in the training set, the internal test set, the external test set, and the DCIS test set, together with the design of the reader study. The four participating institutions are introduced in the data processing subsection. c, AI-assisted clinical diagnosis. We compare TAILOR-Diag with radiologists and investigate whether its assistance enhances radiologists' diagnostic performance.
  • Figure 3: Visualization of real and synthetic breast-US data. Real and synthetic lesions for the pathology classification tasks. a, Images with common benign and malignant lesions. b, Benign and malignant lesions with NCM. c, Benign and malignant lesions with CAL. d, Benign and DCIS lesions. The large images are collected real data, and the smaller images are synthetic data produced by TAILOR-Gen. To demonstrate the realism of the lesion and background areas in the generated images, we provide the whole-slide synthetic images in a. To demonstrate the representative US features of each tail category, we provide the lesion areas of the generated images in b, c, and d.
  • Figure 4: Interpretable diagnostic model. We provide two examples of TAILOR-Diag's decision-making process: a, a benign lesion with NCM and CAL, and b, a malignant DCIS lesion. Each input image passes through the general model, and expert models are selected automatically based on confidence scores. We then combine the predictions of the general model and the selected expert model(s) to obtain the final prediction.
  • Figure 5: Comparison of the real-data-trained baseline and TAILOR-Diag. We show the receiver operating characteristic (ROC) curves on a, the internal test set, b, the external test set, and c, the public BUSI test set. We show the ROCs of the pathology classification task on d, DCIS and benign lesions, e, lesions with NCM and f, lesions with CAL. We provide the number of correct predictions for each pathological subtype on g, the internal test set, and h, the external test set. The results of the real-data-trained baseline and TAILOR-Diag are both calculated with a fixed sensitivity of 98%.
  • ...and 1 more figure
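
The decision process described in the Figure 4 caption (a general model, confidence-based selection of expert models, and a combined final prediction) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the caption does not say how confidence scores are produced or how predictions are combined, so the sketch assumes the general model emits one routing confidence per expert, selects experts at or above a hypothetical threshold of 0.5, and averages the malignancy probabilities. All names (`GeneralOutput`, `diagnose`, the toy models) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[float]  # stand-in for an ultrasound image (hypothetical)


@dataclass
class GeneralOutput:
    malignancy: float                     # general model's malignancy probability
    expert_confidence: Dict[str, float]   # assumed per-expert routing scores


def diagnose(
    image: Image,
    general: Callable[[Image], GeneralOutput],
    experts: Dict[str, Callable[[Image], float]],
    threshold: float = 0.5,  # assumed selection threshold
) -> Tuple[float, List[str]]:
    """Return (final malignancy probability, names of the selected experts)."""
    out = general(image)
    # Select every available expert whose routing confidence is high enough.
    selected = [
        name
        for name, conf in out.expert_confidence.items()
        if conf >= threshold and name in experts
    ]
    # Combine by simple averaging (an assumption of this sketch).
    predictions = [out.malignancy] + [experts[name](image) for name in selected]
    return sum(predictions) / len(predictions), selected


# Toy stand-ins for trained models; the numbers are illustrative only.
def toy_general(image: Image) -> GeneralOutput:
    return GeneralOutput(0.4, {"NCM": 0.9, "CAL": 0.7, "DCIS": 0.1})


toy_experts: Dict[str, Callable[[Image], float]] = {
    "NCM": lambda image: 0.2,
    "CAL": lambda image: 0.3,
    "DCIS": lambda image: 0.9,
}

if __name__ == "__main__":
    probability, used = diagnose([0.0], toy_general, toy_experts)
    print(used, round(probability, 3))
```

Returning the list of selected experts alongside the probability is what makes the prediction traceable: a reader can see which feature-specific models contributed to the final score.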