DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Sang-Jun Park, Keun-Soo Heo, Dong-Hee Shin, Young-Han Son, Ji-Hye Oh, Tae-Eui Kam
TL;DR
DART tackles trustworthy radiology report generation by coupling disease-aware image-text alignment with a self-correcting re-alignment mechanism. It first retrieves disease-relevant text using a contrastively learned embedding space and a disease-matching constraint, then generates reports from retrieved content and disease features. A second-stage self-correction module re-aligns the generated report within the embedding space to further reduce omissions and improve clinical fidelity, trained with a dedicated correction loss. The approach achieves state-of-the-art performance on MIMIC-CXR and IU X-ray across descriptive NLG metrics and clinical efficacy evaluations, demonstrating improved trustworthiness and potential to alleviate radiologists’ workload.
Abstract
Automatic radiology report generation has emerged as a promising way to relieve radiologists of a time-consuming task while accurately capturing critical disease-relevant findings in X-ray images. Previous approaches have shown impressive performance. However, there remains significant room to improve accuracy, both by ensuring that retrieved reports contain disease-relevant findings similar to those in the X-ray images and by refining the generated reports. In this study, we propose DART, a Disease-aware image-text Alignment and self-correcting Re-alignment framework for Trustworthy radiology report generation. In the first stage, we embed images and texts in a shared embedding space through contrastive learning and generate initial reports based on image-to-text retrieval with disease-matching. This ensures that the retrieved reports contain disease-relevant findings that closely align with the input X-ray images. In the second stage, we further enhance the initial reports with a self-correction module that re-aligns them with the X-ray images. Our proposed framework achieves state-of-the-art results on two widely used benchmarks, surpassing previous approaches in both report generation and clinical efficacy metrics, thereby enhancing the trustworthiness of generated radiology reports.
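To make the first-stage idea concrete, the following is a minimal sketch of image-to-text retrieval in a shared embedding space with a disease-matching term. It is an illustration under stated assumptions, not the DART implementation: the function name `disease_aware_retrieve`, the `penalty` weight, and the scoring rule (cosine similarity minus a fraction of mismatched disease labels) are all hypothetical, and the embeddings are assumed to come from already-trained image and text encoders.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def disease_aware_retrieve(image_emb, report_embs, image_labels, report_labels,
                           top_k=2, penalty=0.5):
    """Rank candidate reports for one image (hypothetical scoring rule).

    image_emb:     (d,)   image embedding in the shared space
    report_embs:   (n, d) candidate report embeddings
    image_labels:  (c,)   binary disease labels predicted for the image
    report_labels: (n, c) binary disease labels of each candidate report
    penalty:       assumed weight on disease-label disagreement
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    reps = l2_normalize(np.asarray(report_embs, dtype=float))

    # Cosine similarity between the image and every candidate report.
    cosine = reps @ img

    # Fraction of disease labels on which each report disagrees with the image.
    mismatch = np.mean(np.asarray(report_labels) != np.asarray(image_labels), axis=1)

    # Assumption: disagreement is subtracted from similarity as a soft constraint.
    scores = cosine - penalty * mismatch

    # Indices of the top_k highest-scoring reports, best first.
    order = np.argsort(-scores)[:top_k]
    return order, scores


# Toy example: report 0 is the nearest neighbour but has the wrong disease labels.
image_emb = np.array([1.0, 0.0])
report_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
image_labels = np.array([1, 0])
report_labels = np.array([[0, 1], [1, 0], [1, 0]])

order, scores = disease_aware_retrieve(image_emb, report_embs, image_labels, report_labels)
print(order, scores)
```

In this toy example, plain similarity search (`penalty=0`) ranks report 0 first, whereas the disease-matching term promotes report 1, whose labels agree with the image. A hard filter on matching labels would be an equally plausible design; the soft penalty is used here only to keep the sketch short.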
