ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation
Wenjun Hou, Yi Cheng, Kaishuai Xu, Yan Hu, Wenjie Li, Jiang Liu
TL;DR
The paper tackles inter-report consistency in radiology report generation with ICON, a lesion-aware, two-stage framework. The first stage extracts lesions from the input radiograph (Zoomer). The second stage aligns lesions with their attributes (Inspector) and generates the report with a cross-attentive generator (FiD/BART). During training, a lesion-aware mixup aligns the representations of semantically equivalent lesions. Two metrics, Con and R-Con, quantify inter-report consistency, with weighting by reference quality. Experiments on IU X-ray, MIMIC-CXR, and MIMIC-ABN show that ICON achieves state-of-the-art inter-report consistency and competitive clinical accuracy, which highlights the value of region-level lesion reasoning for trustworthy radiology reporting. The work points to more credible and robust automated reporting; future directions include incorporating large language models and end-to-end optimization.
Abstract
Previous research on radiology report generation has made significant progress in increasing the clinical accuracy of generated reports. In this paper, we emphasize another crucial quality that such systems should possess: inter-report consistency, the capability of generating consistent reports for semantically equivalent radiographs. For ensuring a system's credibility, this quality is of even greater significance than overall report accuracy, because a system prone to conflicting results would severely erode users' trust. Regrettably, existing approaches struggle to maintain inter-report consistency: they are biased towards common patterns and susceptible to lesion variants. To address this issue, we propose ICON, which improves the inter-report consistency of radiology report generation. To enhance the system's ability to capture similarities among semantically equivalent lesions, our approach first extracts lesions from the input images and examines their characteristics. We then introduce a lesion-aware mixup technique, which linearly combines lesion representations during training so that the representations of semantically equivalent lesions align with the same attributes. Extensive experiments on three publicly available chest X-ray datasets verify the effectiveness of our approach in improving both the consistency and the accuracy of the generated reports.
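The linear combination behind the lesion-aware mixup can be illustrated with a minimal sketch. The abstract states only that representations of semantically equivalent lesions are combined linearly during training. Everything else here is an assumption made for illustration: the function names `lesion_mixup` and `sample_lambda`, the toy feature vectors, and the choice to draw the mixing weight from a Beta distribution (borrowed from standard mixup, not confirmed for ICON).

```python
import random


def lesion_mixup(feat_a, feat_b, lam):
    """Return lam * feat_a + (1 - lam) * feat_b, element by element.

    feat_a and feat_b stand in for the feature vectors of two lesions that
    share the same attributes (hypothetical inputs, not the paper's encoder
    outputs). Because both lesions carry the same attributes, the mixed
    vector keeps that attribute set as its training target.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


def sample_lambda(alpha, rng):
    """Draw a mixing weight from Beta(alpha, alpha).

    This follows standard mixup practice and is an assumption here.
    """
    return rng.betavariate(alpha, alpha)


# Toy example: two variants of a lesion with identical attributes.
rng = random.Random(0)
lesion_a = [1.0, 0.0, 2.0]
lesion_b = [3.0, 4.0, 0.0]
lam = sample_lambda(0.4, rng)
mixed = lesion_mixup(lesion_a, lesion_b, lam)
```

With `lam = 0.5` the mixed vector is the midpoint of the two inputs, and with `lam = 1.0` it equals the first input. Training on such interpolated points is one way to encourage a model to map lesion variants to the same attributes.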
