Quality Control for Radiology Report Generation Models via Auxiliary Auditing Components
Hermione Warr, Yasin Ibrahim, Daniel R. McGowan, Konstantinos Kamnitsas
TL;DR
The paper tackles semantic inaccuracies in AI-generated radiology reports by introducing a modular auditing framework built around auxiliary auditing components (ACs) that extract disease signals from both images and text. The GenX report generator produces chest X-ray reports; each report is audited by comparing image-derived disease labels $C_I$ with text-derived labels $C_T$, and passes when the consistency rule $(C_I=C_T) \land p_{AC}(c=C_I \mid I) \ge t$ holds, with optional deferral of low-confidence cases. Experiments on MIMIC-CXR show that auditing raises disease-semantic F1 above the unfiltered GenX baseline, to as high as $\approx$58.4 with $t=0.8$, and that per-disease ACs outperform a single multi-label AC, supporting the value of modular redundancy for reliability. The findings point to a practical quality-control pathway for clinical deployment of radiology report generation and suggest that the approach may generalize to semantic concepts beyond disease classification.
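The consistency rule can be illustrated with a minimal Python sketch for a single binary disease label. Everything named here is an assumption made for illustration rather than the authors' implementation: the function `audit_report`, the `AuditOutcome` names, the 0.5 decision cut-off for $C_I$, and the convention that the AC confidence is the probability it assigns to its own predicted label.

```python
from enum import Enum


class AuditOutcome(Enum):
    # Hypothetical outcome names, chosen for this sketch only.
    PASSED = "passed"
    REJECTED = "rejected"
    DEFERRED = "deferred"


def audit_report(p_disease: float, c_text: int, t: float = 0.8) -> AuditOutcome:
    """Apply (C_I = C_T) and p_AC(c = C_I | I) >= t to one binary disease label.

    p_disease: AC probability that the disease is present in image I (assumed input).
    c_text:    label C_T extracted from the generated report (0 or 1).
    t:         confidence threshold.
    """
    # Image-derived label C_I: assumed here to be the AC's most likely class.
    c_image = 1 if p_disease >= 0.5 else 0
    # Confidence the AC assigns to its own prediction, i.e. p_AC(c = C_I | I).
    confidence = p_disease if c_image == 1 else 1.0 - p_disease

    if confidence < t:
        # Optional deferral: the AC is too uncertain to audit this report.
        return AuditOutcome.DEFERRED
    if c_image != c_text:
        # Confident AC disagrees with the report's semantics.
        return AuditOutcome.REJECTED
    return AuditOutcome.PASSED


confident_agreement = audit_report(p_disease=0.93, c_text=1)
confident_disagreement = audit_report(p_disease=0.05, c_text=1)
uncertain_case = audit_report(p_disease=0.65, c_text=1)
```

In this sketch a report passes only when the AC is both confident and in agreement with the report, so raising `t` trades coverage for reliability.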
Abstract
Automating medical image interpretation could alleviate bottlenecks in diagnostic workflows and has attracted particular interest in recent years owing to advances in natural language processing. Great strides have been made towards automated radiology report generation via AI, yet ensuring the clinical accuracy of generated reports remains a significant challenge that hinders deployment of such methods in clinical practice. In this work, we propose a quality control framework for assessing the reliability of AI-generated radiology reports with respect to semantics of diagnostic importance, using modular auxiliary auditing components (ACs). Evaluating our pipeline on the MIMIC-CXR dataset, we show that incorporating ACs in the form of disease classifiers can enable auditing that identifies more reliable reports, resulting in higher F1 scores than those of unfiltered generated reports. Additionally, leveraging the confidence of the AC labels further improves the audit's effectiveness.
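The comparison between audited and unfiltered reports can be sketched as follows. This is a toy illustration only: the six records are invented for the example and are not results from the paper, and the helper names (`f1`, `passes_audit`, `f1_over`) are hypothetical. It assumes F1 is computed between reference disease labels and labels extracted from the generated report.

```python
def f1(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for yt, yp in pairs if yt == 1 and yp == 1)
    fp = sum(1 for yt, yp in pairs if yt == 0 and yp == 1)
    fn = sum(1 for yt, yp in pairs if yt == 1 and yp == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy records: (reference label, text label C_T, image label C_I, AC confidence)
records = [
    (1, 1, 1, 0.95),  # correct report, confident agreement
    (0, 1, 0, 0.90),  # report asserts a disease the AC does not see
    (1, 0, 1, 0.85),  # report misses a disease the AC detects
    (0, 0, 0, 0.92),  # correct report, confident agreement
    (1, 1, 1, 0.60),  # correct report, but AC confidence is below t
    (0, 1, 1, 0.88),  # report and AC are both wrong, so the audit cannot catch it
]


def passes_audit(record, t):
    """Keep a report when C_I equals C_T and the AC confidence is at least t."""
    _, c_text, c_image, confidence = record
    return c_text == c_image and confidence >= t


def f1_over(recs):
    """F1 of the text-derived labels against the reference labels."""
    return f1([r[0] for r in recs], [r[1] for r in recs])


unfiltered_f1 = f1_over(records)
audited = [r for r in records if passes_audit(r, t=0.8)]
audited_f1 = f1_over(audited)
```

On this toy data the audit keeps three of the six reports and the F1 of the retained subset is higher than that of the full set. The last record shows a limitation of any such audit: when the AC and the report share the same error, the report still passes.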
