A Dual-View Approach to Classifying Radiology Reports by Co-Training
Yutong Han, Yan Yuan, Lili Mou
TL;DR
This work tackles radiology report classification by leveraging the dual structure of reports, treating the Findings and Impression sections as separate views. It introduces a co-training framework in which two DistilBERT-based classifiers are trained on the Findings and Impression views, respectively, and progressively exchange pseudo-labels on unlabeled reports to improve performance. The method is validated on brain tumor surveillance tasks with 868 labeled and 10K unlabeled reports. Dual-view co-training with ensemble inference outperforms supervised, self-training, and naïve baselines, with notable gains on Brain Tumor and Aggressiveness classification. The approach demonstrates the value of a report's internal structure for semi-supervised learning in medical NLP and has the potential to enhance public health surveillance from radiology text.
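To make the dual-view idea concrete, the sketch below shows one common co-training variant: in each round, each view's classifier pseudo-labels its k most confident unlabeled reports and hands them to the *other* view's training set. At inference time, the two views' class probabilities are averaged. This is a minimal illustration under stated assumptions, not the authors' implementation. The classifier is a hypothetical bag-of-words Naive Bayes stand-in for DistilBERT, and the names `co_train`, `top_k_pseudo_labels`, `ensemble_predict`, `rounds`, and `k` are illustrative. The paper's actual pseudo-labeling schedule and ensembling rule may differ.

```python
import math
from collections import Counter


class NaiveBayes:
    """Hypothetical stand-in for a DistilBERT classifier: multinomial NB with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: 0 for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        vocab = set()
        for text, y in zip(texts, labels):
            tokens = text.lower().split()
            self.prior[y] += 1
            self.counts[y].update(tokens)
            vocab.update(tokens)
        self.n = len(labels)
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        """Return a {class: probability} dict, normalized via log-sum-exp."""
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / self.n)
            for w in text.lower().split():
                lp += math.log((self.counts[c][w] + 1) / (self.totals[c] + self.vocab_size))
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}


def top_k_pseudo_labels(clf, pool, view, k):
    """Score one view (0 = Findings, 1 = Impression); return the k most confident (conf, index, label)."""
    scored = []
    for j, doc in enumerate(pool):
        probs = clf.predict_proba(doc[view])
        label = max(probs, key=probs.get)
        scored.append((probs[label], j, label))
    scored.sort(reverse=True)
    return scored[:k]


def co_train(labeled, unlabeled, rounds=3, k=1):
    """labeled: [(findings, impression, label)]; unlabeled: [(findings, impression)]."""
    train_f = [(f, y) for f, _, y in labeled]
    train_i = [(i, y) for _, i, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_f = NaiveBayes().fit(*zip(*train_f))
        clf_i = NaiveBayes().fit(*zip(*train_i))
        picks_f = top_k_pseudo_labels(clf_f, pool, view=0, k=k)
        picks_i = top_k_pseudo_labels(clf_i, pool, view=1, k=k)
        # Cross-teaching: the Findings model labels data for the Impression model, and vice versa.
        for _, j, y in picks_f:
            train_i.append((pool[j][1], y))
        for _, j, y in picks_i:
            train_f.append((pool[j][0], y))
        used = {j for _, j, _ in picks_f + picks_i}
        pool = [doc for j, doc in enumerate(pool) if j not in used]
    return NaiveBayes().fit(*zip(*train_f)), NaiveBayes().fit(*zip(*train_i))


def ensemble_predict(clf_f, clf_i, findings, impression):
    """Average the two views' class probabilities and return the arg-max class."""
    pf = clf_f.predict_proba(findings)
    pi = clf_i.predict_proba(impression)
    avg = {c: (pf.get(c, 0.0) + pi.get(c, 0.0)) / 2 for c in set(pf) | set(pi)}
    return max(avg, key=avg.get)
```

Because each model learns from the other's confident predictions, errors that one view is prone to can be corrected by the complementary view. This is the intuition behind treating Findings and Impression as separate views. Swapping in a fine-tuned transformer would only change the `fit`/`predict_proba` internals, not the co-training loop.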
Abstract
Radiology report analysis provides valuable information that can aid public health initiatives, and it has been attracting increasing attention from the research community. In this work, we present a novel insight: the structure of a radiology report (namely, its Findings and Impression sections) offers different views of a radiology scan. Based on this intuition, we propose a co-training approach in which two machine learning models are built on the Findings and Impression sections, respectively, and use each other's information to boost performance with large amounts of unlabeled data in a semi-supervised manner. We conducted experiments on a public health surveillance study, and the results show that our co-training approach improves performance by exploiting the dual views and surpasses competing supervised and semi-supervised methods.
