Neuradicon: operational representation learning of neuroimaging reports
Henry Watkins, Robert Gray, Adam Julius, Yee-Haur Mah, Walter H. L. Pinaya, Paul Wright, Ashwani Jha, Holger Engleitner, Jorge Cardoso, Sebastien Ourselin, Geraint Rees, Rolf Jaeger, Parashkev Nachev
TL;DR
Neuradicon introduces a hybrid NLP framework for converting unstructured neuroradiology reports into quantitative, operational signals. Combining rule-based and machine-learning components, it performs report and section classification, named entity recognition, and negation and relation extraction, and then learns a 2D latent representation that enables data-driven radiology phenotyping and spatial inference via GeoSPM. Applied to a corpus of 336k reports, the approach generalizes well across time and across two independent institutions, with high F1 scores (e.g., $F1_{Report}=0.96$, $F1_{Section}=0.93$), and yields interpretable latent structure aligned with ischemic, hemorrhagic, inflammatory, and neoplastic domains. Spatial and phenotypic analyses reveal meaningful associations with age, treatment, and imaging modality, offering a scalable blueprint for operational optimization and potential multimodal extensions. The work provides a practical pathway to extracting rich, actionable signals from unstructured clinical text, supporting targeted workflow improvements and data-driven radiology practice.
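Negation extraction is one of the pipeline stages listed above. The sketch below illustrates only the general idea behind a rule-based negation step of the kind a hybrid system might include. It applies a NegEx-style rule: an entity mention is flagged as negated when a negation cue appears within a short window of preceding tokens in the same sentence. This is not Neuradicon's implementation. The cue list, the window size, the small entity lexicon that stands in for a trained NER model, and the helper names `tokenize`, `extract` and `extract_report` are all hypothetical.

```python
import re

# Hypothetical negation cues and look-back window (illustrative values only).
NEGATION_CUES = {"no", "not", "without", "absent"}
WINDOW = 5  # number of preceding tokens searched for a cue

# Hypothetical entity lexicon standing in for a trained NER model.
ENTITY_TERMS = {"haemorrhage", "infarct", "mass", "hydrocephalus"}


def tokenize(sentence):
    """Return lowercase alphabetic tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def extract(sentence):
    """Return (entity, negated) pairs for lexicon terms found in one sentence."""
    tokens = tokenize(sentence)
    results = []
    for i, tok in enumerate(tokens):
        if tok in ENTITY_TERMS:
            # Look back at most WINDOW tokens, never past the sentence start.
            window = tokens[max(0, i - WINDOW):i]
            negated = any(w in NEGATION_CUES for w in window)
            results.append((tok, negated))
    return results


def extract_report(report):
    """Naively split a report into sentences on '.' and pool the findings."""
    findings = []
    for sentence in report.split("."):
        findings.extend(extract(sentence))
    return findings


if __name__ == "__main__":
    report = "No intracranial haemorrhage. Small infarct in the left thalamus."
    print(extract_report(report))  # [('haemorrhage', True), ('infarct', False)]
```

A fixed token window is the simplest possible scope rule. Practical negation detectors such as NegEx also use termination terms (for example "but") to close a negation scope early, and a learned model can replace the lexicon lookup without changing the interface.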
Abstract
Radiological reports typically summarize the content and interpretation of imaging studies in unstructured form that precludes quantitative analysis. This limits the monitoring of radiological services to throughput undifferentiated by content, impeding specific, targeted operational optimization. Here we present Neuradicon, a natural language processing (NLP) framework for quantitative analysis of neuroradiological reports. Our framework is a hybrid of rule-based and artificial intelligence models to represent neuroradiological reports in succinct, quantitative form optimally suited to operational guidance. We demonstrate the application of Neuradicon to operational phenotyping of a corpus of 336,569 reports, and report excellent generalizability across time and two independent healthcare institutions.
