Neuradicon: operational representation learning of neuroimaging reports

Henry Watkins, Robert Gray, Adam Julius, Yee-Haur Mah, Walter H. L. Pinaya, Paul Wright, Ashwani Jha, Holger Engleitner, Jorge Cardoso, Sebastien Ourselin, Geraint Rees, Rolf Jaeger, Parashkev Nachev

TL;DR

Neuradicon introduces a hybrid NLP framework for converting unstructured neuroradiology reports into quantitative, operational signals. By combining rule-based and machine-learning components, it performs report and section classification, named entity recognition, negation and relation extraction, and then learns a 2D latent representation to enable data-driven radiology phenotyping and spatial inference via GeoSPM. The approach demonstrates strong cross-site generalizability across 336k reports, with high F1 performance (e.g., $F1_{Report}=0.96$, $F1_{Section}=0.93$) and interpretable latent structures that align with ischemic, hemorrhagic, inflammatory, and neoplastic domains. Spatial and phenotypic analyses reveal meaningful associations with age, treatment, and imaging modalities, offering a scalable blueprint for operational optimization and potential multimodal extensions. The work provides a practical pathway to extract rich, actionable signals from unstructured clinical text, supporting targeted workflow improvements and data-driven radiology practice.

Abstract

Radiological reports typically summarize the content and interpretation of imaging studies in unstructured form that precludes quantitative analysis. This limits the monitoring of radiological services to throughput undifferentiated by content, impeding specific, targeted operational optimization. Here we present Neuradicon, a natural language processing (NLP) framework for quantitative analysis of neuroradiological reports. Our framework is a hybrid of rule-based and artificial intelligence models to represent neurological reports in succinct, quantitative form optimally suited to operational guidance. We demonstrate the application of Neuradicon to operational phenotyping of a corpus of 336,569 reports, and report excellent generalizability across time and two independent healthcare institutions.

Paper Structure

This paper contains 37 sections, 2 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: A diagram of the NLP pipeline. The models, represented by rectangles, are a mixture of machine-learning and rule-based methods. The output data (rhomboids) from each model are passed to the next model in the pipeline.
  • Figure 2: The schema for the report section classification and segmentation model. The first token of each report section is classified into one of 5 classes. This token acts as the section 'anchor', and all subsequent tokens up to the next anchor belong to the same section. Once all tokens have been assigned to a section, the report can be segmented into meaningful, mutually exclusive sections.
  • Figure 3: This example of a radiological report shows words tagged with named entities. This image shows the broad classes of DESCRIPTOR, LOCATION and PATHOLOGY. The NER model classifies tokens according to the BIOS format.
  • Figure 4: A diagram of the report embedding model and procedure. Text reports are featurized by the NLP pipeline. Entities of interest are extracted and turned into binary vectors before being embedded into a 2D latent space by an auto-encoder.
  • Figure 5: The 2D latent representation of asserted pathological terms, labelled by patient age. Reports are projected onto a 2D latent space using an auto-encoder, based on their pathology textual content. Note the variation in age-related patterns across the space.
  • ...and 7 more figures
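
The anchor-token scheme described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `segment_by_anchors`, the section labels (`HEADER`, `INDICATION`, `BODY`), and the `UNASSIGNED` fallback are hypothetical, and the anchor positions are assumed to come from a token classifier that is not shown here. The sketch covers only the propagation step, in which each token inherits the label of the most recent anchor.

```python
from typing import Dict, List, Tuple


def segment_by_anchors(
    tokens: List[str],
    anchors: Dict[int, str],
    default: str = "UNASSIGNED",  # hypothetical label for tokens before any anchor
) -> List[Tuple[str, List[str]]]:
    """Split a token list into contiguous, mutually exclusive sections.

    `anchors` maps a token index to the section label predicted for that
    token (assumed to be produced by an upstream classifier). Every token
    after an anchor inherits its label until the next anchor is reached.
    """
    sections: List[Tuple[str, List[str]]] = []
    current_label = default
    current_tokens: List[str] = []

    for index, token in enumerate(tokens):
        if index in anchors:
            # A new anchor closes the section collected so far, if any.
            if current_tokens:
                sections.append((current_label, current_tokens))
            current_label = anchors[index]
            current_tokens = []
        current_tokens.append(token)

    # Flush the final section.
    if current_tokens:
        sections.append((current_label, current_tokens))
    return sections


# Illustrative input; the labels are placeholders, not the paper's class names.
tokens = ["MRI", "head", "Indication", ":", "headache", "Findings", ":", "no", "infarct"]
anchors = {0: "HEADER", 2: "INDICATION", 5: "BODY"}

for label, section_tokens in segment_by_anchors(tokens, anchors):
    print(label, " ".join(section_tokens))
```

Because each token belongs to exactly one section, the segments partition the report, which matches the mutually exclusive segmentation the caption describes.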