Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting

Chantal Pellegrini, Adrian Delchev, Ege Özsoy, Nassir Navab, Matthias Keicher

TL;DR

ProtoSR, an approach for injecting free-text information into structured report population, achieves state-of-the-art results, with the largest improvements on detailed attribute questions, demonstrating the value of integrating free-text derived signal for fine-grained image understanding.

Abstract

Structured radiology reporting promises faster, more consistent communication than free text, but automation remains difficult as models must make many fine-grained, discrete decisions about rare findings and attributes from limited structured supervision. In contrast, free-text reports are produced at scale in routine care and implicitly encode fine-grained, image-linked information through detailed descriptions. To leverage this unstructured knowledge, we propose ProtoSR, an approach for injecting free-text information into structured report population. First, we introduce an automatic extraction pipeline that uses an instruction-tuned LLM to mine 80k+ MIMIC-CXR studies and build a multimodal knowledge base aligned with a structured reporting template, representing each answer option with a visual prototype. Using this knowledge base, ProtoSR is trained to retrieve prototypes relevant for the current image-question pair and augment the model predictions through a prototype-conditioned residual, providing a data-driven second opinion that selectively corrects predictions. On the Rad-ReStruct benchmark, ProtoSR achieves state-of-the-art results, with the largest improvements on detailed attribute questions, demonstrating the value of integrating free-text derived signal for fine-grained image understanding.
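
The prototype-conditioned residual described above can be illustrated with a small sketch. It is not the authors' implementation: ProtoSR learns its retrieval and residual, whereas this example substitutes a fixed cosine similarity between a query embedding and one prototype vector per answer option. The names `fuse_logits`, `alpha`, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse_logits(base_logits, query, prototypes, alpha=0.5):
    """Add a scaled, prototype-derived residual to the base logits.

    base_logits: one logit per answer option from the base model.
    query:       embedding of the current image-question pair (assumed given).
    prototypes:  one prototype vector per answer option (assumed given).
    alpha:       hypothetical scale controlling how strongly the residual corrects.
    """
    residual = [cosine(query, p) for p in prototypes]
    return [b + alpha * r for b, r in zip(base_logits, residual)]


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


# Toy case: the base model slightly prefers option 1,
# but the query matches the prototype of option 0.
base_logits = [1.0, 1.2]
query = [1.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

fused = fuse_logits(base_logits, query, prototypes, alpha=0.5)
base_choice = argmax(base_logits)
fused_choice = argmax(fused)
```

In this toy case the residual acts as the "second opinion" from the abstract: it changes the prediction only because the prototype evidence outweighs the small margin in the base logits. With a larger base margin the fused prediction would stay the same.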

Paper Structure (9 sections, 5 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Overview of our architecture. The hierarchical SR base model produces base logits, while the prototype-conditioned knowledge branch retrieves label-aligned examples from a prototype bank and converts them into a scaled static residual logit correction. The final prediction is made from the fused logits.
  • Figure 2: Knowledge base extraction. Dataset A defines the target structured reporting template, while Dataset B contains paired images and free-text reports. We first expand the template label vocabulary with alternative phrasings, then identify template-aligned label occurrences in the free-text reports, and finally apply filtering to build a knowledge base that links Dataset B images to the template-aligned labels.
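
The three extraction steps in Figure 2 (vocabulary expansion, label matching, filtering) can be sketched as follows. This is a simplified stand-in under stated assumptions: the paper uses an instruction-tuned LLM, while the sketch uses plain phrase matching with a crude negation check. The `SYNONYMS` table, the negation word list, the `min_count` filter, and the example reports are hypothetical.

```python
from collections import defaultdict

# Step 1 (assumed): template labels expanded with alternative phrasings.
SYNONYMS = {
    "pleural effusion": ["pleural effusion", "pleural fluid"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}

# Crude negation cues; a real pipeline would need far more careful handling.
NEGATION_WORDS = {"no", "without"}


def extract_labels(report):
    """Step 2: return template labels mentioned affirmatively in a free-text report."""
    labels = set()
    for sentence in report.lower().split("."):
        if NEGATION_WORDS & set(sentence.split()):
            continue  # skip sentences containing a negation cue
        for label, phrases in SYNONYMS.items():
            if any(phrase in sentence for phrase in phrases):
                labels.add(label)
    return labels


def build_knowledge_base(studies, min_count=1):
    """Step 3: link image ids to labels, dropping labels with too few examples.

    studies: iterable of (image_id, report_text) pairs.
    """
    kb = defaultdict(list)
    for image_id, report in studies:
        for label in sorted(extract_labels(report)):
            kb[label].append(image_id)
    return {label: ids for label, ids in kb.items() if len(ids) >= min_count}


studies = [
    ("img1", "Small pleural fluid on the left. No cardiomegaly."),
    ("img2", "Enlarged heart. No pleural effusion."),
    ("img3", "Lungs are clear."),
]
kb = build_knowledge_base(studies, min_count=1)
```

The resulting mapping from label to image ids is the kind of structure from which a visual prototype per answer option could be derived, for example by aggregating image embeddings per label. That aggregation step is left out here because the source text does not specify how prototypes are computed.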