LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation

Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman

TL;DR

LaB-RAG introduces a label-boosted retrieval augmented generation framework for radiology report generation that avoids fine-tuning large models. It derives radiology-specific textual labels from zero-shot image embeddings using lightweight LaB-Classifiers, then uses these labels to filter and format retrieved exemplars before prompting a general-domain LLM via in-context learning. Across MIMIC-CXR and CheXpert Plus, LaB-RAG achieves state-of-the-art F1-CheXbert findings and competitive RadGraph performance, with ablations showing additive gains from label filtering and prompt formatting. The approach demonstrates that modular, low-cost components can meaningfully boost radiology report generation and can synergize with existing fine-tuning methods for further gains.

Abstract

In the current paradigm of image captioning, deep learning models are trained to generate text from image embeddings of latent features. We challenge the assumption that fine-tuning of large, bespoke models is required to improve model generation accuracy. Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a small-model-based approach to image captioning that leverages image descriptors in the form of categorical labels to boost standard retrieval augmented generation (RAG) with pretrained large language models (LLMs). We study our method in the context of radiology report generation (RRG) over MIMIC-CXR and CheXpert Plus. We argue that simple classification models combined with zero-shot embeddings can effectively transform X-rays into text-space as radiology-specific labels. In combination with standard RAG, we show that these derived text labels can be used with general-domain LLMs to generate radiology reports. Without ever training our generative language model or image embedding models specifically for the task, and without ever directly "showing" the LLM an X-ray, we demonstrate that LaB-RAG achieves better results across natural language and radiology language metrics compared with other retrieval-based RRG methods, while attaining competitive results compared to other fine-tuned vision-language RRG models. We further conduct extensive ablation experiments to better understand the components of LaB-RAG. Our results suggest broader compatibility and synergy with fine-tuned methods to further enhance RRG performance.
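The pipeline described above (per-label classifiers over frozen image embeddings, label-based filtering of retrieved exemplars, and label-aware prompt formatting) can be illustrated with a minimal sketch. This is not the authors' implementation. The label subset, the `Exemplar` type, the exact-match filter with fallback, the cosine-similarity ranking, and the prompt layout are all assumptions chosen for illustration, and the classifier weights are presumed to have been fit elsewhere.

```python
import math
from dataclasses import dataclass

# Hypothetical label subset; the real label set comes from a radiology labeler.
LABELS = ["Cardiomegaly", "Edema", "Pleural Effusion"]


@dataclass
class Exemplar:
    embedding: list        # frozen (zero-shot) image embedding
    labels: frozenset      # positive labels attached to this exemplar's report
    report: str            # report text used as an in-context example


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_labels(embedding, weights, biases, threshold=0.5):
    """Apply one logistic regression per label to the image embedding."""
    positives = set()
    for label in LABELS:
        z = sum(w * x for w, x in zip(weights[label], embedding)) + biases[label]
        if 1.0 / (1.0 + math.exp(-z)) >= threshold:
            positives.add(label)
    return frozenset(positives)


def retrieve(query_emb, query_labels, corpus, k=2):
    """Assumed filter rule: keep exemplars whose labels match exactly,
    fall back to the full corpus if none match, then rank by similarity."""
    pool = [e for e in corpus if e.labels == query_labels] or list(corpus)
    ranked = sorted(pool, key=lambda e: cosine(query_emb, e.embedding), reverse=True)
    return ranked[:k]


def format_prompt(query_labels, exemplars):
    """Assumed prompt layout: each example shows its labels, then its report;
    the target shows only the predicted labels for the LLM to complete."""
    def fmt(labels):
        return ", ".join(sorted(labels)) or "No Finding"

    parts = [
        f"Example {i}\nLabels: {fmt(e.labels)}\nReport: {e.report}"
        for i, e in enumerate(exemplars, 1)
    ]
    parts.append(f"Target\nLabels: {fmt(query_labels)}\nReport:")
    return "\n\n".join(parts)
```

In this sketch the LLM only ever receives text: the predicted labels and the retrieved reports. That mirrors the abstract's point that the language model is never directly shown an X-ray.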

Paper Structure

This paper contains 28 sections, 2 equations, 21 figures, 8 tables, 1 algorithm.

Figures (21)

  • Figure 1: Overview of LaB-RAG for RRG compared to standard RAG.
  • Figure 2: LaB-RAG inference for RRG. Symbols correspond to those in Algorithm 1.
  • Figure 3: Left: LaB-RAG outperforms other retrieval methods (CXR-RePaiR/ReDonE, X-REM) on RRG metrics. On F1-CheXbert, LaB-RAG achieves SOTA on "Findings" generation and performs no differently from SFT methods (CheXagent, CXRMate) on "Impression" generation. Right: Ablation of the individual label boosting components of LaB-RAG. With minimal additional complexity over standard RAG, LaB-RAG achieves a greater gain in F1-CheXbert on "Findings" than alternative SFT methods.
  • Figure 4: Left: Domain and dataset specificity of image embeddings significantly improves LaB-RAG generations. Right: Improving labeler quality significantly improves LaB-RAG generations. Extr: Extracted from inference target's ground-truth report, Pred: Predicted from inference image, cxB: CheXbert derived labels, cxP: CheXpert derived labels. For predicted labels, classifiers were trained over labels derived from either the CheXbert or CheXpert labeler.
  • Figure 5: Overview of training per-label LaB-Classifier logistic regressions.
  • ...and 16 more figures