An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation

Ahmed Abdulaal, Hugo Fry, Nina Montaña-Brown, Ayodeji Ijishakin, Jack Gao, Stephanie Hyland, Daniel C. Alexander, Daniel C. Castro

TL;DR

SAE-Rad uses sparse autoencoders (SAEs) to decompose latent representations from a pre-trained vision transformer into human-interpretable features. To the authors' knowledge, it is the first use of mechanistic interpretability techniques explicitly for a downstream multimodal reasoning task.

Abstract

Radiological services are experiencing unprecedented demand, leading to increased interest in automating radiology report generation. Existing Vision-Language Models (VLMs) suffer from hallucinations, lack interpretability, and require expensive fine-tuning. We introduce SAE-Rad, which uses sparse autoencoders (SAEs) to decompose latent representations from a pre-trained vision transformer into human-interpretable features. Our hybrid architecture combines state-of-the-art SAE advancements, achieving accurate latent reconstructions while maintaining sparsity. Using an off-the-shelf language model, we distil ground-truth reports into radiological descriptions for each SAE feature, which we then compile into a full report for each image, eliminating the need for fine-tuning large models for this task. To the best of our knowledge, SAE-Rad represents the first instance of using mechanistic interpretability techniques explicitly for a downstream multimodal reasoning task. On the MIMIC-CXR dataset, SAE-Rad achieves competitive radiology-specific metrics compared to state-of-the-art models while using significantly fewer computational resources for training. Qualitative analysis reveals that SAE-Rad learns meaningful visual concepts and generates reports that align closely with expert interpretations. Our results suggest that SAEs can enhance multimodal reasoning in healthcare, providing a more interpretable alternative to existing VLMs.
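The core decomposition can be illustrated with a generic ReLU sparse autoencoder trained with an L1 sparsity penalty. This is a minimal NumPy sketch, not SAE-Rad's hybrid architecture (whose specific components are not described here): the sizes `d_model` and `d_sae`, the random weights, the unit-norm decoder rows, and the `l1_coeff` value are all illustrative assumptions, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # hypothetical: class-token width and number of dictionary features

# Hypothetical, randomly initialised parameters; a trained SAE would learn these.
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # each row is a unit-norm feature direction
b_dec = np.zeros(d_model)

def encode(x):
    """Map class tokens (n, d_model) to non-negative feature activations (n, d_sae)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def decode(f):
    """Reconstruct class tokens as a linear combination of decoder directions."""
    return f @ W_dec + b_dec

def sae_loss(x, l1_coeff=1e-3):
    """Squared reconstruction error (summed over dimensions, averaged over the batch)
    plus an L1 penalty that pushes feature activations toward zero."""
    f = encode(x)
    x_hat = decode(f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity, f, x_hat

# Stand-in for a batch of 4 class tokens from a radiology-image encoder (random here).
x = rng.normal(size=(4, d_model))
loss, f, x_hat = sae_loss(x)
print(f.shape, x_hat.shape)  # (4, 32) (4, 8)
```

Each column of `W_enc` and each row of `W_dec` corresponds to one dictionary feature. The L1 term pushes most entries of `f` toward zero during training, so each image ends up described by a small set of active features that can be inspected individually.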

Paper Structure (46 sections, 13 equations, 22 figures, 5 tables)

Figures (22)

  • Figure 1: SAE-Rad overview. Panel A: We learn a set of sparsely activating features by training a sparse autoencoder (SAE) on class tokens produced by a radiology-image encoder. Panel B: For each feature, we retrieve the reference reports corresponding to its highest-activating images and use them to produce a text description of that feature. Panel C: We pass a new image through the radiology-image encoder and the SAE encoder to retrieve its highest-activating features. A pretrained large language model (LLM) then uses the text descriptions of these features to generate a detailed radiology report.
  • Figure 2: SAE-Rad identifies clinically relevant and interpretable features within radiological images. We illustrate a number of pathological and instrumentation features relevant for producing radiology reports. We add annotations (green arrows) to emphasize the presence of each feature.
  • Figure 2: RadFact performance metrics for different SAE-Rad configurations. w/ inds = with indication(s); w/ inds + prev. reps = with indications and previous text reports.
  • Figure 3: SAE-Rad accurately captures features reported by human radiologists, and more. Above, we show a side-by-side comparison of a ground-truth radiology report and one generated by SAE-Rad. The model successfully identifies key clinically relevant features. SAE-Rad also identifies additional details, such as a right-sided dialysis catheter, without hallucinating (we annotate this feature with green arrows for emphasis). However, SAE-Rad can also miss features that appear in the reference report.
  • Figure 4: SAE-Rad enables counterfactual image generation and unsupervised segmentation with disentangled class tokens. Row 1 examines a pacemaker and Row 2 investigates cardiomegaly. Column 1 shows original MIMIC-CXR images, Column 2 shows model reconstructions, and Columns 3 and 4 depict counterfactuals produced by adding and removing features. The final column demonstrates unsupervised segmentation obtained by comparing counterfactual and original images. Details are in Appendix D.
  • ...and 17 more figures
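The retrieval and report-assembly steps in Figure 1 (Panels B and C), and the feature editing behind the counterfactuals in Figure 4, can be sketched over a matrix of SAE activations. In this hypothetical NumPy example, the helper names, the toy activation matrix, the report strings, the feature descriptions, and the prompt wording are all illustrative assumptions. The LLM calls that distil reports into descriptions and write the final report, and the image-generation step for counterfactuals, are not shown.

```python
import numpy as np

def top_images_per_feature(acts, k):
    """For each feature (column of acts), return the indices of its k highest-activating images."""
    # Sort images by descending activation within each column, keep the first k rows,
    # then transpose so that row j lists the top-k images for feature j.
    return np.argsort(-acts, axis=0)[:k].T

def reports_for_feature(acts, reports, feature_id, k):
    """Panel B (sketch): collect the reference reports of a feature's top-k images.
    An LLM would then distil these into a feature description (not shown)."""
    return [reports[i] for i in top_images_per_feature(acts, k)[feature_id]]

def top_features_for_image(act_row, k):
    """Panel C (sketch): indices of the k strongest features for one image, skipping inactive ones."""
    order = np.argsort(-act_row)[:k]
    return [int(i) for i in order if act_row[i] > 0]

def build_prompt(feature_ids, descriptions):
    """Hypothetical prompt format; not the paper's actual LLM prompt."""
    findings = "\n".join(f"- {descriptions[i]}" for i in feature_ids)
    return "Write a chest X-ray report consistent with these findings:\n" + findings

def counterfactual_token(f, W_dec, b_dec, feature_id, new_value):
    """Figure 4 (sketch): set one feature's activation (0 removes it, >0 adds it)
    and decode back to a class token. Generating an image from it is out of scope."""
    g = f.copy()
    g[feature_id] = new_value
    return g @ W_dec + b_dec

# Toy data: 4 images x 3 SAE features (all values hypothetical).
acts = np.array([[0.0, 2.0, 0.5],
                 [1.5, 0.0, 0.0],
                 [0.2, 0.7, 3.0],
                 [0.9, 0.0, 0.1]])
reports = ["report A", "report B", "report C", "report D"]
descriptions = {0: "Cardiomegaly.", 1: "Left-sided pacemaker.", 2: "Right-sided dialysis catheter."}

print(top_images_per_feature(acts, 2).tolist())  # [[1, 3], [0, 2], [2, 0]]
print(reports_for_feature(acts, reports, 1, 2))  # ['report A', 'report C']
print(build_prompt(top_features_for_image(acts[2], 2), descriptions))
```

Selecting the top-k images per feature and the top-k features per image are the same descending `argsort`, applied down columns in one case and along a single row in the other. How SAE-Rad actually chooses `k` or thresholds activations is not specified in this summary.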