Sparse Autoencoders for Interpretable Medical Image Representation Learning

Philipp Wesp; Robbie Holland; Vasiliki Sideri-Lampretsa; Sergios Gatidis

Abstract

Vision foundation models (FMs) achieve state-of-the-art performance in medical imaging. However, they encode information in abstract latent representations that clinicians cannot interrogate or verify. This study investigates Sparse Autoencoders (SAEs) as a way to replace opaque FM image representations with human-interpretable, sparse features. We train SAEs on embeddings from BiomedParse (biomedical) and DINOv3 (general-purpose) using 909,873 2D CT and MRI image slices from the TotalSegmentator dataset. We find that the learned sparse features (a) reconstruct the original embeddings with high fidelity (R$^2$ up to 0.941) and recover up to 87.8% of downstream performance using only 10 features (a 99.4% dimensionality reduction), (b) preserve semantic fidelity in image retrieval tasks, (c) correspond to specific concepts that can be expressed in language using large language model (LLM)-based auto-interpretation, and (d) bridge clinical language and abstract latent representations in zero-shot language-driven image retrieval. Our work indicates that SAEs are a promising pathway towards interpretable, concept-driven medical vision systems. Code repository: https://github.com/pwesp/sail.
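The core mechanism in the abstract, mapping a dense FM embedding to a sparse, overcomplete feature vector and back and scoring the reconstruction with R$^2$, can be sketched as follows. This is a minimal sketch, not the authors' implementation (the linked repository holds that): it assumes a TopK SAE, one common SAE variant, with untrained random weights, and it pools R$^2$ over all embedding dimensions. The class `TopKSAE`, the helper `r_squared`, and the toy dimensions are hypothetical.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


class TopKSAE:
    """Hypothetical TopK sparse autoencoder with untrained random weights."""

    def __init__(self, d_model, d_sae, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Encoder: d_sae rows of length d_model (overcomplete when d_sae > d_model)
        self.W_enc = [[rng.gauss(0.0, 1.0 / d_model ** 0.5) for _ in range(d_model)]
                      for _ in range(d_sae)]
        self.b_enc = [0.0] * d_sae
        # Decoder: d_model rows of length d_sae, initialised as the encoder transpose
        self.W_dec = [[self.W_enc[j][i] for j in range(d_sae)] for i in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Pre-activations computed on the embedding minus the decoder bias
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [sum(w * c for w, c in zip(row, centered)) + b
               for row, b in zip(self.W_enc, self.b_enc)]
        # Keep only the k largest pre-activations, then apply ReLU,
        # so at most k features are nonzero (L0 <= k)
        top = sorted(range(len(pre)), key=lambda j: pre[j], reverse=True)[: self.k]
        z = [0.0] * len(pre)
        for j in top:
            z[j] = relu(pre[j])
        return z

    def decode(self, z):
        # Reconstruction is a sparse linear combination of decoder columns
        return [sum(w * zj for w, zj in zip(row, z)) + b
                for row, b in zip(self.W_dec, self.b_dec)]


def r_squared(X, X_hat):
    """Fraction of total embedding variance explained, pooled over all dimensions."""
    n, d = len(X), len(X[0])
    means = [sum(row[i] for row in X) / n for i in range(d)]
    ss_res = sum((x[i] - xh[i]) ** 2 for x, xh in zip(X, X_hat) for i in range(d))
    ss_tot = sum((x[i] - means[i]) ** 2 for x in X for i in range(d))
    return 1.0 - ss_res / ss_tot


# Usage on toy "embeddings" (random data standing in for FM outputs)
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
sae = TopKSAE(d_model=8, d_sae=32, k=4)
Z = [sae.encode(x) for x in X]
X_hat = [sae.decode(z) for z in Z]
print(max(sum(1 for v in z if v != 0.0) for z in Z))  # never exceeds k = 4
print(r_squared(X, X_hat))  # low for untrained weights; training raises it
```

Because the TopK step fixes the number of active features directly, sweeping `k` is one simple way to trace a reconstruction-versus-sparsity curve; an SAE with an L1 penalty would instead control sparsity through the penalty weight.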

Paper Structure (21 sections, 4 figures, 2 tables)

Introduction
Methods
Sparse Autoencoder
Monosemanticity scoring.
Interpretability Evaluation
Sparse Fingerprint Retrieval.
Automated Feature Interpretation.
Language-Driven Image Retrieval.
Experiments & Results
SAE Quality
Latent space reconstruction (R$^2$).
Downstream performance (ROC-AUC).
SAE Configuration Ranking
Monosemanticity & performance recovery.
Configuration ranking.
...and 6 more sections

Figures (4)

Figure 1: (A) A Sparse Autoencoder replaces opaque dense FM embeddings with a sparse feature space. (B) Sparse fingerprint retrieval matches images by cosine similarity over $k$ top-activated features. (C) A VLM generates a concept description for each feature from its top-activating images and metadata. (D) An LLM maps a clinical text query to matching feature concepts for zero-shot image retrieval.
Figure 2: SAE quality and performance recovery across 96 configurations per FM (DINOv3: blue, BiomedParse: orange, random baseline: grey). (A--D) Reconstruction fidelity (R$^2$), downstream ROC-AUC, alive features, and monosemanticity score vs. L0 sparsity. (E--G) Performance recovery using only the top-$N$ features ($N=1,3,10,50$).
Figure 3: Sparse fingerprint retrieval at $k=5$ for five reference cases (A--E) spanning CT and MRI across multiple anatomical regions. Row 1: reference images with BiomedParse (orange) and DINOv3 (blue) fingerprint insets. Rows 2--3: top-2 BiomedParse retrievals. Rows 4--5: top-2 DINOv3 retrievals.
Figure 4: Zero-shot language-driven retrieval for "Axial CT of the abdomen and retroperitoneum in an elderly patient." An LLM selects matching feature concepts (left), which determine a sparse fingerprint (center) used for cosine retrieval (right). BiomedParse selects mixed MRI/CT concepts and retrieves thoracic images. DINOv3 selects CT-specific abdominal features and correctly retrieves axial abdominal CT images.
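Panels B and D of Figure 1 both reduce retrieval to cosine similarity between sparse fingerprints. A minimal sketch follows, assuming that an image fingerprint keeps the activation values of its $k$ top-activated SAE features and that a text-query fingerprint weights each LLM-selected feature equally. The captions do not specify either weighting, so both are assumptions. The image IDs, activation values, and helper names (`fingerprint`, `retrieve`, `text_query_fingerprint`) are hypothetical.

```python
import math


def fingerprint(z, k):
    """Keep the k largest positive activations of sparse code z as {index: value}."""
    top = sorted(range(len(z)), key=lambda j: z[j], reverse=True)[:k]
    return {j: z[j] for j in top if z[j] > 0.0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {index: value}."""
    dot = sum(v * b[j] for j, v in a.items() if j in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def retrieve(query_fp, database, top_n):
    """Rank database fingerprints (image_id -> fingerprint) by cosine similarity."""
    scored = [(cosine(query_fp, fp), image_id) for image_id, fp in database.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [image_id for _, image_id in scored[:top_n]]


def text_query_fingerprint(selected_features):
    """Hypothetical: each feature index chosen by the LLM gets equal weight."""
    return {j: 1.0 for j in selected_features}


# Toy SAE activations for three hypothetical images
codes = {
    "ct_abdomen_01": [0.0, 2.1, 0.0, 1.4, 0.3, 0.0],
    "ct_abdomen_02": [0.0, 1.8, 0.1, 0.6, 0.0, 0.0],
    "mr_brain_01": [1.9, 0.0, 0.0, 0.0, 0.2, 1.5],
}
database = {name: fingerprint(z, k=2) for name, z in codes.items()}

# Image-to-image retrieval (Figure 1B)
print(retrieve(database["ct_abdomen_01"], database, top_n=2))
# → ['ct_abdomen_01', 'ct_abdomen_02']

# Language-driven retrieval (Figure 1D): suppose the LLM picked features 1 and 3
print(retrieve(text_query_fingerprint([1, 3]), database, top_n=1))
# → ['ct_abdomen_01']
```

Storing fingerprints as index-to-value dictionaries keeps each comparison proportional to $k$ rather than to the size of the SAE dictionary.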
