brat: Aligned Multi-View Embeddings for Brain MRI Analysis

Maxime Kayser, Maksim Gridnev, Wanting Wang, Max Bain, Aneesh Rangnekar, Avijit Chatterjee, Aleksandr Petrov, Harini Veeraraghavan, Nathaniel C. Swinburne

TL;DR

Brat introduces aligned multi-view embeddings for brain MRI analysis by pairing a 3D vision backbone with learnable query tokens and sentence-level clinical features. Pairwise View Alignment (PVA) and a Determinantal Point Process (DPP)-based quality-diversity loss produce diverse, clinically meaningful image representations aligned to report content. Trained on the large MSKBrain dataset (~80k MRIs with reports), brat improves image-text retrieval and downstream tasks such as report generation, Alzheimer’s classification, and metastases segmentation, and its weights are publicly released. The framework generalizes to other modalities (e.g., BIMCV-R) and is architecture-agnostic, offering a scalable path for 3D medical vision-language pre-training.

Abstract

We present brat (brain report alignment transformer), a multi-view representation learning framework for brain magnetic resonance imaging (MRI) trained on MRIs paired with clinical reports. Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume. To address these challenges, we introduce a brain MRI dataset $10\times$ larger than existing ones, containing approximately 80,000 3D scans with corresponding radiology reports, and propose a multi-view pre-training approach inspired by advances in document retrieval. We develop an implicit query-feature matching mechanism and adopt concepts from quality-diversity to obtain multi-view embeddings of MRIs that are aligned with the clinical features given by report sentences. We evaluate our approach across multiple vision-language and vision tasks, demonstrating substantial performance improvements. The brat foundation models are publicly released.
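To make the quality-diversity idea concrete, the following is a minimal sketch of how a DPP-style kernel can score a set of view embeddings. It is not the loss used in the paper, whose exact form is not given in this summary. The function name `dpp_log_det`, the `quality` vector, the cosine-similarity kernel, and the `eps` ridge are all assumptions made for illustration.

```python
import numpy as np

def dpp_log_det(embeddings, quality, eps=1e-6):
    """Log-determinant of an assumed DPP kernel L = diag(q) S diag(q).

    embeddings: (n_views, dim) array of view embeddings.
    quality:    (n_views,) array of positive per-view quality scores.
    """
    # Unit-normalise so S holds cosine similarities between views.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ unit.T
    kernel = quality[:, None] * similarity * quality[None, :]
    # A small ridge keeps the kernel positive definite when views coincide.
    kernel = kernel + eps * np.eye(len(quality))
    _, logdet = np.linalg.slogdet(kernel)
    return logdet

rng = np.random.default_rng(0)
diverse = rng.normal(size=(4, 16))
# Near-identical views: copies of one embedding plus tiny noise.
collapsed = np.tile(diverse[:1], (4, 1)) + 1e-3 * rng.normal(size=(4, 16))
quality = np.ones(4)

diverse_score = dpp_log_det(diverse, quality)
collapsed_score = dpp_log_det(collapsed, quality)
print(diverse_score > collapsed_score)
```

With equal quality scores, the log-determinant is larger for spread-out embeddings than for near-duplicates, so maximising a term of this kind discourages the views from collapsing onto one representation.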

Paper Structure

This paper contains 30 sections, 9 equations, 20 figures, 9 tables, 1 algorithm.

Figures (20)

  • Figure 1: (Left) Brain MRI reports contain rich and diverse information relating to different features and regions of the brain. Report sentences are associated with visual features via their colours. The report was cut off ([...]) to only contain findings visible on this 2D slice. (Right) By drawing parallels to multi-vector retrieval \cite{zhang2022multi}, we align multi-view embeddings of the MRI with clinical features given in the reports. Multi-view embeddings can attend across the volume, reflecting that clinical features may correspond to more than one spatial region of the scan.
  • Figure 2: Our brat framework. Our Pairwise View Alignment (PVA) algorithm (described in Section \ref{sec:pva}) and quality-diversity via Determinantal Point Processes (DPPs) (described in Section \ref{sec:dpp}) lead to clinically aligned multi-view embeddings of the MRI.
  • Figure 3: Conventional query tokens collapse into a single representation as training progresses. The multi-view embeddings of brat, on the other hand, are diverse and spread out. The plot was obtained by multi-dimensional scaling of 32 query tokens to 2D based on their mean pairwise distances from 32 images.
  • Figure 4: Juxtaposition of 8 query tokens from Q-Former (upper row) and the same 8 tokens from brat (lower row). The collapsed Q-Former queries all attend to the same image regions, whereas the multi-view embeddings of brat focus on distinct features.
  • Figure 5: We connect two configurations of brat with various decoders to evaluate our pre-training on downstream tasks.
  • ...and 15 more figures
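
The projection described in the Figure 3 caption (32 query tokens mapped to 2D from their mean pairwise distances over 32 images) can be sketched as follows. The caption does not say which MDS variant or distance was used, so classical (eigendecomposition-based) MDS with Euclidean distances is assumed here. The random `tokens` array, its embedding size of 64, and both function names are hypothetical stand-ins.

```python
import numpy as np

def mean_pairwise_distances(tokens):
    """Average the query-to-query distance matrix over images.

    tokens: (n_images, n_queries, dim) array of query-token embeddings.
    """
    diffs = tokens[:, :, None, :] - tokens[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # (n_images, n_queries, n_queries)
    return dists.mean(axis=0)

def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip guards against slightly negative eigenvalues from round-off.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

rng = np.random.default_rng(0)
tokens = rng.normal(size=(32, 32, 64))  # placeholder for real query tokens
dist = mean_pairwise_distances(tokens)
coords = classical_mds(dist)
print(coords.shape)
```

Each row of `coords` is the 2D position of one query token. Collapsed queries would have near-zero mutual distances and land on top of each other in such a plot.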