Specializing Foundation Models via Mixture of Low-Rank Experts for Comprehensive Head CT Analysis

Youngjin Yoo, Han Liu, Bogdan Georgescu, Yanbo Zhang, Sasa Grbic, Michael Baumgartner, Thomas J. Re, Jyotipriya Das, Poikavila Ullaskrishnan, Eva Eibenberger, Andrei Chekkoury, Uttam K. Bodanapally, Savvas Nicolaou, Pina C. Sanelli, Thomas J. Schroeppel, Yvonne W. Lui, Eli Gibson

TL;DR

The Mixture of Low-Rank Experts (MoLRE) framework extends LoRA with multiple specialized low-rank adapters and unsupervised soft routing, enabling conditional feature adaptation with less than 0.5% additional parameters and no explicit pathology supervision.

Abstract

Foundation models pre-trained on large-scale datasets demonstrate strong transfer learning capabilities; however, their adaptation to complex multi-label diagnostic tasks, such as comprehensive head CT finding detection, remains understudied. Standard parameter-efficient fine-tuning methods such as LoRA apply uniform adaptations across pathology types, which may limit performance for diverse medical findings. We propose a Mixture of Low-Rank Experts (MoLRE) framework that extends LoRA with multiple specialized low-rank adapters and unsupervised soft routing. This approach enables conditional feature adaptation with less than 0.5% additional parameters and without explicit pathology supervision. We present a comprehensive benchmark of MoLRE across six state-of-the-art medical imaging foundation models spanning 2D and 3D architectures; general-domain, medical-domain, and head CT-specific pretraining; and model sizes ranging from 7M to 431M parameters. Using over 70,000 non-contrast head CT scans with 75 annotated findings (including hemorrhage, infarction, trauma, mass lesions, structural abnormalities, and chronic changes), our experiments demonstrate consistent performance improvements across all models. Gains vary substantially: general-purpose and medical-domain models show the largest improvements (DINOv3-Base: +4.6%; MedGemma: +4.3%), whereas 3D CT-specialized or very large models show more modest gains (+0.2% to +1.3%). The combination of MoLRE and MedGemma achieves the highest average detection AUC of 0.917. These findings highlight the importance of systematic benchmarking on target clinical tasks, as pretraining domain, architecture, and model scale interact in non-obvious ways.
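To make the core idea concrete, here is a minimal NumPy sketch of a softly routed mixture of low-rank experts added to a frozen linear map. It is an illustration under stated assumptions, not the authors' implementation: the class name `MoLRELinear`, the number of experts, the rank, the single-linear-layer router, the zero initialization of the up-projections, and the point where the experts attach are all hypothetical choices that this excerpt does not specify.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MoLRELinear:
    """Frozen linear map plus a softly routed mixture of low-rank (LoRA-style) experts.

    Hypothetical sketch: sizes, router form, and initialization are assumptions.
    """

    def __init__(self, d_in, d_out, num_experts=4, rank=8, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a frozen pretrained weight (random here).
        self.W = rng.normal(0.0, 0.02, size=(d_out, d_in))
        # Expert k contributes the low-rank update B[k] @ A[k].
        self.A = rng.normal(0.0, 0.02, size=(num_experts, rank, d_in))
        # Zero init (as in LoRA) so the layer starts out equal to the frozen map.
        self.B = np.zeros((num_experts, d_out, rank))
        # Router: one linear layer producing one logit per expert.
        self.W_router = rng.normal(0.0, 0.02, size=(num_experts, d_in))

    def gates(self, x):
        # x: (batch, d_in) -> soft expert weights (batch, num_experts); rows sum to 1.
        return softmax(x @ self.W_router.T, axis=-1)

    def __call__(self, x):
        base = x @ self.W.T                            # (batch, d_out)
        down = np.einsum("krd,bd->bkr", self.A, x)     # per-expert down-projection
        up = np.einsum("kor,bkr->bko", self.B, down)   # per-expert up-projection
        g = self.gates(x)                              # (batch, num_experts)
        return base + np.einsum("bk,bko->bo", g, up)   # gate-weighted sum of experts

    def num_added_params(self):
        # Parameters that would be trained: experts plus router.
        return self.A.size + self.B.size + self.W_router.size


layer = MoLRELinear(d_in=768, d_out=768)
features = np.random.default_rng(1).normal(size=(2, 768))
adapted = layer(features)   # shape (2, 768)
```

Because the gates depend on the input and no finding labels feed the router, different inputs can weight the experts differently, which is one way to read "conditional adaptation without explicit pathology supervision". The toy sizes above are not meant to reproduce the reported overhead of under 0.5%, which refers to the full models in the paper.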

Paper Structure (10 sections, 2 equations, 3 figures, 2 tables, 1 algorithm)

This paper contains 10 sections, 2 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: MoLRE: Parameter-efficient specialization for foundation models. Illustrated for 2D models (e.g., DINOv3), MoLRE employs (1) multiple low-rank expert adapters for feature transformations, (2) an unsupervised soft router that learns to weight experts based on input features, and (3) attention-weighted pooling to aggregate slice-level features into volume-level representations. For 3D models (e.g., Pillar0-HeadCT), MoLRE is applied to spatially-pooled volumetric features, enabling conditional adaptation without explicit pathology supervision.
  • Figure 2: Stratified multi-finding detection performance. Number of neurological findings achieving high-confidence performance ($\text{AUC} \geq 0.90$) and moderate performance ($0.80 \leq \text{AUC} < 0.90$) for each foundation model, with and without MoLRE.
  • Figure 3: Per-finding detection performance with and without MoLRE. Radar plots comparing baseline models and their MoLRE-enhanced counterparts across 75 neurological findings. Left: DINOv3-Base vs. DINOv3-Base+MoLRE. Right: MedGemma vs. MedGemma+MoLRE.
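The attention-weighted pooling step named in the Figure 1 caption, which aggregates slice-level features into a volume-level representation for 2D models, can be sketched as follows. This is a minimal illustration under assumptions: the function name `attention_pool`, the scoring by a single learned vector `w`, and the optional padding mask are hypothetical, and the paper may score slices differently (for example with a small MLP).

```python
import numpy as np


def attention_pool(slice_feats, w, mask=None):
    """Pool slice-level features (num_slices, dim) into one volume-level vector (dim,).

    w    : scoring vector of shape (dim,); assumed to be learned during training.
    mask : optional boolean array (num_slices,); False marks padded slices.
           At least one slice must be valid.
    """
    scores = slice_feats @ w                       # one scalar score per slice
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)   # padded slices get zero weight
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()              # softmax over slices
    return weights @ slice_feats, weights


rng = np.random.default_rng(0)
slice_feats = rng.normal(size=(32, 768))           # e.g. 32 slices, 768-dim features
w = rng.normal(scale=0.02, size=768)
volume_feat, slice_weights = attention_pool(slice_feats, w)
```

Compared with plain averaging, the learned weights let a few informative slices dominate the volume-level feature, which can matter when a finding is visible on only a small part of the scan.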