Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis

Mingyuan Liu, Lu Xu, Shengnan Liu, Jicong Zhang

TL;DR

This work addresses the data- and compute-constrained transfer of large vision models to medical diagnosis by proposing SH-PEFT, a sparsity- and hybridity-inspired parameter-efficient fine-tuning method. SH-PEFT selects a small subset of weights to train by computing a hybrid importance score that combines task-specific impact with task-agnostic significance, using either L1- or L2-based estimators and a thresholding mask. Empirical results on six medical datasets across modalities show that SH-PEFT achieves state-of-the-art PEFT performance and can match or exceed domain-specific diagnostic models while tuning only a small fraction of parameters. The findings indicate that a hybrid weight selection strategy can enable effective knowledge transfer with minimal training overhead, which is of practical benefit for clinical applications.
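The selection step described above can be illustrated with a small sketch. It is not the authors' implementation: the summary does not give the exact formulas, so the sketch assumes that task-specific impact is approximated by gradient magnitude, that task-agnostic significance is approximated by weight magnitude, and that the two are summed after max-normalization. The names `normalize` and `hybrid_mask` and the toy values are hypothetical.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Scale scores by their maximum so the two criteria are comparable."""
    hi = max(scores)
    return [s / hi if hi > 0 else 0.0 for s in scores]


def hybrid_mask(weights: List[float], grads: List[float],
                ratio: float, norm: str = "l1") -> List[int]:
    """Return a 0/1 mask marking the weights chosen for training.

    ASSUMPTIONS (not taken from the paper): task-specific impact is
    approximated by the gradient magnitude, task-agnostic significance by
    the weight magnitude, and the hybrid score is their normalized sum.
    """
    power = 1 if norm == "l1" else 2  # L1: |x|, L2: x^2
    specific = normalize([abs(g) ** power for g in grads])
    agnostic = normalize([abs(w) ** power for w in weights])
    hybrid = [s + a for s, a in zip(specific, agnostic)]

    # Threshold at the k-th largest hybrid score (ties may select more than k).
    k = max(1, round(ratio * len(weights)))
    threshold = sorted(hybrid, reverse=True)[k - 1]
    return [1 if h >= threshold else 0 for h in hybrid]


# Toy example: six pre-trained weights and their task gradients.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.01]
grads = [0.01, 0.8, 0.02, -0.6, 0.03, 0.0]
mask = hybrid_mask(weights, grads, ratio=1 / 3)
print(mask)
```

In this toy run the weight with a large magnitude but a near-zero gradient (index 0) is not selected, while the weight that scores well on both criteria (index 3) is. That is the kind of trade-off a hybrid score is meant to capture.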

Abstract

The success of Large Vision Models (LVMs) is accompanied by vast data volumes, which are prohibitively expensive in medical diagnosis. To address this, recent efforts exploit Parameter-Efficient Fine-Tuning (PEFT), which trains a small number of weights while freezing the rest. However, these methods typically assign trainable weights to the same positions in LVMs in a heuristic manner, regardless of task differences, making them suboptimal for professional applications like medical diagnosis. We therefore statistically reveal the sparsity and hybridity of diagnostic-targeted fine-tuning: a small portion of key weights significantly impacts performance, and these key weights are hybrid, including both task-specific and task-agnostic parts. Based on this, we propose a novel Sparsity- and Hybridity-inspired Parameter-Efficient Fine-Tuning method (SH-PEFT). It selects and trains a small portion of weights based on their importance, which is estimated by hybridizing task-specific and task-agnostic strategies. Validated on six medical datasets of different modalities, SH-PEFT achieves state-of-the-art accuracy in transferring LVMs to medical diagnosis. By tuning around 0.01% of the weights, it outperforms full model fine-tuning. Moreover, SH-PEFT achieves performance comparable to that of other models deliberately optimized for specific medical tasks. Extensive experiments demonstrate the effectiveness of each design and reveal that large model transfer holds great potential in medical diagnosis.

Paper Structure

This paper contains 9 sections, 2 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: During PEFT, instead of heuristically assigning trainable weights to fixed positions across various medical diagnostic tasks, we employ a data-driven approach to select key weights for each task, enabling more effective fine-tuning.
  • Figure 2: We experimentally establish the sparsity and hybridity of key-weight distributions on six medical datasets by comparing pre-trained and medically fine-tuned CLIP models. Sparsity indicates that a few key weights largely impact performance, thereby motivating us to select important weights for tuning. Hybridity indicates that key weights contain both task-specific and task-agnostic parts, thereby prompting us to explore a hybrid strategy to locate key weights for more effective PEFT.
  • Figure 3: Inspired by the sparsity and hybridity, we propose a novel SH-PEFT approach to fine-tune a few key weights for adapting pre-trained vision transformers to medical diagnosis. The key weights can be effectively and quickly identified by jointly considering their importance from both task-specific and task-agnostic perspectives.
  • Figure 4: Under the same ratio of trainable weights, SH-PEFT outperforms state-of-the-art PEFT methods in terms of the average $F_1$-value across six datasets.
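The idea behind Figures 1 and 3, updating only the selected key weights while all others stay frozen, can be sketched as a masked gradient step. This is a minimal illustration under stated assumptions rather than the paper's training procedure: plain SGD, a flat weight list, and the function names `masked_sgd_step` and `trainable_ratio` are all hypothetical choices made for the example.

```python
from typing import List


def masked_sgd_step(weights: List[float], grads: List[float],
                    mask: List[int], lr: float = 0.1) -> List[float]:
    """One SGD step that updates only positions where mask == 1.

    ASSUMPTION: plain SGD is used here for illustration; the summary does
    not state which optimizer the authors use.
    """
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]


def trainable_ratio(mask: List[int]) -> float:
    """Fraction of weights that the mask leaves trainable."""
    return sum(mask) / len(mask)


# Toy example: four weights, two of them selected as key weights.
weights = [0.9, -0.1, 0.05, -0.7]
grads = [0.01, 0.8, 0.02, -0.6]
mask = [0, 1, 0, 1]

updated = masked_sgd_step(weights, grads, mask)
print(updated)                # frozen positions keep their pre-trained values
print(trainable_ratio(mask))  # fraction of weights being tuned
```

Because the mask multiplies the gradient, frozen positions receive a zero update and keep their pre-trained values exactly. In the abstract's setting the trainable ratio would be around 0.01% rather than the 50% of this toy example.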