Low-Rank Knowledge Decomposition for Medical Foundation Models

Yuhang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

TL;DR

The paper tackles the tension between generality and specialization in medical foundation models and the associated deployment costs. It introduces Low-Rank Knowledge Decomposition (LoRKD), which decomposes a medical foundation model F_p into a shared backbone F_s and T task-specific experts using low-rank factors with g_t = (W_0 + B_t A_t) h_t, trained via an efficient gradient separation mechanism. A task-knowledge switch and parameter fusion enable deploying multiple lightweight experts while keeping a fixed backbone size, with a KL-based transfer loss guiding knowledge transfer from the foundation model. Across RadImageNet, MedMNIST, Med-MT, and seven downstream datasets, LoRKD achieves superior performance and transferability with substantially fewer parameters than prior work (KF) and without requiring dual networks, illustrating a practical path to scalable, specialized medical foundation models.
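The low-rank expert formula g_t = (W_0 + B_t A_t) h_t and the idea of parameter fusion can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the rank, the random initialization, and the helper names `expert_forward` and `fuse` are all assumptions made for illustration, and a single dense layer stands in for whatever layers the paper actually decomposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d_in, d_out, rank, num_tasks = 8, 6, 2, 3

# Shared weight W_0 plus one low-rank pair (B_t, A_t) per task.
W0 = rng.normal(size=(d_out, d_in))
B = [rng.normal(size=(d_out, rank)) for _ in range(num_tasks)]
A = [rng.normal(size=(rank, d_in)) for _ in range(num_tasks)]


def expert_forward(h, t):
    """g_t = (W_0 + B_t A_t) h, without forming the dense d_out x d_in update."""
    return W0 @ h + B[t] @ (A[t] @ h)


def fuse(t):
    """Fold expert t into a single dense weight (assumed form of parameter fusion)."""
    return W0 + B[t] @ A[t]


h = rng.normal(size=d_in)
t = 1  # selecting t plays the role of the task-knowledge switch in this sketch

g_unfused = expert_forward(h, t)
g_fused = fuse(t) @ h

fused_shape = fuse(t).shape
# Each extra task adds rank * (d_in + d_out) parameters instead of d_in * d_out.
extra_params_per_task = B[t].size + A[t].size
```

Under these assumptions the fused weight has the same shape as W_0, so the deployed layer is no larger than the shared one, while each additional expert stores only the two small factors.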

Abstract

The popularity of large-scale pre-training has promoted the development of medical foundation models. However, some studies have shown that although foundation models exhibit strong general feature extraction capabilities, their performance on specific tasks is still inferior to task-specific methods. In this paper, we explore a new perspective called "Knowledge Decomposition" to improve performance on specific medical tasks: it deconstructs the foundation model into multiple lightweight expert models, each dedicated to a particular task, with the goal of improving specialization while reducing resource expenditure. To this end, we design a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients by incorporating low-rank expert modules and an efficient knowledge separation convolution. Extensive experimental results demonstrate that the decomposed models achieve strong performance and transferability, even surpassing the original foundation models.
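To make "explicitly separates gradients" concrete, the toy sketch below routes each sample through the shared weight plus the expert of its own task, then computes gradients by hand. It is only an illustration under stated assumptions: a single linear layer, a squared-error loss, and per-sample task labels `task_ids` are all hypothetical, and this is not the paper's efficient knowledge separation convolution, whose details are not given here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d_in, d_out, rank, num_tasks, batch = 5, 4, 2, 3, 6

W0 = rng.normal(size=(d_out, d_in))            # shared weight
B = rng.normal(size=(num_tasks, d_out, rank))  # per-task factor B_t
A = rng.normal(size=(num_tasks, rank, d_in))   # per-task factor A_t

H = rng.normal(size=(batch, d_in))             # input features, one row per sample
Y = rng.normal(size=(batch, d_out))            # toy regression targets
task_ids = np.array([0, 0, 1, 1, 1, 0])        # task 2 is absent from this batch

# Forward pass: sample i uses the shared weight plus the expert of its task.
G = np.stack([(W0 + B[t] @ A[t]) @ h for h, t in zip(H, task_ids)])
err = G - Y                                    # dL/dG for L = 0.5 * sum((G - Y)**2)

# The shared weight receives gradient from every sample in the batch.
grad_W0 = err.T @ H

# Expert t receives gradient only from samples labeled with task t.
grad_B = np.zeros_like(B)
for t in range(num_tasks):
    mask = task_ids == t
    # dL/dB_t = sum over task-t samples of err_i (A_t h_i)^T
    grad_B[t] = err[mask].T @ (H[mask] @ A[t].T)
```

In this sketch the expert for the absent task gets an exactly zero gradient, while the shared weight accumulates signal from all samples, which is the separation of common and task-specific updates that the abstract describes at a high level.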

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Knowledge decomposition is employed to break down the foundation model into multiple lightweight expert models, each dedicated to a specific domain. The goal of this paradigm is to improve the specialization of deployment models within a specific domain, while simultaneously reducing deployment costs.
  • Figure 2: Overview of LoRKD. We introduce low-rank expert modules to control the number of parameters and an efficient knowledge separation convolution to achieve computationally efficient explicit gradient separation. The decomposed models can replace the original foundation model in specific domains and can conveniently switch task knowledge between different departments.
  • Figure 3: Comparison of Grad-CAM visualizations between the decomposed model and the foundation model on DET10. The foundation model tends to focus on larger regions, corresponding to its general feature extraction capability, while our decomposed expert model focuses on more precise regions, reflecting stronger specialization.
  • Figure 4: Comparison of MIG scores across different methods.