Sparse Task Vector Mixup with Hypernetworks for Efficient Knowledge Transfer in Whole-Slide Image Prognosis

Pei Liu, Xiangxiang Zeng, Tengfei Ma, Yucheng Xing, Xuanbai Ren, Yiping Liu

Abstract

Whole-Slide Images (WSIs) are widely used for estimating the prognosis of cancer patients. Current studies generally follow a cancer-specific learning paradigm. However, the available training samples for one cancer type are usually scarce in pathology. Consequently, the model often struggles to learn generalizable knowledge and thus performs worse on tumor samples with inherently high heterogeneity. Although multi-cancer joint learning and knowledge transfer approaches have recently been explored to address this problem, they rely on either large-scale joint training or extensive inference across multiple models, posing new challenges in computational efficiency. To this end, this paper proposes a new scheme, Sparse Task Vector Mixup with Hypernetworks (STEPH). Unlike previous approaches, it efficiently absorbs generalizable knowledge from other cancers for the target cancer via model merging: i) applying task vector mixup to each source-target pair and then ii) sparsely aggregating the task vector mixtures to obtain an improved target model, with both steps driven by hypernetworks. Extensive experiments on 13 cancer datasets show that STEPH improves over cancer-specific learning and an existing knowledge transfer baseline by 5.14% and 2.01%, respectively. Moreover, it is a more efficient solution for learning prognostic knowledge from other cancers, without requiring large-scale joint training or extensive multi-model inference. Code is publicly available at https://github.com/liupei101/STEPH.

Paper Structure (40 sections, 17 equations, 14 figures, 12 tables)

Figures (14)

  • Figure 1: Current learning paradigms of modeling WSIs for survival analysis. Unlike previous approaches, the proposed $\textsc{STEPH}$ efficiently utilizes the generalizable prognostic knowledge from other cancers by model merging.
  • Figure 2: Performance of different learning paradigms in WSI prognosis. We compare $\textsc{STEPH}$ with cancer-specific learning and a representation transfer-based solution. Multi-cancer joint learning is omitted because its training overhead and hardware requirements are orders of magnitude greater than those of the others.
  • Figure 3: Sparse Task Vector Mixup with Hypernetworks ($\textsc{STEPH}$) for efficient knowledge transfer in WSI prognosis. After computing task vectors, $\textsc{STEPH}$ first applies mixup to each paired $(\tau_t,\tau_s)$ to absorb prognostic knowledge from cross-cancer models. Then, the most beneficial mixtures are selected and aggregated to derive $\mathcal{M}_t^{*}$ for prediction. Hypernetworks drive these steps by steering task vectors.
  • Figure 4: Loss landscape of task vector mixup (TVM) on $t$. Red vectors in (a) and (b) depict the range of TVM with lower loss. Panel (c) shows the minimum loss that can be reached by TVM.
  • Figure 5: Alignment of $\tau_{\text{mix}}$ with $\tau_t$ in dominant subspaces (95%) for the two layers of the MIL encoder, measured by SAR (marczakno2025isoc).
  • ...and 9 more figures
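
The two steps named in the Figure 3 caption (mixup of each paired $(\tau_t,\tau_s)$, then selection and aggregation of the most beneficial mixtures) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the usual task-arithmetic definition of a task vector (fine-tuned weights minus shared initial weights), a linear mixup with a scalar coefficient `lam`, and a top-k softmax for the sparse aggregation. In STEPH these quantities are driven by hypernetworks; here `lams` and `scores` are hypothetical fixed inputs, and all function names are invented for illustration.

```python
import numpy as np

def task_vector(finetuned, pretrained):
    # Assumed definition: tau = theta_finetuned - theta_pretrained.
    return finetuned - pretrained

def mixup(tau_t, tau_s, lam):
    # Assumed linear mixup of a target/source task vector pair.
    return lam * tau_t + (1.0 - lam) * tau_s

def sparse_aggregate(mixtures, scores, k):
    # Keep only the k highest-scoring mixtures (hypothetical selection rule)
    # and combine them with softmax weights; all other weights stay zero.
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(scores)[::-1][:k]
    weights = np.zeros_like(scores)
    e = np.exp(scores[top] - scores[top].max())
    weights[top] = e / e.sum()
    merged = sum(w * m for w, m in zip(weights, mixtures))
    return merged, weights

def merge_target(theta_0, theta_t, source_thetas, lams, scores, k):
    # theta_0: shared initial weights; theta_t: target-cancer model weights;
    # source_thetas: weights of models trained on other cancers.
    tau_t = task_vector(theta_t, theta_0)
    mixtures = [
        mixup(tau_t, task_vector(theta_s, theta_0), lam)
        for theta_s, lam in zip(source_thetas, lams)
    ]
    tau_star, weights = sparse_aggregate(mixtures, scores, k)
    # Merged target model: initial weights plus the aggregated task vector.
    return theta_0 + tau_star, weights

# Toy usage with flat 4-dimensional "models".
theta_0 = np.zeros(4)
theta_t = np.ones(4)
sources = [2.0 * np.ones(4), 3.0 * np.ones(4), -np.ones(4)]
theta_star, w = merge_target(
    theta_0, theta_t, sources,
    lams=[0.8, 0.6, 0.9], scores=[0.2, 1.5, -0.3], k=2,
)
```

Because only `k` of the source-target mixtures receive non-zero weight, the result is a single set of weights for the target cancer, so prediction needs one model rather than inference across all source models.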