AdaRank: Adaptive Rank Pruning for Enhanced Model Merging

Chanhyuk Lee, Jiho Choi, Chanryeol Lee, Donggyun Kim, Seunghoon Hong

TL;DR

AdaRank is a model merging framework that adaptively selects the most beneficial singular directions of task vectors when merging multiple models. It learns at test time, via entropy minimization, which singular components to prune, removing those that cause cross-task interference while keeping the useful information in each task vector.

Abstract

Model merging has emerged as a promising approach for unifying independently fine-tuned models into an integrated framework, significantly enhancing computational efficiency in multi-task learning. Recently, several SVD-based techniques have been introduced to exploit low-rank structures for enhanced merging, but their reliance on manually designed rank selection often leads to cross-task interference and suboptimal performance. In this paper, we propose AdaRank, a novel model merging framework that adaptively selects the most beneficial singular directions of task vectors to merge multiple models. We empirically show that the dominant singular components of task vectors can cause critical interference with other tasks, and that naive truncation across tasks and layers degrades performance. In contrast, AdaRank dynamically prunes the singular components that cause interference and retains an appropriate amount of information for each task vector by learning to prune ranks at test time via entropy minimization. Our analysis demonstrates that this method mitigates detrimental overlaps among tasks, while empirical results show that AdaRank consistently achieves state-of-the-art performance across various backbones and numbers of tasks, narrowing the performance gap between the merged model and the individually fine-tuned models to nearly 1%.
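The core idea in the abstract, pruning individual singular components of each task vector before merging, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `masked_merge` and `mean_entropy`, the scaling coefficient `lam`, and the use of fixed 0/1 masks are all hypothetical, and the step that actually learns the masks at test time is omitted.

```python
import numpy as np

def masked_merge(theta_pre, task_vectors, masks, lam=1.0):
    """Merge task vectors, keeping only the singular components each mask selects.

    theta_pre:    (d_out, d_in) pretrained weight matrix
    task_vectors: list of (d_out, d_in) arrays (fine-tuned minus pretrained weights)
    masks:        list of 0/1 arrays, one entry per singular component
    lam:          scaling coefficient (hypothetical; not taken from the source)
    """
    merged = theta_pre.copy()
    for tau, mask in zip(task_vectors, masks):
        U, S, Vt = np.linalg.svd(tau, full_matrices=False)
        # Zero out the pruned singular values, then reconstruct and add.
        merged += lam * (U * (S * mask)) @ Vt
    return merged

def mean_entropy(logits):
    """Mean Shannon entropy of softmax predictions (a label-free objective)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())
```

With all-ones masks the function reduces to adding every full task vector to the pretrained weights; a mask that keeps only the leading entries reproduces plain top-k truncation. The entropy function needs only unlabeled inputs, which is what makes it usable as a test-time objective.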

Paper Structure

This paper contains 55 sections, 9 equations, 10 figures, 17 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a), (b) Net change in single-task and multi-task losses for Task Arithmetic (Ilharco et al., 2023) and CART (Choi et al., 2024), respectively, when each singular component of a target task vector is individually added to a model merged with full-rank vectors from other tasks. (c) Loss changes of all tasks when adding singular components from the MNIST task vector, with MNIST shown as a dotted line. For clarity, only the top 10% of singular components are shown.
  • Figure 2: Intrinsic rank capturing 95% of total spectral energy in the MLP layer of the first and the last block of ViT-B/32 task vectors obtained from 8 different fine-tuned weights.
  • Figure 3: Parameter count of the merged model when merging ViT-B/32 fine-tuned on different numbers of tasks. The y-axis uses a log scale for better visualization.
  • Figure 4: (a) Count of singular component indices selected by AdaRank, accumulated across 8 tasks. The black dashed line denotes the top-16% limit. (b) Comparison of ranks obtained from final masks after applying AdaRank, against the intrinsic rank for the MLP layers in the first (left) and last (right) blocks of ViT-B/32. (c) Performance comparison between the merged model with top-$k$ truncation based on intrinsic rank and AdaRank (y-axis clipped for better visualization).
  • Figure 5: Cross-entropy loss during the optimization of $B$ initialized from top-16% truncation (black line). Two objectives are compared: directly minimizing the multi-task cross-entropy loss (blue curve) and minimizing entropy as a surrogate loss (orange curve).
  • ...and 5 more figures
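
The intrinsic rank referred to in Figures 2 and 4 can be illustrated with a short NumPy sketch. It assumes that "spectral energy" means the sum of squared singular values, which the captions above do not define; the function names `intrinsic_rank` and `top_k_mask` are hypothetical.

```python
import numpy as np

def intrinsic_rank(matrix, energy=0.95):
    """Smallest k whose top-k singular values hold `energy` of the spectral energy.

    Assumption: spectral energy is the sum of squared singular values.
    """
    S = np.linalg.svd(matrix, compute_uv=False)  # sorted in descending order
    total = np.sum(S ** 2)
    if total == 0:
        return 0
    cumulative = np.cumsum(S ** 2) / total
    # First index where the cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(S))

def top_k_mask(num_components, k):
    """0/1 mask that keeps the k leading singular components (top-k truncation)."""
    return (np.arange(num_components) < k).astype(float)
```

A mask built this way always keeps a contiguous leading block of components, whereas a learned mask is free to drop a dominant component and keep a smaller one.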