IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Hongxu Chen, Runshi Li, Bowei Zhu, Zhen Wang, Long Chen

TL;DR

IterIS merges multiple task-specific LoRAs into unified adapters without gradient-based fine-tuning or access to training data. It reframes LoRA merging as an optimization problem solved iteratively: an inference step yields the input features $\tilde{\bm{X}}_i$ of the unified adapter, and a solving step updates the merged weight in closed form, $W^* = ( \sum_i \lambda_i \tilde{\bm{X}}_i \tilde{\bm{X}}_i^T )^{-1} ( \sum_i \lambda_i \tilde{\bm{X}}_i \bm{X}_i^T W_i )$, where $\lambda_i$ are adaptive weights. A regularization term reduces the number of unlabeled samples needed to 1-5% of what prior methods require. The method leverages a directed acyclic graph structure to bound the number of iterations and employs a layer-wise update for efficiency, and it improves over baselines on text-to-image diffusion, vision-language models, and large language models. By directly using the input features of the unified adapters and iteratively refining the objective, IterIS mitigates three weaknesses of prior approaches: rough feature assumptions, large unlabeled-sample requirements, and imbalanced optimization objectives. This enables private, data-efficient multi-task model composition for practical PEFT use.
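The closed-form solving step quoted above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function name `solve_unified_weight`, the column-per-sample layout (each feature matrix is `d x n`), and the random toy data are all assumptions, and the Gram matrix is assumed to be invertible.

```python
import numpy as np

def solve_unified_weight(X_tilde, X, W, lam):
    """Closed-form solve for W* (hypothetical helper, names are assumptions).

    W* = (sum_i lam_i Xt_i Xt_i^T)^{-1} (sum_i lam_i Xt_i X_i^T W_i)

    X_tilde[i]: (d, n) input features of the unified adapter for task i
    X[i]:       (d, n) input features of the i-th task-specific LoRA
    W[i]:       (d, d_out) weight of the i-th task-specific model
    lam[i]:     scalar weight for task i
    """
    d = X_tilde[0].shape[0]
    d_out = W[0].shape[1]
    gram = np.zeros((d, d))
    cross = np.zeros((d, d_out))
    for Xt_i, X_i, W_i, lam_i in zip(X_tilde, X, W, lam):
        gram += lam_i * (Xt_i @ Xt_i.T)
        cross += lam_i * (Xt_i @ (X_i.T @ W_i))
    # Solve gram @ W_star = cross rather than forming the explicit inverse.
    return np.linalg.solve(gram, cross)

# Toy usage with random data (assumed shapes, two tasks).
rng = np.random.default_rng(0)
d, d_out, n = 4, 3, 32
X = [rng.normal(size=(d, n)) for _ in range(2)]
W = [rng.normal(size=(d, d_out)) for _ in range(2)]
W_star = solve_unified_weight(X, X, W, lam=[0.5, 0.5])
print(W_star.shape)
```

Using `np.linalg.solve` avoids computing the inverse explicitly, which is usually the more stable choice when the Gram matrix is only moderately well conditioned.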

Abstract

Low-rank adaptations (LoRA) are widely used to fine-tune large models across various domains for specific downstream tasks. While task-specific LoRAs are often available, concerns about data privacy and intellectual property can restrict access to training data, limiting the acquisition of a multi-task model through gradient-based training. In response, LoRA merging presents an effective solution by combining multiple LoRAs into a unified adapter while maintaining data privacy. Prior works on LoRA merging primarily frame it as an optimization problem, yet these approaches face several limitations, including a rough assumption about the input features used in optimization, massive sample requirements, and an unbalanced optimization objective. These limitations can significantly degrade performance. To address them, we propose a novel optimization-based method, named IterIS: 1) We formulate LoRA merging as an advanced optimization problem to mitigate the rough assumption. Additionally, we employ an iterative inference-solving framework in our algorithm, which progressively refines the optimization objective for improved performance. 2) We introduce an efficient regularization term that reduces the sample requirement (requiring only 1-5% of the unlabeled samples compared to prior methods). 3) We utilize adaptive weights in the optimization objective to mitigate potential imbalances in the LoRA merging process. Our method demonstrates significant improvements over multiple baselines and state-of-the-art methods in composing tasks for text-to-image diffusion, vision-language models, and large language models. Furthermore, our layer-wise algorithm can achieve convergence with minimal steps, ensuring efficiency in both memory and computation.
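The iterative inference-solving framework described in the abstract can be illustrated on a toy model. The sketch below is not the paper's algorithm. It assumes a plain chain of dense layers with a `tanh` nonlinearity, initializes the merged weights by simple averaging, takes the weights `lam` as fixed inputs rather than computing them adaptively, and omits the regularization term, whose form is not given in this section. The names `forward_features`, `solve_layer`, and `iteris_toy` are hypothetical.

```python
import numpy as np

def forward_features(x0, weights):
    """Input features seen by each layer of a toy tanh chain; x0 is (d, n)."""
    feats, h = [], x0
    for W_l in weights:
        feats.append(h)
        h = np.tanh(W_l.T @ h)
    return feats

def solve_layer(X_tilde, X, W, lam):
    """Closed-form per-layer solve; assumes the Gram matrix is invertible."""
    gram = sum(l * (Xt @ Xt.T) for l, Xt in zip(lam, X_tilde))
    cross = sum(l * (Xt @ (Xi.T @ Wi)) for l, Xt, Xi, Wi in zip(lam, X_tilde, X, W))
    return np.linalg.solve(gram, cross)

def iteris_toy(inputs, task_weights, lam, num_iters):
    """Alternate an inference step and a layer-wise solving step (toy sketch)."""
    num_layers = len(task_weights[0])
    # Targets stay fixed: features of each task model on its own unlabeled inputs.
    task_feats = [forward_features(x0, Ws) for x0, Ws in zip(inputs, task_weights)]
    # Assumed initialization: plain average of the task weights.
    merged = [sum(Ws[l] for Ws in task_weights) / len(task_weights)
              for l in range(num_layers)]
    for _ in range(num_iters):
        # Inference step: run the current merged model to get its input features.
        merged_feats = [forward_features(x0, merged) for x0 in inputs]
        # Solving step: update every layer independently in closed form.
        merged = [
            solve_layer([mf[l] for mf in merged_feats],
                        [tf[l] for tf in task_feats],
                        [Ws[l] for Ws in task_weights],
                        lam)
            for l in range(num_layers)
        ]
    return merged

# Toy usage: two tasks, three layers, random data (all values are assumptions).
rng = np.random.default_rng(1)
d, n, num_layers = 4, 64, 3
inputs = [rng.normal(size=(d, n)) for _ in range(2)]
task_weights = [[rng.normal(size=(d, d)) for _ in range(num_layers)] for _ in range(2)]
merged = iteris_toy(inputs, task_weights, lam=[0.5, 0.5], num_iters=num_layers)
```

In this toy chain the input to the first layer never changes, so its solution is final after one round, the second layer's after two, and so on. The loop therefore stops changing after as many rounds as there are layers, which mirrors in miniature how a graph structure can bound the number of iterations.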

Paper Structure

This paper contains 21 sections, 16 equations, 10 figures, 14 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of methods for multi-task application. (a) Retain all LoRAs fine-tuned on task-specific datasets. (b) Train unified adapters using gradient-based methods on mixed datasets for multi-tasking. (c) Create each unified adapter via LoRA merging without labeled data or gradient-based training. Most methods formulate LoRA merging as an optimization problem to align features and solve for each unified adapter.
  • Figure 2: Three key limitations of real-distribution-based merging methods and our improvements. In the scenario of combining the CoLA [warstadt2019cola] and MNLI [williams2017broad] tasks in NLP, we compare the representative real-distribution-based merging method, RegMean [jin2022dataless], with our proposed method: (a) RegMean exhibits increasing discrepancies with deeper encoder layers, while IterIS can fully resolve discrepancies. The value of the "score" metric reflects the performance of the method. (b) Comparison of sample requirements and runtime proportions between our method and RegMean. (c) Proportional magnitudes of terms $T_1$ and $T_2$ in the optimization objective, demonstrating balanced values for our method and imbalance for RegMean.
  • Figure 3: Comparison of linear merging, real-distribution-based merging, and IterIS for LoRA Merging. "OPT." denotes the optimization problem introduced by IterIS. "PTM" denotes the pre-trained model. We define $\frac{B}{A}$ to represent $A^{-1}B$. (a) Linear merging combines each individual LoRA linearly. (b) Real-distribution-based merging computes a closed-form solution based on input features for each LoRA. (c) IterIS acquires all the unified adapters through an iterative inference-solving framework.
  • Figure 4: Qualitative results for multi-concept customization. Target images illustrate single concepts used in composition. (a) Single concept generated by IterIS after composing Cat + Barn. (b) Comparison of pairwise composition across methods. (c) Triple composition examples generated with IterIS.
  • Figure 5: Examples of style caption generated by IterIS.
  • ...and 5 more figures