MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning

Sten Rüdiger, Sebastian Raschka

Abstract

Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that adapts underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA uses Singular Value Decomposition to identify the minor singular vectors, those associated with the smallest singular values, and constrains parameter updates during fine-tuning to these directions. This strategy yields up to a 5.9x improvement in knowledge acquisition under optimized training hyperparameters while requiring only 6-60% of the parameters used by LoRA. These results suggest that constraining adaptation to minor singular directions provides a more efficient and stable mechanism for integrating new knowledge into pre-trained language models.

Paper Structure

This paper contains 45 sections, 5 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: (a) Diagram of a LoRA-style low-rank adaptation module applied to a weight matrix. (b) Diagram of a MiCA-style adaptation module illustrating its distinct update structure compared to LoRA (blue: constrained components, orange: fine-tuned components, initialized as indicated in each trapezoid). $U[:, -r:]$ denotes all rows and the last $r$ columns of the matrix $U$ defined in the text.
  • Figure 2: Retention of domain knowledge from the BLOGS dataset as a function of the number of training epochs for MiCA and LoRA. Error bars denote the standard error over 8 evaluation runs. The horizontal dashed line shows the baseline performance of the respective foundation model. The green dashed line in (b) shows results for a fixed $B$ matrix initialized with random SVD components instead of the minor components, using the same hyperparameters as the MiCA runs.
  • Figure 3: Retention of domain knowledge from the history book dataset for several fine-tuning methods applied to Llama-2.
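
The abstract and the Figure 1 caption describe MiCA's update structure: the adapted directions are the minor singular vectors $U[:, -r:]$ of the frozen weight, and only a small factor acting on that subspace is fine-tuned, while a $B$ factor built from the minor SVD components stays fixed (compare the Figure 2 caption). The PyTorch sketch below illustrates one plausible realization of such a layer; the class name `MiCALinear`, the zero initialization of the trainable factor, and the omission of any singular-value scaling are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class MiCALinear(nn.Module):
    """Sketch of a MiCA-style adapted linear layer (assumed parameterization).

    The frozen weight W is decomposed via SVD, the B factor is fixed to the
    minor left singular vectors U[:, -r:] (smallest singular values), and
    only the A factor is fine-tuned.
    """

    def __init__(self, base_linear: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen

        # SVD of the frozen weight: W = U diag(S) V^T.
        # torch.linalg.svd returns singular values in descending order,
        # so the last r columns of U are the minor components.
        W = self.base.weight.data                     # (out_features, in_features)
        U, _, _ = torch.linalg.svd(W, full_matrices=False)
        self.register_buffer("B", U[:, -r:].clone())  # (out_features, r), fixed

        # Only A is trained; zero init keeps the adapted model identical to
        # the base model at the start (an assumption, mirroring LoRA).
        self.A = nn.Parameter(torch.zeros(r, base_linear.in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the constrained update Delta W = B @ A,
        # which lives entirely in the minor-component subspace of W.
        return self.base(x) + x @ (self.B @ self.A).T
```

Under this parameterization only the $r \times \text{in\_features}$ entries of $A$ are trainable, roughly half of a same-rank LoRA module's $r \times (\text{in} + \text{out})$ parameters; how this maps onto the 6-60% footprint reported in the abstract depends on the rank choices being compared.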