IPA: An Information-Reconstructive Input Projection Framework for Efficient Foundation Model Adaptation

Yuan Yin, Shashanka Venkataramanan, Tuan-Hung Vu, Andrei Bursuc, Matthieu Cord

TL;DR

IPA reframes parameter-efficient adaptation by replacing the data-agnostic LoRA down-projection with a feature-aware input projection that preserves input information in a reduced space. It introduces a forward-only, information-reconstruction objective (via a linear P and its decoder Q) and instantiates it with Incremental PCA to pretrain the projector efficiently. Empirically, IPA consistently outperforms random projection baselines across language and vision-language benchmarks and can match full LoRA performance with roughly half the trainable parameters when the projector is frozen. The approach yields robust improvements with modest pretraining overhead and offers a practical path to more efficient foundation-model adaptation.

Abstract

Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce adaptation cost by injecting low-rank updates into pretrained weights. However, LoRA's down-projection is randomly initialized and data-agnostic, discarding potentially useful information. Prior analyses show that this projection changes little during training, while the up-projection carries most of the adaptation, making the random input compression a performance bottleneck. We propose IPA, a feature-aware projection framework that explicitly aims to reconstruct the original input within a reduced hidden space. In the linear case, we instantiate IPA with algorithms approximating top principal components, enabling efficient projector pretraining with negligible inference overhead. Across language and vision benchmarks, IPA consistently improves over LoRA and DoRA, achieving on average 1.5 points higher accuracy on commonsense reasoning and 2.3 points on VTAB-1k, while matching full LoRA performance with roughly half the trainable parameters when the projection is frozen. Code is available at https://github.com/valeoai/peft-ipa.
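The linear case described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain batch SVD on synthetic features instead of Incremental PCA, omits mean-centering, and assumes a tied decoder $Q = P^\top$. The names `fit_projector` and `reconstruction_error` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def fit_projector(features, d_h):
    """Forward-only fit: rows of P are the top-d_h right singular vectors.

    Assumption: uncentered batch SVD stands in for Incremental PCA.
    """
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:d_h]  # shape (d_h, d_in), orthonormal rows

def reconstruction_error(features, P):
    """Mean squared error of x -> P^T P x (tied decoder Q = P^T, an assumption)."""
    recon = features @ P.T @ P
    return float(np.mean(np.sum((features - recon) ** 2, axis=1)))

rng = np.random.default_rng(0)
n, d_in, d_h, d_out = 512, 64, 8, 32

# Synthetic features with low-rank structure plus small noise (illustrative only).
features = rng.normal(size=(n, d_h)) @ rng.normal(size=(d_h, d_in))
features += 0.01 * rng.normal(size=(n, d_in))

# Feature-aware projector vs. a data-agnostic random orthonormal projector.
P_ipa = fit_projector(features, d_h)
q, _ = np.linalg.qr(rng.normal(size=(d_in, d_h)))
P_rand = q.T

err_ipa = reconstruction_error(features, P_ipa)
err_rand = reconstruction_error(features, P_rand)

# LoRA-style update with a frozen projector: only B would be trained.
B = np.zeros((d_out, d_h))          # zero init, so the update starts at zero
delta_z = B @ (P_ipa @ features[0])
```

On this toy data the feature-aware projector loses far less input information than the random one (`err_ipa` is much smaller than `err_rand`), which is the bottleneck the abstract attributes to LoRA's random down-projection. With the projector frozen, the only trainable matrix is `B`.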

Paper Structure

This paper contains 43 sections, 6 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Cosine‐similarity matrices for LoRA and full‐fine‐tune updates on 27 BIG-Bench Hard tasks. (a) shows the similarity between each trained LoRA-$A$ vector and its initialization; panels (b)–(d) show pairwise task–task similarities for LoRA-$A$, LoRA-$B$, and full fine-tune updates, respectively. The LoRA-$A$ vectors remain close to their shared initialization in (a) and vary little across tasks in (b), while the task-dependent patterns in LoRA-$B$ (c) closely match those from full fine-tuning (d).
  • Figure 2: Schematic of IPA vs. standard LoRA. The gray arrows denote mappings between two vector feature spaces. In standard LoRA, an input feature $x$ is projected to a low-dimensional space by $f_A$ and then lifted back by $f_B$ to yield the update $\Delta z$. IPA introduces two pretrained projectors, $\mathcal{P}$ and $\mathcal{Q}$, enforcing that the hidden feature can reconstruct $x$; at adaptation time only $\mathcal{P}$ is retained.
  • Figure 3: Comparison of IPA with baselines in both settings, with ($\bigcirc$) and without ($\otimes$) finetunable feature projection on the commonsense benchmark. The dotted red line marks the highest baseline performance.
  • Figure 4: Average accuracy of Llama-3 8B models fine-tuned on the commonsense benchmark with (a) varying hidden dimension $d_h$ for IPA, compared to LoRA and DoRA, both with input projection fine-tuning, and (b) IPA (with projection fine-tuning • or without ×) when varying the percentage of the training dataset used to build the feature set for projection pretraining.
  • Figure 5: Cosine similarity of LoRA $A$ projections after fine-tuning on 27 BBH tasks, compared to the initial projection, versus the similarity of projected features. To capture the local behavior of the projection, we complement Figure 1a with a measure of feature-wise similarity between the projected representations.
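
The two similarity measures named in the captions of Figures 1 and 5 can be sketched as follows. This is an illustrative reading of the captions, not the authors' evaluation code: the matrices are synthetic, the "trained" matrix is simply a small perturbation of the initialization, and the helper names `cosine` and `mean_feature_cosine` are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two arrays after flattening them to vectors."""
    u, v = np.ravel(u), np.ravel(v)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def mean_feature_cosine(A0, A1, X):
    """Average cosine similarity between A0 x and A1 x over the rows x of X."""
    Z0, Z1 = X @ A0.T, X @ A1.T
    num = np.sum(Z0 * Z1, axis=1)
    den = np.linalg.norm(Z0, axis=1) * np.linalg.norm(Z1, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
d_h, d_in = 8, 64

A_init = rng.normal(size=(d_h, d_in))
# Synthetic stand-in for a fine-tuned LoRA A matrix (not real training output).
A_trained = A_init + 0.05 * rng.normal(size=(d_h, d_in))
X = rng.normal(size=(100, d_in))  # synthetic input features

weight_sim = cosine(A_init, A_trained)                    # weight-space measure
feature_sim = mean_feature_cosine(A_init, A_trained, X)   # feature-wise measure
```

The weight-space measure compares the flattened matrices directly, while the feature-wise measure compares what the two projections do to actual inputs, which is the "local behavior" the Figure 5 caption refers to.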