FPS: Feedforward-based Parameter Selection For Efficient Fine-Tuning

Kenneth Yang, Wen-Li Wei, Jen-Chun Lin

TL;DR

This work tackles the memory bottleneck of selecting tunable parameters in selection-based PEFT for large pre-trained models. It introduces FPS, a gradient-free method that computes parameter importance during a single forward pass via $I(w) = \mathbb{E}_{x \sim D}[ |w| \cdot |a^{(i-1)}_k(x)| ]$ and enforces a sparsity constraint $\|\theta - \theta_0\|_0 \le k$. Empirically, FPS matches the accuracy of the state-of-the-art GPS on 24 FGVC/VTAB-1k tasks while reducing peak memory by nearly 9x and cutting the parameter-selection time by about 2x, all without introducing additional architectural components. This yields a practical, scalable approach for adapting large foundation models with minimal engineering and computational overhead.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting large-scale pre-trained models to downstream tasks, but existing approaches face notable limitations. Addition-based methods, such as Adapters [1], introduce inference latency and engineering complexity, while selection-based methods like Gradient-based Parameter Selection (GPS) [2] require a full backward pass, which results in the same peak memory usage as full fine-tuning. To address this dilemma, we propose Feedforward-based Parameter Selection (FPS), a gradient-free method that identifies an optimal parameter subset in a single forward pass. FPS ranks parameters by the product of their magnitudes and corresponding input activations, leveraging both pre-trained knowledge and downstream data. Evaluated on $24$ visual tasks from FGVC and VTAB-1k, FPS achieves performance comparable to state-of-the-art methods while reducing peak memory usage by nearly $9 \times$ and accelerating parameter selection by about $2 \times$, offering a genuinely memory-efficient and practical solution for fine-tuning large-scale pre-trained models.
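
The ranking rule described above can be illustrated with a minimal sketch for a single linear layer. It assumes the score of weight $w_{jk}$ is $|w_{jk}|$ times the mean absolute value of its input activation $a_k(x)$ over a batch, and that the tunable subset is the global top-$k$ by score. The function names `fps_importance` and `select_top_k` and the global (rather than per-layer or per-neuron) selection are illustrative assumptions, not the authors' implementation.

```python
def fps_importance(weights, activations):
    """Score each weight of one linear layer without any gradients.

    weights:     list of rows, shape (n_out, n_in)
    activations: list of input vectors to this layer, shape (batch, n_in)
    Returns a (n_out, n_in) list of scores |w_jk| * mean_x |a_k(x)|.
    """
    n_in = len(weights[0])
    batch = len(activations)
    # |w| does not depend on x, so E[|w| * |a_k(x)|] = |w| * E[|a_k(x)|].
    mean_abs_act = [sum(abs(x[k]) for x in activations) / batch for k in range(n_in)]
    return [[abs(w) * mean_abs_act[k] for k, w in enumerate(row)] for row in weights]


def select_top_k(scores, k):
    """Return a boolean mask marking the k highest-scoring weights.

    Assumption: selection is global over the layer; ties are broken by position.
    """
    flat = [(s, j, i) for j, row in enumerate(scores) for i, s in enumerate(row)]
    flat.sort(key=lambda t: -t[0])  # stable sort, highest score first
    chosen = {(j, i) for _, j, i in flat[:k]}
    return [[(j, i) in chosen for i in range(len(row))] for j, row in enumerate(scores)]


# Toy example: 2 output neurons, 2 inputs, batch of 2 samples.
W = [[1.0, -2.0], [0.5, 0.0]]
X = [[1.0, -1.0], [3.0, 1.0]]
scores = fps_importance(W, X)
mask = select_top_k(scores, k=2)
```

Only the weights where `mask` is true would be updated during fine-tuning, which corresponds to the sparsity budget $\|\theta - \theta_0\|_0 \le k$. Because the scores need only the weights and the layer inputs, they can be accumulated during a forward pass with no backward pass and no stored gradients.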

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of the proposed Feedforward-based Parameter Selection (FPS). FPS overcomes the inefficiency of gradient-based selection by computing parameter importance on-the-fly in a single forward pass via the product of the magnitudes of weights and input activations.
  • Figure 2: (a) Peak GPU memory usage and (b) Parameter selection latency on the FGVC benchmark.