FPS: Feedforward-based Parameter Selection For Efficient Fine-Tuning
Kenneth Yang, Wen-Li Wei, Jen-Chun Lin
TL;DR
This work tackles the memory bottleneck of selecting tunable parameters in selection-based PEFT for large pre-trained models. It introduces FPS, a gradient-free method that scores each weight $w$ in a single forward pass as $I(w) = \mathbb{E}_{x \sim D}[ |w| \cdot |a^{(i-1)}_k(x)| ]$, that is, the weight's magnitude times the magnitude of its corresponding input activation, averaged over the downstream data $D$. Fine-tuning is then restricted by the sparsity constraint $\|\theta - \theta_0\|_0 \le k$. Empirically, FPS achieves accuracy comparable to the state-of-the-art GPS on 24 FGVC/VTAB-1k tasks while reducing peak memory by nearly 9x and cutting parameter-selection time by about 2x, all without introducing additional architectural components. This yields a practical, scalable approach for adapting large foundation models with minimal engineering and computational overhead.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting large-scale pre-trained models to downstream tasks, but existing approaches face notable limitations. Addition-based methods, such as Adapters [1], introduce inference latency and engineering complexity, while selection-based methods like Gradient-based Parameter Selection (GPS) [2] require a full backward pass, which results in the same peak memory usage as full fine-tuning. To address this dilemma, we propose Feedforward-based Parameter Selection (FPS), a gradient-free method that identifies an optimal parameter subset in a single forward pass. FPS ranks parameters by the product of their magnitudes and corresponding input activations, leveraging both pre-trained knowledge and downstream data. Evaluated on $24$ visual tasks from FGVC and VTAB-1k, FPS achieves performance comparable to state-of-the-art methods while reducing peak memory usage by nearly $9 \times$ and accelerating parameter selection by about $2 \times$, offering a genuinely memory-efficient and practical solution for fine-tuning large-scale pre-trained models.
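To make the ranking rule concrete, the following is a minimal sketch for a single linear layer, not the authors' implementation. It assumes that the weight `weight[j][k]` multiplies input activation `k`, that the expectation is approximated by a mean over a small batch, and that the top-scoring weights are chosen globally within the layer; the abstract does not specify the selection granularity. The function names `fps_scores` and `select_top_k` are hypothetical.

```python
from typing import List


def fps_scores(weight: List[List[float]], inputs: List[List[float]]) -> List[List[float]]:
    """Score each weight as |w| times the mean absolute value of its input activation.

    weight[j][k] connects input unit k to output unit j (assumed layout).
    inputs is a batch of input activation vectors for this layer.
    """
    n_in = len(weight[0])
    # Mean |a_k(x)| over the batch; only forward-pass activations are needed.
    mean_abs = [sum(abs(x[k]) for x in inputs) / len(inputs) for k in range(n_in)]
    # |w| is constant over x, so E[|w| * |a_k|] = |w| * E[|a_k|].
    return [[abs(w) * mean_abs[k] for k, w in enumerate(row)] for row in weight]


def select_top_k(scores: List[List[float]], k: int) -> List[List[bool]]:
    """Return a boolean mask marking the k highest-scoring weights (assumed global top-k)."""
    flat = [(s, j, i) for j, row in enumerate(scores) for i, s in enumerate(row)]
    flat.sort(key=lambda t: -t[0])  # stable sort: ties keep their original order
    chosen = {(j, i) for _, j, i in flat[:k]}
    return [[(j, i) in chosen for i in range(len(row))] for j, row in enumerate(scores)]


# Toy example: 2 output units, 2 input units, batch of 2 samples.
weight = [[1.0, -2.0], [0.5, 3.0]]
inputs = [[1.0, 0.0], [-3.0, 1.0]]
scores = fps_scores(weight, inputs)
mask = select_top_k(scores, k=2)
```

Under these assumptions, updating only the weights where `mask` is true keeps at most `k` parameters different from their pre-trained values, which is what the constraint $\|\theta - \theta_0\|_0 \le k$ expresses. No gradients are computed at any point in the scoring.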
