Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
Yi Xin, Jianjiang Yang, Siqi Luo, Yuntao Du, Qi Qin, Kangrui Cen, Yangfan He, Zhiwei Zhang, Bin Fu, Xiaokang Yang, Guangtao Zhai, Ming-Hsuan Yang, Xiaohong Liu
TL;DR
The paper addresses the impracticality of full fine-tuning for large vision foundation models and surveys parameter-efficient fine-tuning (PEFT) techniques across vision tasks. It categorizes PEFT into addition-, partial-, unified-, and multi-task-tuning, and introduces the V-PEFT Bench with a PPT metric to standardize evaluation. The work provides extensive task/dataset coverage (image, video, dense prediction) and reports benchmark results showing PEFT generally offers favorable efficiency-performance trade-offs, with domain and data scale influencing gains. It concludes with strategies for explainability, PEFT in generative models, hyperparameter simplification, and privacy considerations, aiming to catalyze practical adoption and further research.
Abstract
Pre-trained vision models (PVMs) have demonstrated remarkable adaptability across a wide range of downstream vision tasks, showcasing exceptional performance. However, as these models scale to billions or even trillions of parameters, conventional full fine-tuning has become increasingly impractical due to its high computational and storage demands. To address these challenges, parameter-efficient fine-tuning (PEFT) has emerged as a promising alternative, aiming to achieve performance comparable to full fine-tuning while making minimal adjustments to the model parameters. This paper presents a comprehensive survey of the latest advancements in the visual PEFT field, systematically reviewing current methodologies and categorizing them into four primary categories: addition-based, partial-based, unified-based, and multi-task tuning. In addition, this paper offers an in-depth analysis of widely used visual datasets and real-world applications where PEFT methods have been successfully applied. Furthermore, this paper introduces the V-PEFT Bench, a unified benchmark designed to standardize the evaluation of PEFT methods across a diverse set of vision tasks, ensuring consistency and fairness in comparison. Finally, the paper outlines potential directions for future research to propel advances in the PEFT field. A comprehensive collection of resources is available at https://github.com/synbol/Awesome-Parameter-Efficient-Transfer-Learning.
