PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models

Nusrat Jahan Prottasha, Upama Roy Chowdhury, Shetu Mohanto, Tasfia Nuzhat, Abdullah As Sami, Md Shamol Ali, Md Shohanur Islam Sobuj, Hafijur Raman, Md Kowsher, Ozlem Ozmen Garibay

TL;DR

This survey analyzes the resource and fine-tuning challenges of large language, vision, and multimodal models and advocates parameter-efficient fine-tuning (PEFT) as a scalable solution. It introduces a taxonomy of additive, selective, reparameterized, hybrid, and unified approaches, and details design considerations (quantization, routing, memory, KV-cache, pruning, energy, multimodal). Through cross-domain evaluation (NLP, vision, multimodal, and robotics), it shows that PEFT methods such as LoRA, adapters, RoCoFT, Propulsion, and SK-Tuning can approach or surpass full fine-tuning performance with far fewer trainable parameters. The paper also discusses open challenges (interpretability, theory, benchmarks, privacy, hardware considerations) and outlines future directions, including federated and continual learning, to broaden PEFT’s practical impact. Overall, PEFT emerges as a practical, scalable way to democratize the deployment of massive foundation models while curbing computational and environmental costs.

Abstract

Large models such as Large Language Models (LLMs) and Vision Language Models (VLMs) have transformed artificial intelligence, powering applications in natural language processing, computer vision, and multimodal learning. However, fully fine-tuning these models remains expensive, requiring extensive computational resources, memory, and task-specific data. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a promising solution that allows adapting large models to downstream tasks by updating only a small portion of parameters. This survey presents a comprehensive overview of PEFT techniques, focusing on their motivations, design principles, and effectiveness. We begin by analyzing the resource and accessibility challenges posed by traditional fine-tuning and highlight key issues, such as overfitting, catastrophic forgetting, and parameter inefficiency. We then introduce a structured taxonomy of PEFT methods -- grouped into additive, selective, reparameterized, hybrid, and unified frameworks -- and systematically compare their mechanisms and trade-offs. Beyond taxonomy, we explore the impact of PEFT across diverse domains, including language, vision, and generative modeling, showing how these techniques offer strong performance with lower resource costs. We also discuss important open challenges in scalability, interpretability, and robustness, and suggest future directions such as federated learning, domain adaptation, and theoretical grounding. Our goal is to provide a unified understanding of PEFT and its growing role in enabling practical, efficient, and sustainable use of large models.
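
To make "updating only a small portion of parameters" concrete, the following minimal sketch compares the trainable-parameter count of fully fine-tuning one weight matrix with that of a low-rank update in the style of LoRA. The layer size (4096 x 4096) and the rank (8) are illustrative assumptions, not values reported in the survey, and the helper names are hypothetical.

```python
# Illustrative sketch only: the dimensions and rank below are assumed
# values chosen for the example, not figures taken from the survey.


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix W (d_out x d_in) is updated."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a low-rank update W + B @ A,
    where B is (d_out x rank) and A is (rank x d_in); W itself stays frozen."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the layer's parameters that the low-rank update trains."""
    return low_rank_params(d_in, d_out, rank) / full_finetune_params(d_in, d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8  # assumed sizes for illustration
    print("full fine-tuning:", full_finetune_params(d_in, d_out))
    print("low-rank update: ", low_rank_params(d_in, d_out, rank))
    print(f"trainable share:  {trainable_fraction(d_in, d_out, rank):.4%}")
```

Under these assumed sizes the low-rank update trains 65,536 parameters against 16,777,216 for the full matrix, which is 1/256 of the layer. The share shrinks as the layer grows and rises with the chosen rank.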

Paper Structure

This paper contains 61 sections, 45 equations, 18 figures, 14 tables.

Figures (18)

  • Figure 1: Overview of key PEFT techniques: Adapter, Prefix Tuning, LoRA, Parallel Adapter, and Scaled Parallel Adapter (He et al., 2021).
  • Figure 2: PEFT Categorized. A comprehensive taxonomy of Parameter-Efficient Fine-Tuning (PEFT) methods. The diagram illustrates the hierarchical organization of PEFT techniques into five major branches: Additive Fine Tuning (with Adapter-Based and Soft Prompt-Based methods), Selective Fine Tuning (parameter-based, unstructured parameter-based, and structured approaches), Reparameterized PEFT (including low-rank decomposition, adaptive rank methods, and LoRA variants), Hybrid Approach, and MoE-based methods. Each branch further subdivides into specific implementation strategies and variants. The taxonomy highlights the diverse approaches to achieving parameter efficiency while maintaining model performance across various adaptation scenarios.
  • Figure 3: Left: Parallel adapter implementation across visual and language encoders with cross-modal connections. Right: Unified adapter structure with modality-specific up-projections feeding into a shared down-projection pathway.
  • Figure 4: Comparison of serial adapter integration in the Transformer architecture (left), the adapter layer structure (middle), and the hybrid adapter architecture (right).
  • Figure 5: Vision Transformer (ViT) with adapter modules: (a) Standard ViT architecture, (b) ViT-Adapter framework with injector-extractor modules, (c) Spatial Prior Module, (d) Spatial Feature Injector with cross-attention, and (e) Multi-Scale Feature Extractor. The design supports single-task applications including detection and segmentation.
  • ...and 13 more figures