Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

Minglei Li; Peng Ye; Yongqi Huang; Lin Zhang; Tao Chen; Tong He; Jiayuan Fan; Wanli Ouyang

TL;DR

Adapter-X tackles the efficiency-generalization trade-off in vision PEFT by introducing a Sharing Mixture of Adapters (SMoA) that enables token-level dynamic routing across a shared expert library and across Transformer blocks. It further augments this core with block-specific designs such as per-block normalization and a Prompt Generator to diversify inputs. Empirical results on 2D VTAB-1K and 3D ScanObjectNN/ModelNet40 show that Adapter-X can outperform full fine-tuning while using only a small fraction of trainable parameters (0.20% for 2D and 1.88% for 3D in reported settings), marking a significant efficiency gain without sacrificing accuracy. The work highlights the importance of combining parameter sharing, dynamic allocation, and per-block customization to achieve robust cross-task generalization in PEFT frameworks.
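To make the parameter-sharing argument concrete, the following is a rough counting sketch, not the paper's accounting. It compares one bottleneck adapter per block against a small expert library shared by all blocks. The hidden size, bottleneck width, block count, expert count, and the bias-free per-block router are all hypothetical values chosen for illustration, and the per-block normalization is assumed to be a scale-and-shift over the hidden dimension.

```python
def adapter_params(dim: int, bottleneck: int) -> int:
    """Weights and biases of one bottleneck adapter (down- and up-projection)."""
    down = dim * bottleneck + bottleneck
    up = bottleneck * dim + dim
    return down + up


def per_block_total(num_blocks: int, dim: int, bottleneck: int) -> int:
    """Baseline: every block owns its own adapter, nothing is shared."""
    return num_blocks * adapter_params(dim, bottleneck)


def shared_library_total(num_blocks: int, num_experts: int,
                         dim: int, bottleneck: int) -> int:
    """Shared expert library plus block-specific parts.

    Assumptions (not taken from the paper): one bias-free linear router
    per block, and one scale/shift normalization per block over `dim`.
    """
    experts = num_experts * adapter_params(dim, bottleneck)
    routers = num_blocks * dim * num_experts
    norms = num_blocks * 2 * dim
    return experts + routers + norms


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    dim, bottleneck, num_blocks, num_experts = 768, 8, 12, 4
    print("per-block adapters:", per_block_total(num_blocks, dim, bottleneck))
    print("shared library:    ",
          shared_library_total(num_blocks, num_experts, dim, bottleneck))
```

Under these assumed sizes the shared variant needs 107,552 trainable parameters against 156,768 for per-block adapters, about 31% fewer, while each token still has four experts to choose from in every block. The figures only illustrate the trend and are unrelated to the 0.20% and 1.88% reported above.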

Abstract

Parameter-efficient fine-tuning (PEFT) has become increasingly important as foundation models continue to grow in both popularity and size. Adapters have been particularly well received owing to their potential for parameter reduction and their adaptability across diverse tasks. However, striking a balance between high efficiency and robust generalization across tasks remains a challenge for adapter-based methods. We analyze existing methods and find that: 1) parameter sharing is the key to reducing redundancy; 2) more tunable parameters, dynamic allocation, and block-specific design are the keys to improving performance. Unfortunately, no previous work considers all of these factors together. Inspired by this insight, we introduce a novel framework named Adapter-X. First, a Sharing Mixture of Adapters (SMoA) module is proposed to provide token-level dynamic allocation, increased tunable parameters, and inter-block sharing at the same time. Second, block-specific designs such as the Prompt Generator (PG) are introduced to further enhance adaptation ability. Extensive experiments across 2D image and 3D point cloud modalities demonstrate that Adapter-X is the first method to outperform full fine-tuning in both modalities with significantly fewer parameters, i.e., only 0.20% and 1.88% of the original trainable parameters for 2D and 3D classification tasks, respectively. Our code will be publicly available.
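The interplay of inter-block sharing and token-level allocation can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' SMoA implementation: top-1 softmax routing, ReLU bottleneck adapters, a bias-free router per block, and the names `Adapter`, `Block`, and `build` are all hypothetical. Per-block normalization and the Prompt Generator are left out for brevity.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng, scale=0.02):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project."""

    def __init__(self, dim, bottleneck, rng):
        self.down = rand_matrix(bottleneck, dim, rng)
        self.up = rand_matrix(dim, bottleneck, rng)

    def __call__(self, x):
        hidden = [max(0.0, v) for v in matvec(self.down, x)]
        return matvec(self.up, hidden)


class Block:
    """Owns only a router; the expert list is shared with every other block."""

    def __init__(self, dim, experts, rng):
        self.experts = experts  # shared reference, not a copy
        self.router = rand_matrix(len(experts), dim, rng)

    def route(self, token):
        """Assumed top-1 routing: pick the highest-scoring expert for this token."""
        scores = softmax(matvec(self.router, token))
        idx = max(range(len(scores)), key=scores.__getitem__)
        return idx, scores[idx]

    def __call__(self, tokens):
        out = []
        for t in tokens:
            idx, score = self.route(t)
            delta = self.experts[idx](t)
            # Residual update, weighted by the routing score.
            out.append([ti + score * di for ti, di in zip(t, delta)])
        return out


def build(num_blocks=3, num_experts=4, dim=8, bottleneck=2, seed=0):
    rng = random.Random(seed)
    experts = [Adapter(dim, bottleneck, rng) for _ in range(num_experts)]
    blocks = [Block(dim, experts, rng) for _ in range(num_blocks)]
    return experts, blocks


if __name__ == "__main__":
    experts, blocks = build()
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
    for block in blocks:
        tokens = block(tokens)
    print(len(tokens), "tokens of width", len(tokens[0]))
```

Because every `Block` holds a reference to the same `experts` list, adding blocks adds only router parameters, while each token in each block can still be sent to a different expert. A practical version would use a tensor library and batch the routing rather than looping over tokens.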

Paper Structure (19 sections, 7 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Method
Preliminaries
Sharing MoA
Block-Specific Design
Objective Function
Experiments
Experimental Settings
Experiment on Image
Experiment on Point Cloud
Ablation Studies
Visualization
Conclusion
PyTorch-like code of Sharing Multi-head-MoA
...and 4 more sections

Figures (5)

Figure 1: Comparison of efficiency and performance between our Adapter-X and other methods.
Figure 2: The overview of the proposed Adapter-X. By comprehensively considering parameter sharing, more tunable parameters, dynamic allocation, and block-specific design, the model achieves efficient adaptation to different tasks with minimal trainable parameters.
Figure 3: Partial t-SNE visualization results of our ablation study on the 2D VTAB-1K dataset. Please refer to the appendix for more results.
Figure 4: The distribution of token allocation and routing score of 4 shared experts in different blocks on the CIFAR100 dataset of VTAB-1K.
Figure 5: More t-SNE visualization results of our ablation study on the 2D VTAB-1K dataset.
