ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts
Sinan Du, Guosheng Zhang, Keyao Wang, Yuanrui Wang, Haixiao Yue, Gang Zhang, Errui Ding, Jingdong Wang, Zhengzhuo Xu, Chun Yuan
TL;DR
ALoRE tackles the challenge of efficiently adapting large vision models by aggregating multiple low-rank experts in a Kronecker-product-based hypercomplex space. The result is a multi-branch architecture that decouples learned patterns while keeping parameter growth negligible. Because the method uses only linear transformations, it can be merged into the backbone through sequential re-parameterization, adding no inference latency. Across 24 downstream tasks and multiple backbones, ALoRE consistently outperforms full fine-tuning and existing PETL methods while training very few parameters, and ablations confirm the importance of bottleneck size, expert count, and placement. Visualizations corroborate that different experts specialize in complementary visual cues, supporting the claim that the branches decouple features and improve adaptation efficiency. The work offers a scalable, practical PETL solution with implications for multi-task learning and for deploying large vision models under tight resource budgets.
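To make the parameter arithmetic concrete, here is a minimal sketch that assumes a PHM-style (parameterized hypercomplex multiplication) construction of the kind used in Compacter. In this construction the update is ΔW = Σ_{i=1}^{n} A_i ⊗ (s_i t_iᵀ), where each A_i is an n×n matrix and each s_i t_iᵀ is a rank-r factorization of a (d/n)×(d/n) block. ALoRE's actual factorization, weight sharing, and scaling may differ. The helper names, the initialization scale, and the choices n = 4 and r = 1 are hypothetical. Because rank(A ⊗ B) = rank(A)·rank(B), each branch contributes an update of rank at most n·r, and the sum of n branches can reach rank n²·r. Meanwhile, the trainable count is only n·(n² + 2(d/n)r) = n³ + 2dr.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product: block (a, c) of the result is A[a][c] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def count_params(d: int, n: int, rank: int) -> int:
    """Closed form: n experts, each with an n x n matrix A_i and rank-`rank` factors s_i, t_i."""
    return n * (n * n + 2 * (d // n) * rank)


def aggregate_experts(d: int, n: int, rank: int, seed: int = 0) -> Tuple[Matrix, int]:
    """Hypothetical helper: builds delta_W = sum_i kron(A_i, s_i @ t_i) as a dense d x d matrix.

    Returns the dense update and the number of trainable values actually created.
    """
    assert d % n == 0, "d must be divisible by the number of experts n"
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> Matrix:
        # Arbitrary small init scale, chosen only for illustration.
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    delta = [[0.0] * d for _ in range(d)]
    trainable = 0
    for _ in range(n):                       # one branch ("expert") per iteration
        A = rand(n, n)                       # small n x n mixing matrix
        s, t = rand(d // n, rank), rand(rank, d // n)
        expert = kron(A, matmul(s, t))       # d x d, rank <= n * rank
        for i in range(d):
            for j in range(d):
                delta[i][j] += expert[i][j]  # aggregate branches by summation
        trainable += n * n + len(s) * rank + rank * len(t[0])
    return delta, trainable


delta_W, n_trainable = aggregate_experts(d=8, n=2, rank=1)
print(len(delta_W), len(delta_W[0]), n_trainable)   # 8 8 24
print(count_params(768, 4, 1), 768 * 768)            # 1600 589824
```

For a ViT-B-sized hidden dimension (d = 768), four rank-1 experts under these assumptions need 1,600 trainable values, compared with 589,824 for a dense d×d update. This illustrates why adding branches keeps parameter growth small.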
Abstract
Parameter-efficient transfer learning (PETL) has become a promising paradigm for adapting large-scale vision foundation models to downstream tasks. Typical methods exploit the intrinsic low-rank property to decompose the task-specific weights they learn, compressing the number of trainable parameters. However, such approaches mostly operate within the original feature space using a single-branch structure, which may be suboptimal for decoupling the learned representations and patterns. In this paper, we propose ALoRE, a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts in a multi-branch paradigm, disentangling the learned cognitive patterns during training. Thanks to this design, ALoRE introduces negligible extra parameters and can be merged into the frozen backbone via sequential re-parameterization, avoiding additional inference latency. We conduct extensive experiments on 24 image classification tasks using various backbone variants. Experimental results demonstrate that ALoRE outperforms both full fine-tuning and other state-of-the-art PETL methods in performance and parameter efficiency. For instance, by updating only 0.15M parameters, ALoRE improves average Top-1 accuracy over full fine-tuning by 3.06% on the FGVC datasets and by 9.97% on the VTAB-1k benchmark.
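Why merging adds no inference cost can be checked with a small identity. Suppose a purely linear residual adapter sits in front of a frozen linear layer, for example between a LayerNorm and the QKV or FFN projection. Then (x + xΔW)W + b = x(I + ΔW)W + b, so after training, W can be replaced by W' = (I + ΔW)W and the adapter disappears. The sketch below verifies this numerically in pure Python. Several parts are assumptions for illustration rather than the paper's exact module: the placement, the row-vector convention, the helper names, and the random rank-2 stand-in for ΔW. Any purely linear update, including a Kronecker sum of low-rank experts, folds in the same way. If the adapter contained a nonlinearity, the fold would no longer be exact.

```python
import random

rng = random.Random(0)


def rand(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def add_bias(Y, b):
    return [[v + bj for v, bj in zip(row, b)] for row in Y]


def unmerged_forward(x, delta_W, W, b):
    """Training-time path (hypothetical placement): linear residual adapter, then frozen layer.
    y = (x + x @ delta_W) @ W + b, using the row-vector convention."""
    h = add(x, matmul(x, delta_W))
    return add_bias(matmul(h, W), b)


def merge(delta_W, W):
    """Fold the adapter into the frozen weight: W' = (I + delta_W) @ W."""
    return matmul(add(identity(len(W)), delta_W), W)


def merged_forward(x, W_merged, b):
    """Inference-time path: one matmul, the same cost as the original frozen layer."""
    return add_bias(matmul(x, W_merged), b)


batch, d_in, d_out = 2, 6, 5
x = rand(batch, d_in)
W = rand(d_in, d_out)                             # frozen backbone weight
b = [rng.gauss(0.0, 1.0) for _ in range(d_out)]   # frozen bias
delta_W = matmul(rand(d_in, 2), rand(2, d_in))    # stand-in linear update (rank 2)

y_train = unmerged_forward(x, delta_W, W, b)
y_infer = merged_forward(x, merge(delta_W, W), b)
max_diff = max(abs(p - q) for rp, rq in zip(y_train, y_infer) for p, q in zip(rp, rq))
print(f"max |y_train - y_infer| = {max_diff:.1e}")
```

The two paths agree up to floating-point rounding. The merged layer keeps the original weight shape (d_in × d_out), which is why this style of re-parameterization leaves inference latency unchanged.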
