ColA: Collaborative Adaptation with Gradient Learning

Enmao Diao; Qi Le; Suya Wu; Xinran Wang; Ali Anwar; Jie Ding; Vahid Tarokh

TL;DR

ColA with Gradient Learning (GL) tackles the computational bottleneck of fine-tuning large pretrained models by decoupling the gradient computation of hidden representations from that of adapter parameters and offloading the latter to low-cost devices. The paper establishes a theoretical equivalence between GL and classical gradient descent, introduces parameter merging to reduce on-device memory, and demonstrates, across sequence classification, sequence-to-sequence, and causal language modeling benchmarks, that ColA can match or beat PEFT baselines while greatly easing the computational space bottleneck. The FTaaS-oriented design lets multiple users collaboratively fine-tune adapters without overloading the central GPU, offering a scalable path to personalized, deployable foundation models. Overall, ColA advances efficient, model-agnostic fine-tuning by combining functional gradient descent principles, gradient offloading, and collaborative adapters.

Abstract

A primary function of back-propagation is to compute the gradients of both hidden representations and parameters for optimization with gradient descent. Training large models incurs high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational overheads, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradient of hidden representations from that of parameters. In comparison to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradient to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA can perform on par with or better than existing PEFT methods on various benchmarks.
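The decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single frozen linear layer, a linear adapter, a squared-error task loss, and an adapter update that fits the target "current adapter output minus its gradient". None of these choices is stated in the text above, and the names `server_step` and `device_step` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 64, 6, 2
x = rng.normal(size=(n, d_in))
W = rng.normal(size=(d_in, d_out))                  # frozen "pretrained" weights (toy)
y = x @ (W + 0.5 * rng.normal(size=(d_in, d_out)))  # task targets need a shifted map


def server_step(A):
    """Server: forward and backward pass through the frozen model.

    Returns the loss, the adapter output delta_h, and the gradient of the
    loss with respect to delta_h. No adapter-parameter gradient is computed.
    """
    delta_h = x @ A
    residual = x @ W + delta_h - y
    loss = 0.5 * np.mean(np.sum(residual ** 2, axis=1))
    grad_delta_h = residual / n
    return loss, delta_h, grad_delta_h


def device_step(A, x_buf, delta_h, grad_delta_h, lr):
    """Low-cost device: one gradient step on the assumed auxiliary loss
    0.5 * ||x_buf @ A - (delta_h - grad_delta_h)||^2."""
    target = delta_h - grad_delta_h
    grad_A = x_buf.T @ (x_buf @ A - target)
    return A - lr * grad_A


A = np.zeros((d_in, d_out))  # adapter starts as a no-op
losses = []
for _ in range(200):
    loss, delta_h, grad_delta_h = server_step(A)              # runs on the server
    losses.append(loss)
    A = device_step(A, x, delta_h, grad_delta_h, lr=0.3)      # runs off the server

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

In this sketch the server only ever handles `x`, `delta_h`, and `grad_delta_h`; the adapter weights `A` and their gradient live entirely in `device_step`, which is the part that could run on a cheaper device.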

Paper Structure (25 sections, 2 theorems, 10 equations, 17 figures, 18 tables, 1 algorithm)

Introduction
Related works
Parameter-Efficient Fine-Tuning (PEFT)
Functional gradient descent
Method
Problem
Fine-Tuning (FT)
Parameter-Efficient Fine-Tuning (PEFT)
Collaborative Adaptation
Gradient Learning (GL)
Fine-Tuning as a Service (FTaaS)
Parameter merging
Experimental Studies
Experimental Setup
Experimental Results
...and 10 more sections

Key Result

Proposition 1

The gradients $\nabla_{w_m}\ell_m(x,y; w_m)$ and $\nabla_{w_m}\mathcal{L}(y, f_\theta(x, \Delta h_{1:M}))$, evaluated at $w_m = w_m^t$, are the same for any $w_m^t$.
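A small numerical check can make the proposition concrete. The sketch below is illustrative only and rests on assumptions not given in the text above: the auxiliary loss is taken to be $\ell_m = \tfrac{1}{2}\lVert g_{w_m}(x) - (\Delta h_m^t - \nabla_{\Delta h_m}\mathcal{L})\rVert^2$, the frozen model is a single linear layer, and the adapter $g_{w_m}$ is linear with weights `A` (standing in for $w_m$). Both gradients are estimated by central finite differences so that neither relies on a hand-derived formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 8, 5, 3
x = rng.normal(size=(n, d_in))
y = rng.normal(size=(n, d_out))
W = rng.normal(size=(d_in, d_out))            # frozen base weights (toy)
A_t = 0.1 * rng.normal(size=(d_in, d_out))    # adapter weights w_m^t


def full_loss(A):
    """End-to-end loss L(y, f(x, delta_h)) with delta_h = x @ A."""
    h = x @ W + x @ A
    return 0.5 * np.sum((h - y) ** 2)


def numerical_grad(f, A, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at A."""
    g = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        step = np.zeros_like(A)
        step[idx] = eps
        g[idx] = (f(A + step) - f(A - step)) / (2 * eps)
    return g


# Quantities fixed at step t: adapter output and its gradient.
delta_h_t = x @ A_t
grad_delta_h = (x @ W + delta_h_t) - y        # dL / d(delta_h) for this loss
target = delta_h_t - grad_delta_h             # assumed regression target


def aux_loss(A):
    """Assumed auxiliary loss l_m: fit the adapter output to the fixed target."""
    return 0.5 * np.sum((x @ A - target) ** 2)


grad_full = numerical_grad(full_loss, A_t)
grad_aux = numerical_grad(aux_loss, A_t)
max_gap = np.max(np.abs(grad_full - grad_aux))
print("largest absolute difference between the two gradients:", max_gap)
```

The two gradients agree at `A_t` because the term `x @ A_t - delta_h_t` vanishes there, leaving only the back-propagated gradient `grad_delta_h` multiplied by the adapter's Jacobian. Away from `A_t` the two losses generally have different gradients, which is why the statement is made at $w_m = w_m^t$.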

Figures (17)

Figure 1: Illustration of the Fine-Tuning as a Service (FTaaS) system architecture. A central server handles both forward and backward passes of the pretrained model. It offloads gradient computations to a low-cost device, namely, Gradient Offloading. Meanwhile, adapters can be trained either on the server or locally by downloading adaptation data. Users of FTaaS can train their adapters independently or collaborate with others if needed.
Figure 2: Learning curves of (a) Linear, (b) MLP, and (c) CNN models on the MNIST dataset (IC task, Accuracy metric).
Figure 3: Learning curves of (a) Linear, (b) MLP, and (c) CNN models on the CIFAR10 dataset (IC task, Accuracy metric).
Figure 4: Ablation studies of the adaptation interval $I$ on the (a) MNLI, (b) SST-2, and (c) MRPC datasets (SC task, GLUE metric).
Figure 5: Ablation studies of the adaptation interval $I$ on the (a) CoLA, (b) QNLI, and (c) QQP datasets (SC task, GLUE metric).
...and 12 more figures

Theorems & Definitions (5)

Proposition 1
Proposition 2
Remark 1
Proof 1: Proof of Proposition 1
Proof 2: Proof of Proposition 2
