Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation

Junghwan Park, Woojin Cho, Junhyuk Heo, Darongsae Kwon, Kookjin Lee

TL;DR

BOLT (Basis-Oriented Low-rank Transfer) is a framework for adapting large pre-trained models to unseen tasks without meta-training. It constructs layer-wise orthogonal spectral bases from multiple source-task updates. In the offline phase, dominant singular directions are extracted and orthogonalized to form a shared basis; in the online phase, adaptation is restricted to diagonal coefficients in this fixed basis, yielding a low-parameter, rank-controlled update. The method provides a training-free initialization via pooled spectral coefficients and a parameter-efficient fine-tuning (PEFT) path. Across general and remote-sensing datasets, it achieves robust few-shot, out-of-distribution (OOD), and test-time adaptation, often outperforming PEFT baselines and meta-learned initializations. Overall, the results indicate that constraining adaptation to a task-informed orthogonal subspace enables scalable, robust transfer with minimal task-specific parameters.

Abstract

Adapting large pre-trained models to unseen tasks under tight data and compute budgets remains challenging. Meta-learning approaches explicitly learn good initializations, but they require an additional meta-training phase over many tasks, incur high training cost, and can be unstable. At the same time, the number of task-specific pre-trained models continues to grow, yet the question of how to transfer them to new tasks with minimal additional training remains relatively underexplored. We propose BOLT (Basis-Oriented Low-rank Transfer), a framework that reuses existing fine-tuned models not by merging weights, but instead by extracting an orthogonal, task-informed spectral basis and adapting within that subspace. In the offline phase, BOLT collects dominant singular directions from multiple task vectors and orthogonalizes them per layer to form reusable bases. In the online phase, we freeze these bases and train only a small set of diagonal coefficients per layer for the new task, yielding a rank-controlled update with very few trainable parameters. This design provides (i) a strong, training-free initialization for unseen tasks, obtained by pooling source-task coefficients, along with a lightweight rescaling step while leveraging the shared orthogonal bases, and (ii) a parameter-efficient fine-tuning (PEFT) path that, in our experiments, achieves robust performance compared to common PEFT baselines as well as a representative meta-learned initialization. Our results show that constraining adaptation to a task-informed orthogonal subspace provides an effective alternative for unseen-task transfer.
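
The two phases described in the abstract can be illustrated for a single layer with a small NumPy sketch. This is not the authors' implementation: the function names, the toy dimensions, and the use of a QR decomposition as the orthogonalization step are assumptions made for illustration, and the abstract does not specify how the bases are actually orthogonalized. The sketch also assumes the number of collected directions does not exceed the smaller layer dimension.

```python
import numpy as np

def build_orthogonal_bases(task_deltas, k):
    """Offline phase (sketch): gather the top-k singular directions of each
    task vector for one layer, then orthogonalize the stacked directions.
    QR is an assumed stand-in for the paper's orthogonalization step."""
    left, right = [], []
    for delta in task_deltas:
        U, _, Vt = np.linalg.svd(delta, full_matrices=False)
        left.append(U[:, :k])        # dominant left singular vectors
        right.append(Vt[:k, :].T)    # dominant right singular vectors
    U_orth, _ = np.linalg.qr(np.concatenate(left, axis=1))
    V_orth, _ = np.linalg.qr(np.concatenate(right, axis=1))
    return U_orth, V_orth

def adapted_weight(W0, U_orth, V_orth, s):
    """Online phase (sketch): W0 + U_orth diag(s) V_orth^T with frozen bases.
    Only the vector s would be trained for the new task."""
    return W0 + (U_orth * s) @ V_orth.T

# Toy example with random matrices standing in for real task vectors.
rng = np.random.default_rng(0)
d_out, d_in, n_tasks, k = 16, 12, 3, 2
W0 = rng.normal(size=(d_out, d_in))
task_deltas = [rng.normal(size=(d_out, d_in)) for _ in range(n_tasks)]

U_orth, V_orth = build_orthogonal_bases(task_deltas, k)
r = U_orth.shape[1]          # rank budget: n_tasks * k directions
s = np.zeros(r)              # the only trainable parameters for this layer
W_new = adapted_weight(W0, U_orth, V_orth, s)
```

In this toy setting the layer has 16 x 12 = 192 weights, while the adapted update exposes only r = 6 trainable coefficients, and its rank is at most r by construction.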

Paper Structure

This paper contains 55 sections, 24 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Conceptual view of BOLT: Task-vectors $\{\Theta_i - \Theta_0\}_{i=1}^N$ can be represented in a common subspace $\mathcal{X}$ formed by orthogonal bases $\{\mathbf{e}_d\}_{d=1}^{r}$. By reusing the same subspace, the model can adapt to an unseen task vector $\{\Theta_\mathrm{new}-\Theta_0\}$ more quickly.
  • Figure 2: Meta-learning vs. BOLT: few-shot adaptation curves comparing meta-learned initialization and BOLT. Our method reaches higher accuracy in fewer epochs, showing faster convergence than meta-learned initialization.
  • Figure 3: Overall BOLT pipeline: for task vectors $\{\Theta_i\}_{i=1}^N$, we extract layer-wise SVDs and orthogonalize them to obtain shared bases $\{U_{\mathrm{orth}}, V_{\mathrm{orth}}\}$. These fixed bases are later reused to construct weights for an unseen task with only small diagonal parameters.
  • Figure 4: Initialization of the diagonal coefficients: each source task is projected onto $\{U_{\mathrm{orth}}, V_{\mathrm{orth}}\}$ to obtain its best diagonal $\{\mathbf{s}_i^{\ell}\}_{i=1}^N$, and these per-task diagonals are pooled to form $\mathbf{s}_{\mathrm{pool}}^{\ell}$ for a new task. This provides a data-free, task-informed starting point.
  • Figure 5: Few-shot accuracy at 4 and 16 shots using the ViT-B/32 backbone across general and remote-sensing datasets.
  • ...and 2 more figures
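
The initialization described in the Figure 4 caption can be sketched as follows. When the columns of the two bases are orthonormal, the diagonal that best fits a source task vector in the Frobenius norm has entries $s_d = \mathbf{u}_d^\top \Delta\, \mathbf{v}_d$, because the rank-one matrices $\mathbf{u}_d \mathbf{v}_d^\top$ are then orthonormal under the Frobenius inner product. The function names below are hypothetical, and mean pooling is an assumption: the caption says only that the per-task diagonals are pooled. The rescaling step mentioned in the abstract is not shown.

```python
import numpy as np

def best_diagonal(delta, U_orth, V_orth):
    """Least-squares diagonal for delta ~ U_orth diag(s) V_orth^T.
    With orthonormal columns, s_d = u_d^T delta v_d."""
    return np.einsum("id,ij,jd->d", U_orth, delta, V_orth)

def pool_coefficients(per_task_s):
    """Pool per-task diagonals into one starting point.
    Mean pooling is an assumption made for this sketch."""
    return np.mean(np.stack(per_task_s), axis=0)

# Toy example: random orthonormal bases stand in for the shared bases.
rng = np.random.default_rng(0)
d_out, d_in, r, n_tasks = 16, 12, 6, 3
U_orth, _ = np.linalg.qr(rng.normal(size=(d_out, r)))
V_orth, _ = np.linalg.qr(rng.normal(size=(d_in, r)))

# A task vector lying exactly in the subspace is recovered exactly.
s_true = rng.normal(size=r)
delta_exact = (U_orth * s_true) @ V_orth.T
s_recovered = best_diagonal(delta_exact, U_orth, V_orth)

# Generic task vectors are projected, then pooled without any training data.
source_deltas = [rng.normal(size=(d_out, d_in)) for _ in range(n_tasks)]
s_list = [best_diagonal(d, U_orth, V_orth) for d in source_deltas]
s_pool = pool_coefficients(s_list)
```

The pooled vector `s_pool` would then serve as the starting value of the diagonal coefficients for a new task, before any fine-tuning.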