
Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models

Theo Putterman, Derek Lim, Yoav Gelberg, Stefanie Jegelka, Haggai Maron

TL;DR

This work introduces Learning on LoRAs (LoL), a framework for processing low-rank LoRA weight updates with symmetry-aware models. By exploiting GL$(r)$-invariances and developing GL-equivariant layers, the authors build architectures (notably GL-net) that efficiently and universally approximate GL-invariant functions on LoRA inputs. They construct three large LoRA datasets (CelebA-, Imagenette-, and Qwen2-ARC-LoRA) and demonstrate strong predictive performance for CLIP scores, training-data properties, and LM task metrics, while showing good generalization across unseen ranks. The results highlight practical potential for evaluating, editing, and understanding finetuned models solely from their LoRA weights, enabling privacy-aware analysis and rapid model assessment at scale.

Abstract

Low-rank adaptations (LoRAs) have revolutionized the finetuning of large foundation models, enabling efficient adaptation even with limited computational resources. The resulting proliferation of LoRAs presents exciting opportunities for applying machine learning techniques that take these low-rank weights themselves as inputs. In this paper, we investigate the potential of Learning on LoRAs (LoL), a paradigm where LoRA weights serve as input to machine learning models. For instance, an LoL model that takes in LoRA weights as inputs could predict the performance of the finetuned model on downstream tasks, detect potentially harmful finetunes, or even generate novel model edits without traditional training methods. We first identify the inherent parameter symmetries of low-rank decompositions of weights, which differ significantly from the parameter symmetries of standard neural networks. To efficiently process LoRA weights, we develop several symmetry-aware invariant or equivariant LoL models, using tools such as canonicalization, invariant featurization, and equivariant layers. We finetune thousands of text-to-image diffusion models and language models to collect datasets of LoRAs. In numerical experiments on these datasets, we show that our LoL architectures are capable of processing low-rank weight decompositions to predict CLIP score, finetuning data attributes, finetuning data membership, and accuracy on downstream tasks.
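The $\mathrm{GL}(r)$ symmetry mentioned above can be checked numerically. The sketch below assumes the convention $\Delta W = U V^\top$ with $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{m \times r}$ (the names $U, V$ follow Figure 1; the shapes, sizes, and random seed are illustrative assumptions). It also assumes the group action $(U, V) \mapsto (UR, VR^{-\top})$ for invertible $R$. The factors change under this action, but the update they encode does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 48, 4  # illustrative sizes, not taken from the paper

# LoRA factors; the weight update is assumed to be Delta W = U @ V.T
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))

# A random r x r matrix shifted by r*I so that it is (almost surely) invertible: an element of GL(r)
R = rng.standard_normal((r, r)) + r * np.eye(r)

# Group action on the factors: U -> U R, V -> V R^{-T}
U_t = U @ R
V_t = V @ np.linalg.inv(R).T

delta_w = U @ V.T
delta_w_t = U_t @ V_t.T  # equals U R R^{-1} V^T = U V^T

print(np.allclose(delta_w, delta_w_t))  # True: the update is unchanged
print(np.allclose(U, U_t))              # False: the factors themselves differ
```

Because many different factor pairs encode the same update, a model that reads the raw factors should give the same answer for all of them. This is the motivation for building invariant or equivariant LoL models.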

Paper Structure (52 sections, 8 theorems, 38 equations, 5 figures, 8 tables)

Key Result

Proposition 1

All linear $\mathrm{GL}$-equivariant layers can be written in the form of equation eq:equiv_linear.
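The exact form of eq:equiv_linear is not given here, so the sketch below does not reproduce it. It only checks, under the assumed action $(U, V) \mapsto (UR, VR^{-\top})$, that one simple family of linear maps is $\mathrm{GL}$-equivariant: left multiplication by fixed matrices (the hypothetical `A`, `B`). Left multiplication acts on the $n$ and $m$ dimensions and therefore commutes with any right multiplication by $R$. A fixed right multiplication $U \mapsto UC$, by contrast, generally fails because $CR \neq RC$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 32, 24, 3
d_u, d_v = 8, 6  # hypothetical output widths

U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
R = rng.standard_normal((r, r)) + r * np.eye(r)  # (almost surely) invertible
R_inv_T = np.linalg.inv(R).T

# Hypothetical weights that act on the left (the non-rank dimension)
A = rng.standard_normal((d_u, n))
B = rng.standard_normal((d_v, m))

def linear(U, V):
    """Left multiplication only mixes rows, so it commutes with the right action of GL(r)."""
    return A @ U, B @ V

# Equivariance: L(U R, V R^{-T}) should equal (A U R, B V R^{-T})
out_u, out_v = linear(U, V)
tr_u, tr_v = linear(U @ R, V @ R_inv_T)
print(np.allclose(tr_u, out_u @ R), np.allclose(tr_v, out_v @ R_inv_T))  # True True

# Counterexample: a fixed right multiplication U -> U C does not commute with R in general
C = rng.standard_normal((r, r))
print(np.allclose((U @ R) @ C, (U @ C) @ R))  # False for generic C and R
```

The counterexample illustrates why a generic MLP applied to the raw factors ignores the symmetry: any weight that mixes the rank dimension breaks equivariance unless it commutes with every invertible $R$.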

Figures (5)

  • Figure 1: Overview of Learning on LoRAs (LoL). A pretrained model $\theta_{\mathrm{base}}$ is finetuned to yield LoRA weight matrices $U_1, V_1, \ldots, U_L, V_L$. These LoRA weights are taken as input to an LoL model $f_\theta$, which can make predictions such as the downstream accuracy of the finetuned model.
  • Figure 2: Architecture of $\mathrm{GL}$-net. Blue boxes are equivariant representations, and red boxes are invariant representations. First, equivariant linear maps lower the dimension of the input. Then our $\mathrm{GL}$-equivariant nonlinearities and further equivariant linear maps process the features. Finally, a matrix multiplication head computes invariant features that are processed by an MLP.
  • Figure 3: Images generated by two diffusion models in our test set. (a), (c), and (e) correspond to images generated by the model predicted by $\mathrm{GL}$-net to have the highest CLIP score. (b), (d), and (f) correspond to outputs of the model predicted by $\mathrm{GL}$-net to have the lowest CLIP score.
  • Figure 4: Performance of LoL models across inputs of varying ranks. Each model is only trained on rank $4$ LoRA weights from CelebA-LoRAs. (Left) Test accuracy on CelebA attribute prediction. (Right) Test loss on CelebA attribute prediction. MLP + Dense and $\mathrm{GL}$-net generalize well to ranks that are unseen during training, but do face degradation at rank one. On the other hand, MLP + SVD does not generalize well.
  • Figure 5: (Left) Data preprocessing time for LoL models across 64 inputs of varying sizes. (Right) Forward pass time for LoL models across 512 inputs of varying sizes. MLP + Dense runs out of memory for the largest inputs.
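The pipeline in the Figure 2 caption (equivariant linear maps, equivariant nonlinearity, more equivariant linear maps, a matrix multiplication head, then an MLP) can be sketched as a toy single-layer model. This is not the authors' GL-net. The weights are random stand-ins for learned parameters. The nonlinearity is a hypothetical gating choice: it scales rows of $U$ and $V$ by gates computed from the invariant product $UV^\top$, which keeps it equivariant. All names (`toy_gl_net`, `gated_nonlinearity`, `A1`, `B1`, and so on) are illustrative. The same $\Delta W = UV^\top$ convention and $(UR, VR^{-\top})$ action as above are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 64, 48, 4  # illustrative single-layer LoRA shapes
h, k = 16, 8         # hypothetical hidden widths

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical random parameters standing in for learned weights
A1 = rng.standard_normal((h, n)) / np.sqrt(n)
B1 = rng.standard_normal((h, m)) / np.sqrt(m)
A2 = rng.standard_normal((k, h)) / np.sqrt(h)
B2 = rng.standard_normal((k, h)) / np.sqrt(h)
W1 = rng.standard_normal((32, k * k)) / k
w2 = rng.standard_normal(32) / np.sqrt(32)

def gated_nonlinearity(U, V):
    """Scale rows of U and V by gates built from the invariant product U V^T.
    The gates are GL(r)-invariant and act on rows, so the map is equivariant.
    (A hypothetical choice, not necessarily the paper's nonlinearity.)"""
    G = U @ V.T                   # invariant: (U R)(V R^{-T})^T = U V^T
    gu = sigmoid(G.mean(axis=1))  # one gate per row of U
    gv = sigmoid(G.mean(axis=0))  # one gate per row of V
    return gu[:, None] * U, gv[:, None] * V

def toy_gl_net(U, V):
    U, V = A1 @ U, B1 @ V              # equivariant linear maps reduce n, m to h
    U, V = gated_nonlinearity(U, V)    # equivariant nonlinearity
    U, V = A2 @ U, B2 @ V              # more equivariant linear maps
    feats = (U @ V.T).ravel()          # matrix-multiplication head: invariant k x k features
    hidden = np.maximum(W1 @ feats, 0.0)  # small ReLU MLP on the invariant features
    return float(w2 @ hidden)

# Usage: the output is unchanged under the GL(r) action on the factors
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
R = rng.standard_normal((r, r)) + r * np.eye(r)
y = toy_gl_net(U, V)
y_t = toy_gl_net(U @ R, V @ np.linalg.inv(R).T)
print(np.isclose(y, y_t))  # True

# No parameter depends on r, so the same weights also accept a rank-2 input
U2, V2 = rng.standard_normal((n, 2)), rng.standard_normal((m, 2))
print(toy_gl_net(U2, V2))
```

Because every weight in the toy acts only on the $n$ and $m$ dimensions, it accepts any rank $r$. This is consistent with, though not an explanation of, the rank generalization reported in Figure 4. The factored form is also compact. Suppose MLP + Dense operates on the dense product $UV^\top$, as its name suggests. That product has $n \cdot m$ entries versus $(n+m) \cdot r$ for the factors. With illustrative sizes $n = m = 4096$ and $r = 4$, this is 16,777,216 versus 32,768 entries, a 512× difference. This is one plausible contributor to the out-of-memory behavior noted in Figure 5.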

Theorems & Definitions (14)

  • Proposition 1
  • Theorem 1: Invariance
  • Theorem 2: Universality
  • Lemma 1
  • Proof
  • Proof of Proposition 1
  • Lemma 2
  • Proof
  • Definition C.1: Full rank $\mathrm{GL}$-universality
  • Theorem 3: Formal restatement of Theorem 2
  • ...and 4 more