Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models

Liyi Zhang, Jake Snell, Thomas L. Griffiths

TL;DR

The paper introduces ABMLL, a scalable Amortized Bayesian Meta-Learning approach for LoRA-tuned large language models, enabling task-conditioned uncertainty modeling without per-task parameter copies. By expressing both global and task-specific weights with LoRA adapters and employing a variational Bayesian objective with a beta-balanced reconstruction term, ABMLL delivers improved generalization to unseen tasks and better uncertainty calibration on large models like Llama3-8B. Empirical results on CrossFit and UnifiedQA show ABMLL outperforming standard LoRA and other meta-learning baselines in accuracy and ECE, while remaining memory-efficient and robust to pruning. The work bridges Bayesian methods and LLM fine-tuning, highlighting the potential for inductive bias and reliable uncertainty estimation in scalable meta-learning for large models.

Abstract

Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, it is often unclear how well the fine-tuned LLM will generalize, i.e., how well it will perform on unseen datasets. Methods have been proposed to improve generalization by optimizing in-context prompts, or by using meta-learning to fine-tune LLMs. However, these methods are expensive in memory and computation, requiring either long-context prompts or saving copies of parameters and using second-order gradient updates. To address these challenges, we propose Amortized Bayesian Meta-Learning for LoRA (ABMLL). This method builds on amortized Bayesian meta-learning for smaller models, adapting this approach to LLMs while maintaining its computational efficiency. We reframe task-specific and global parameters in the context of LoRA and use a new hyperparameter to balance reconstruction accuracy and the fidelity of task-specific parameters to the global ones. ABMLL provides effective generalization and scales to large models such as Llama3-8B. Furthermore, as a result of using a Bayesian framework, ABMLL provides improved uncertainty quantification. We test ABMLL on the CrossFit and UnifiedQA datasets and find that it outperforms existing methods on these benchmarks in terms of both accuracy and expected calibration error.
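The abstract reports expected calibration error (ECE) alongside accuracy. As a rough illustration of what that metric measures, here is a minimal sketch of the common equal-width-bin ECE. The function name, the choice of 10 bins, and the binning convention are assumptions made for illustration; the paper's exact evaluation setup is not given in this section.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen answer, each in [0, 1].
    correct:     booleans, True where the chosen answer was right.
    n_bins:      number of equal-width bins (10 is an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Each bin accumulates [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(hit)

    ece = 0.0
    for count, conf_sum, hits in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = hits / count
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy data (made up): two confident hits, one unsure hit, one unsure miss.
    confs = [0.95, 0.95, 0.55, 0.55]
    hits = [True, True, True, False]
    print(f"ECE = {expected_calibration_error(confs, hits):.3f}")
```

A lower value means the model's stated confidence tracks its actual accuracy more closely; a model that is always fully confident and always right scores 0.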

Paper Structure

This paper contains 33 sections, 9 equations, 2 figures, 6 tables, 1 algorithm.

Figures (2)

  • Figure 1: Illustrations of ABMLL and LoRA. There are $M$ tasks with $N$ datapoints each. $x$ is a prompt, $y$ is its output, and the superscripts $S$ and $Q$ refer to the support set and the query set, which can be thought of as the train and test sets for an individual task. Each solid arrow denotes a probabilistic relationship. In the graphical model on the left, a dashed arrow denotes a variational approximation; in the workflows on the right, a dashed arrow denotes an arithmetic operation.
  • Figure 2: cls-45 validation accuracy and ECE over epochs for our method (ABMLL) and four baseline methods. Values are computed as a sliding-window moving average over the three most recent epochs. ABMLL achieves consistent performance on both metrics.
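The smoothing described in the Figure 2 caption can be sketched as a trailing moving average with a window of three epochs. The function name is hypothetical, and the handling of the first two epochs (averaging over however many epochs exist so far) is an assumption, since the caption does not say how the start of the curve is treated.

```python
def trailing_moving_average(values, window=3):
    """Average each value with the (window - 1) values before it.

    For the first few entries, fewer than `window` values exist, so the
    average is taken over what is available (an assumed convention).
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    # Made-up per-epoch validation accuracies, for illustration only.
    per_epoch_accuracy = [0.60, 0.70, 0.80, 0.90]
    print(trailing_moving_average(per_epoch_accuracy))
```

Smoothing of this kind removes epoch-to-epoch noise, which makes it easier to compare the stability of different methods, at the cost of lagging behind sudden changes by up to two epochs.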