Variational Low-Rank Adaptation Using IVON

Bai Cong, Nico Daheim, Yuesong Shen, Daniel Cremers, Rio Yokota, Mohammad Emtiyaz Khan, Thomas Möllenhoff

TL;DR

This work replaces AdamW with the Improved Variational Online Newton (IVON) algorithm for finetuning large language models, and provides additional evidence that IVON is effective at that scale.

Abstract

We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in cost. We replace AdamW with the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and the expected calibration error by 4.6%. The accuracy is also better than that of other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models. The code is available at https://github.com/team-approx-bayes/ivon-lora.
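
The abstract reports calibration as expected calibration error (ECE) but does not spell out the estimator. As a rough illustration only, the sketch below computes the commonly used equal-width binned ECE in plain Python. The function name, the choice of 15 bins, and the toy inputs are assumptions made here for illustration and are not taken from the paper or its code.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted average of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction was right
    n_bins:      number of equal-width bins (15 is an assumed default)
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        acc_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        gap = abs(acc_sum[b] / count[b] - conf_sum[b] / count[b])
        ece += (count[b] / n) * gap
    return ece


# Toy data (hypothetical): 90% confidence and 9 of 10 correct is well calibrated.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
# Toy data (hypothetical): full confidence but only half correct is overconfident.
overconfident = expected_calibration_error([1.0] * 4, [True, True, False, False])
```

A lower value means predicted confidences track empirical accuracy more closely, which is the sense in which the abstract's calibration improvement should be read.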

Paper Structure

This paper contains 4 sections, 1 equation, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Interpolation between 'IVON@mean' and 'IVON' enables us to trade off accuracy for better calibration at test time. Essentially, we use $\mathcal{N}(\boldsymbol{\theta} \,|\, \mathbf{m}, \text{diag}(\tau \mathbf{v}))$ with a scalar $\tau \in [0,1]$. For $\tau = 0$, we recover IVON@mean (leftmost marker) and, for $\tau = 1$, we recover IVON (rightmost marker). Generally, as $\tau$ is increased, the error increases while the NLL decreases. The trend is consistent across datasets (with a few minor exceptions). Metrics are averaged over 3 runs.
  • Figure 2: The training speeds of IVON and AdamW are similar. We plot validation accuracies of the two methods versus time in hours. Results are averaged over 3 runs.
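
The interpolation described in the Figure 1 caption can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the authors' implementation: it uses a two-parameter logistic model in place of LoRA weights, and the names sample_params and predict, the sample count, and all numeric values are hypothetical. It draws parameters from N(m, diag(tau * v)) and averages the predicted probability over the draws, so tau = 0 reduces to predicting at the mean and tau = 1 uses the full posterior variance.

```python
import math
import random


def sample_params(mean, var, tau, rng):
    """Draw theta ~ N(mean, diag(tau * var)) for a diagonal Gaussian posterior."""
    return [m + math.sqrt(tau * v) * rng.gauss(0.0, 1.0)
            for m, v in zip(mean, var)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, mean, var, tau, n_samples=2000, seed=0):
    """Average the predicted probability over n_samples posterior draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = sample_params(mean, var, tau, rng)
        logit = sum(t * xi for t, xi in zip(theta, x))
        total += sigmoid(logit)
    return total / n_samples


# Hypothetical posterior mean m and variance v for a two-weight model.
m = [2.0, 1.0]
v = [1.0, 1.0]
x = [1.0, 1.0]

p_mean = predict(x, m, v, tau=0.0)  # prediction at the mean only
p_half = predict(x, m, v, tau=0.5)  # intermediate variance
p_full = predict(x, m, v, tau=1.0)  # full posterior variance
```

In this toy setting, increasing tau pulls the averaged probability away from the confident value at the mean, which matches the caption's description of giving up some accuracy for better calibration as the variance is scaled up.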