Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank Adaptation

Devleena Das, Rajeev Patwari, Ashish Sirasao

TL;DR

This work tackles accuracy degradation in deployed language models caused by weight corruption during optimization and serialization. It introduces Recover-LoRA, a lightweight, data-free method that uses synthetic data and logit distillation to train LoRA adapters, aligning a degraded model with its full-precision teacher while updating only a small set of parameters. The approach demonstrates 5–17% average accuracy recovery on diverse SLMs (including MHA and GQA architectures) and shows data- and parameter-efficiency relative to baselines like LLM QAT* and SFT LoRA. Practically, Recover-LoRA offers a deployment-friendly recovery workflow that avoids labeled data and full retraining, with insights on adapter placement and data-source matching for different model architectures.

Abstract

Inference optimizations such as quantization, pruning, format and datatype conversion, model export, and serialization can lead to functional degradations in language model task performance. While most efforts on performance recovery for deployment focus on robust quantization techniques, we focus on recovering model accuracy from any source that degrades model weights, such as improper model serialization. In this work, we propose Recover-LoRA, a lightweight and dataset-agnostic method to recover accuracy in degraded models. Recover-LoRA uses synthetic data and logit distillation to learn LoRA adapters on selected layers, aligning the degraded model with its full-precision counterpart. We investigate the utility of Recover-LoRA across a diverse set of small language models (SLMs), including models with different attention architectures, namely multi-head attention (MHA) and grouped-query attention (GQA), as well as several evaluation datasets. Our results show that Recover-LoRA recovers model accuracies by 5-17% on MHA and GQA SLMs.
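
The recipe summarized in the abstract (freeze the degraded weights, add a low-rank update, and fit that update by matching teacher logits on synthetic inputs) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: a single linear layer stands in for the language model, Gaussian noise stands in for weight corruption, random vectors stand in for the synthetic dataset, the distillation loss is taken to be KL(teacher || student), and training uses plain SGD. The names `matvec`, `train_adapters`, `W_t`, `W_s`, and `D_syn` are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def student_logits(W_s, A, B, x):
    """Logits of the adapted student, (W_s + B A) x. Also returns h = A x."""
    h = matvec(A, x)
    return [b + c for b, c in zip(matvec(W_s, x), matvec(B, h))], h


def mean_kl(W_t, W_s, A, B, data):
    """Average KL(teacher || student) over the inputs in data."""
    total = 0.0
    for x in data:
        p_t = softmax(matvec(W_t, x))
        p_s = softmax(student_logits(W_s, A, B, x)[0])
        total += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return total / len(data)


def train_adapters(W_t, W_s, data, rank=2, lr=0.05, epochs=100, seed=0):
    """Fit LoRA factors A (rank x d) and B (V x rank); W_s itself stays frozen."""
    rng = random.Random(seed)
    V, d = len(W_s), len(W_s[0])
    A = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(V)]  # zero init: start exactly at the degraded model
    for _ in range(epochs):
        for x in data:
            p_t = softmax(matvec(W_t, x))
            z, h = student_logits(W_s, A, B, x)
            # Gradient of KL(teacher || student) w.r.t. student logits.
            g = [ps - pt for ps, pt in zip(softmax(z), p_t)]
            # Compute B^T g before B is updated, so both gradients use the same B.
            bt_g = [sum(B[i][k] * g[i] for i in range(V)) for k in range(rank)]
            for i in range(V):
                for k in range(rank):
                    B[i][k] -= lr * g[i] * h[k]      # dL/dB = g h^T
            for k in range(rank):
                for j in range(d):
                    A[k][j] -= lr * bt_g[k] * x[j]   # dL/dA = (B^T g) x^T
    return A, B


rng = random.Random(1)
V, d, rank = 8, 6, 2
W_t = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(V)]     # toy "teacher"
W_s = [[w + rng.gauss(0, 0.3) for w in row] for row in W_t]       # simulated corruption
D_syn = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(64)]  # stand-in synthetic inputs

A0 = [[0.0] * d for _ in range(rank)]
B0 = [[0.0] * rank for _ in range(V)]
kl_before = mean_kl(W_t, W_s, A0, B0, D_syn)  # zero adapter = unrecovered student
A, B = train_adapters(W_t, W_s, D_syn, rank=rank)
kl_after = mean_kl(W_t, W_s, A, B, D_syn)
print(f"mean KL before: {kl_before:.4f}, after: {kl_after:.4f}")
```

Only the `rank * (V + d)` adapter entries are updated, which is the parameter-efficiency argument in miniature. Because the simulated corruption here is full-rank, a rank-2 adapter can only partly close the gap to the teacher.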

Paper Structure

This paper contains 31 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Recover-LoRA recovers model accuracy by using logit distillation to align an improperly serialized model, $M_S$, with its pretrained LLM, $M_{T}$, by learning LoRA adapters, $A$ and $B$, on a synthetically generated dataset, $D_{syn}$.
  • Figure 2: Trainable parameters and dataset size comparisons for all recovery methods, showing the parameter and data efficiency of Recover-LoRA.
  • Figure 3: Progression of AR% with increasing dataset size, showing that a minimum of 90k synthetic data samples is needed for a positive AR% in three models.