Low-rank finetuning for LLMs: A fairness perspective

Saswat Das; Marco Romanelli; Cuong Tran; Zarreen Reza; Bhavya Kailkhura; Ferdinando Fioretto

TL;DR

This work examines, from a fairness standpoint, whether low-rank fine-tuning (LoRA) adequately learns the distribution shift involved in task-specific adaptation of LLMs. By comparing LoRA and full fine-tuning across toxicity mitigation and sequential decision tasks, using tools such as LogitLens and the KL-divergence of token posteriors, it shows that low-rank updates can retain harmful biases and toxicity from the baseline model, especially at smaller ranks, and that the degree of adaptation scales with the LoRA rank. Higher LoRA ranks more closely resemble full fine-tuning but still risk preserving or amplifying unfair decision boundaries in sequential tasks, indicating a trade-off between efficiency and fairness. These findings underscore the need to evaluate LoRA-based fine-tuning carefully for safety and societal impact, and suggest that rank selection and alternative strategies may be necessary for responsible LLM deployment. $P_eta(y|x)$ and other quantities are analyzed under variations of rank $r$ and dimensionality to illustrate how information from fine-tuning data propagates through the model.
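As a rough illustration of the KL-divergence comparison mentioned above, the sketch below measures how far a fine-tuned model's next-token posterior moves from the baseline's. The logits, the four-token vocabulary, and the direction of the divergence (fine-tuned relative to baseline) are assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a toy 4-token vocabulary (assumed values).
baseline_logits = [2.0, 1.0, 0.1, -1.0]   # pre-trained model
low_rank_logits = [1.9, 1.1, 0.2, -0.9]   # stays close to the baseline
full_ft_logits = [0.0, 0.5, 2.5, 1.0]     # moves far from the baseline

p_base = softmax(baseline_logits)
kl_low = kl_divergence(softmax(low_rank_logits), p_base)
kl_full = kl_divergence(softmax(full_ft_logits), p_base)

print(f"KL(low-rank || baseline) = {kl_low:.4f}")
print(f"KL(full FT  || baseline) = {kl_full:.4f}")
```

A small divergence from the baseline suggests the fine-tuned model has absorbed little of the fine-tuning distribution at that position; a larger one suggests a stronger shift.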

Abstract

Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models (LLMs) due to their reduced computational and memory requirements. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. Our findings reveal that there are cases in which low-rank fine-tuning falls short in learning such shifts. This, in turn, produces non-negligible side effects, especially when fine-tuning is adopted for toxicity mitigation in pre-trained models, or in scenarios where it is important to provide fair models. Through comprehensive empirical evidence on several models, datasets, and tasks, we show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors. We also show that this extends to sequential decision-making tasks, emphasizing the need for careful evaluation to promote responsible LLM development.
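For readers unfamiliar with why low-rank methods are cheaper, here is a minimal sketch of the standard LoRA parameterization, in which a frozen weight matrix $W$ receives an update $\Delta W = \frac{\alpha}{r} BA$ with $B \in \mathbb{R}^{d_{out} \times r}$ and $A \in \mathbb{R}^{r \times d_{in}}$. The layer size, the scaling value, and the helper names are illustrative assumptions, not settings reported in the paper.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in the LoRA factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def full_param_count(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_delta(B, A, alpha, rank):
    """Compute the low-rank update (alpha / rank) * B @ A with plain lists."""
    scale = alpha / rank
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Illustrative square layer (the size is an assumption, chosen only for scale).
d = 2048
for r in (2, 8, 64):
    share = lora_param_count(d, d, r) / full_param_count(d, d)
    print(f"rank {r:>2}: {lora_param_count(d, d, r):>7} trainable params ({share:.2%} of full)")

# Tiny rank-1 example: the update can only move W along one direction.
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
delta = lora_delta(B, A, alpha=2.0, rank=1)
print(delta)
```

Because the update is confined to a rank-$r$ subspace, a smaller $r$ leaves less room to move away from the pre-trained weights, which is the adaptability constraint the abstract refers to.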

Paper Structure (29 sections, 6 equations, 12 figures, 1 table)

Introduction
Preliminaries
LoRA finetuning.
Fairness.
Experimental Setup
Datasets and settings
Models
Metrics
Results
Fine-tuning for toxicity and stereotype mitigation
Lower ranks may retain the bias of the original model.
Fine-tuning for sequential decisions
Potential disparate impacts in LoRA fine-tuning.
Why rank matters: the influence of LoRA on model adaptability
Related work
...and 14 more sections

Figures (12)

Figure 1: LogitLens analysis of the generation process using the prompt "she should work as a" for the baseline model (OPT 1.3B), several LoRA fine-tuned models with different ranks, and the fully fine-tuned model. The higher the rank, the more the LoRA fine-tuned models "diverge" from the toxic behaviour of the baseline, capturing the traits of the fine-tuning datasets used for mitigation.
Figure 2: Toxicity and stereotype assessment: Toxic (orange) and non-toxic (blue) completions for a set of prompts on gender and sexuality, reported for various versions of Llama-2 7B (left) and OPT 1.3B (right). From left to right: original model, LoRA fine-tuned models with ranks 2, 4, 8, 32, and 64, and the regular fine-tuned model.
Figure 3: LogitLens analysis on various Llama-2 7B models. From left to right: original model, LoRA fine-tuned models with ranks 2, 16, 32, and 64, and the regular fine-tuned model.
Figure 4: Disparate impact of fine-tuning with LoRA on a sentence classification task for the IMDb (left) and SST2 (right) datasets, when the model penalizes some classes or groups more than others. The $y$-axis is group accuracy, while the $x$-axis is the size of the minority group as a proportion of the majority group at different levels of downsampling. The underlying pre-trained model is GPT-2, fine-tuned for 5 epochs.
Figure 5: Decision boundary analysis.
...and 7 more figures
