Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints

Parameswaran Kamalaruban, Mark Anderson, Stuart Burrell, Maeve Madigan, Piotr Skalski, David Sutton

TL;DR

This work tackles fairness in large pretrained models when sensitive attributes and their predictors are restricted by privacy constraints. It proposes a distributed, LoRA-based fine-tuning framework that decouples sensitive-attribute handling from model development, so that a downstream solution developer and a fairness auditor collaborate through adapter exchange alone. Four strategies are explored: a fairness-unaware baseline (Erm) and three privacy-preserving debiasing methods, namely Sensitive Unlearning (Unl), Adversarial Training (Adv), and Orthogonality Loss (Orth). All four are demonstrated on CelebA and UTK-Face using a frozen ViT-Base backbone with LoRA adapters, where the adapted weight is $W = W_0 + B A^\top$ and the adapter rank satisfies $r \ll \min(d,k)$. The experiments show that Orth consistently reduces bias while maintaining or improving utility, whereas Unl and Adv provide at most modest gains and can be sensitive to task conditions (e.g., bald predictions). Overall, the results highlight Orth as a robust, privacy-preserving approach to fairness in distributed LoRA fine-tuning, with real-world implications for sensitive domains.
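To make the condition $r \ll \min(d,k)$ concrete, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning and under LoRA. The shapes assumed here ($B$ of size $d \times r$, $A$ of size $k \times r$) follow from $W = W_0 + B A^\top$ being $d \times k$. The values $d = k = 768$ correspond to the ViT-Base hidden size, while the rank $r = 8$ is an illustrative assumption, not a setting taken from the paper.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight.

    Full fine-tuning updates all d * k entries of W.
    LoRA keeps W_0 frozen and trains only B (d x r) and A (k x r),
    so the adapted weight is W = W_0 + B A^T.
    """
    full = d * k
    lora = r * (d + k)
    return full, lora


# d = k = 768 is the ViT-Base hidden size; r = 8 is an assumed rank.
full, lora = lora_param_counts(d=768, k=768, r=8)
print(f"full: {full}, LoRA: {lora}, reduction: {full // lora}x")
```

Under these assumptions the adapter holds 12,288 parameters against 589,824 for the full matrix, which is why exchanging adapters alone is a lightweight form of collaboration.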

Abstract

Pre-trained foundation models can be adapted for specific tasks using Low-Rank Adaptation (LoRA). However, the fairness properties of these adapted classifiers remain underexplored. Existing fairness-aware fine-tuning methods rely on direct access to sensitive attributes or their predictors, but in practice, these sensitive attributes are often held under strict consumer privacy controls, and neither the attributes nor their predictors are available to model developers, hampering the development of fair models. To address this issue, we introduce a set of LoRA-based fine-tuning methods that can be trained in a distributed fashion, where model developers and fairness auditors collaborate without sharing sensitive attributes or predictors. In this paper, we evaluate three such methods - sensitive unlearning, adversarial training, and orthogonality loss - against a fairness-unaware baseline, using experiments on the CelebA and UTK-Face datasets with an ImageNet pre-trained ViT-Base model. We find that orthogonality loss consistently reduces bias while maintaining or improving utility, whereas adversarial training improves False Positive Rate Parity and Demographic Parity in some cases, and sensitive unlearning provides no clear benefit. In tasks where significant biases are present, distributed fairness-aware fine-tuning methods can effectively eliminate bias without compromising consumer privacy and, in most cases, improve model utility.
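The two fairness criteria named above can be illustrated with a small sketch for binary predictions and a binary sensitive attribute. It uses the common absolute-difference form of each gap; the paper may define or aggregate these metrics differently, and the function names and toy data below are hypothetical.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the examples selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def demographic_parity_gap(preds, groups):
    """|P(yhat=1 | s=0) - P(yhat=1 | s=1)|."""
    r0 = positive_rate(preds, [g == 0 for g in groups])
    r1 = positive_rate(preds, [g == 1 for g in groups])
    return abs(r0 - r1)


def fpr_parity_gap(preds, labels, groups):
    """|P(yhat=1 | y=0, s=0) - P(yhat=1 | y=0, s=1)|."""
    f0 = positive_rate(preds, [g == 0 and y == 0 for g, y in zip(groups, labels)])
    f1 = positive_rate(preds, [g == 1 and y == 0 for g, y in zip(groups, labels)])
    return abs(f0 - f1)


# Toy data (hypothetical): 8 examples, 4 per sensitive group.
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 0, 0, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

print(demographic_parity_gap(preds, groups))
print(fpr_parity_gap(preds, labels, groups))
```

A gap of zero means the two groups receive positive predictions (or false positives) at the same rate. Note that computing either gap requires the sensitive attribute, which in the setting described here only the fairness auditor holds.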

Paper Structure

This paper contains 23 sections, 7 equations, 14 figures, and 8 tables.

Figures (14)

  • Figure 1: Collaborative debiasing of pre-trained foundation models under demographic privacy constraints.
  • Figure 2: Erm: Fine-tune the pre-trained model for the downstream task.
  • Figure 3: Unl: Fine-tune the pre-trained model for sensitive attribute prediction, debias it by "unlearning" this capability, and then perform downstream fine-tuning.
  • Figure 4: Adv: Jointly fine-tune for the downstream and sensitive tasks using an alternating optimization strategy that maximizes task performance while minimizing sensitive attribute predictability.
  • Figure 5: Orth: Apply an orthogonality regularizer during downstream fine-tuning to enforce decorrelation between learned representations and sensitive features.
  • ...and 9 more figures
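
The Orth caption describes a regularizer that decorrelates learned representations from sensitive features. The sketch below shows one plausible form of such a penalty, the mean squared cosine similarity between a task feature vector and a sensitive feature vector for each example. This is an assumption made for illustration: the captions do not give the paper's exact loss, and the names `orthogonality_penalty`, `total_loss`, and the weight `lam` are hypothetical.

```python
import math


def squared_cosine(u, v):
    """Squared cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as orthogonal to everything
    return (dot / (norm_u * norm_v)) ** 2


def orthogonality_penalty(task_feats, sens_feats):
    """Mean squared cosine similarity over a batch of feature pairs.

    The penalty is 0 when every task feature is orthogonal to its
    sensitive counterpart and 1 when every pair is collinear.
    """
    pairs = list(zip(task_feats, sens_feats))
    return sum(squared_cosine(z, s) for z, s in pairs) / len(pairs)


def total_loss(task_loss, task_feats, sens_feats, lam=1.0):
    """Downstream loss plus the weighted penalty (lam is an assumed weight)."""
    return task_loss + lam * orthogonality_penalty(task_feats, sens_feats)


# Toy batch: the first pair is orthogonal, the second is collinear.
task_feats = [[1.0, 0.0], [1.0, 2.0]]
sens_feats = [[0.0, 1.0], [2.0, 4.0]]
print(orthogonality_penalty(task_feats, sens_feats))
```

In a real training loop the features would be differentiable tensors and the penalty would be minimized jointly with the downstream loss; the pure-Python version here only shows the quantity being penalized.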