Bias Mitigation in Fine-tuning Pre-trained Models for Enhanced Fairness and Efficiency

Yixuan Zhang, Feng Zhou

TL;DR

The paper addresses fairness during fine-tuning of large pre-trained models, showing that bias can emerge when adapting to new tasks even from fair baselines. It introduces weight importance neutralization based on Fisher information across demographic groups and couples it with a weighted low-rank SVD to compress the linear head, enabling fairer and faster adaptation. Experiments on Adult, CelebA, and LFW+a demonstrate improvements in both predictive accuracy and bias metrics such as Demographic Parity and Equalized Odds, with notable efficiency benefits from the low-rank approach. The proposed method remains effective even when the pre-trained model is fair, offering a practical strategy for robust fairness in downstream tasks without requiring explicit reformulation of fairness criteria.
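
The two bias metrics named above can be made concrete with a small sketch. It assumes binary predictions and a binary sensitive attribute, and it uses the common textbook definitions: the Demographic Parity gap is the difference in positive-prediction rates between groups, and the Equalized Odds gap is taken here as the larger of the true-positive-rate and false-positive-rate gaps. The function names are illustrative, and the paper's exact formulation of $\Delta_{\text{DP}}$ and $\Delta_{\text{EO}}$ may differ (some works sum the two rate gaps instead of taking the maximum).

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)| for 0/1 predictions and a 0/1 group label."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def equalized_odds_gap(y_true, y_pred, s):
    """Largest gap in positive-prediction rate between groups, conditioned on the true label.

    Assumes every (label, group) combination occurs at least once.
    """
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))
    gaps = []
    for y in (0, 1):  # y=0 gives the FPR gap, y=1 gives the TPR gap
        rate0 = y_pred[(y_true == y) & (s == 0)].mean()
        rate1 = y_pred[(y_true == y) & (s == 1)].mean()
        gaps.append(abs(rate0 - rate1))
    return max(gaps)

# Toy data: group 1 receives positive predictions more often than group 0.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
s      = [0, 0, 0, 0, 1, 1, 1, 1]
dp = demographic_parity_gap(y_pred, s)
eo = equalized_odds_gap(y_true, y_pred, s)
```

A value of 0 for either gap means the groups are treated identically under that criterion; larger values indicate a larger fairness violation.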

Abstract

Fine-tuning pre-trained models is a widely employed technique in numerous real-world applications. However, fine-tuning these models on new tasks can lead to unfair outcomes, because fairness properties carry no generalization guarantees, regardless of whether the original pre-trained model was developed with fairness considerations. To tackle this issue, we introduce an efficient and robust fine-tuning framework specifically designed to mitigate biases in new tasks. Our empirical analysis shows that different parameters of the pre-trained model affect predictions for different demographic groups. Based on this observation, we employ a transfer learning strategy that neutralizes the importance of these influential weights, determined using Fisher information across demographic groups. Additionally, we integrate this weight importance neutralization strategy with a matrix factorization technique, which provides a low-rank approximation of the weight matrix using fewer parameters, reducing the computational demands. Experiments on multiple pre-trained models and new tasks demonstrate the effectiveness of our method.
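
The pipeline described in the abstract (per-group Fisher information, importance neutralization, low-rank factorization of the weight matrix) can be illustrated with a minimal NumPy sketch. Several pieces here are assumptions rather than the authors' formulation: the diagonal Fisher is estimated as the mean of squared per-sample gradients, "neutralization" is modeled as a weighted average of the per-group Fisher diagonals, and the factorization uses row-wise importance weights in the style of Fisher-weighted SVD. All function names and the `group_weights` parameter are hypothetical.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients.

    per_sample_grads has shape (n_samples, out_dim, in_dim).
    """
    return np.mean(np.asarray(per_sample_grads) ** 2, axis=0)

def neutralized_importance(fisher_by_group, group_weights=None):
    """Combine per-group Fisher diagonals into one importance matrix.

    Assumption: neutralization is a weighted average; equal weights by default.
    """
    fishers = np.stack(list(fisher_by_group))            # (n_groups, out_dim, in_dim)
    if group_weights is None:
        group_weights = np.full(len(fishers), 1.0 / len(fishers))
    return np.tensordot(group_weights, fishers, axes=1)  # (out_dim, in_dim)

def weighted_low_rank(W, importance, rank, eps=1e-8):
    """Row-weighted truncated SVD: approximate W by A @ B with A (out, rank), B (rank, in)."""
    d = np.sqrt(importance.sum(axis=1)) + eps            # one weight per output row
    U, S, Vt = np.linalg.svd(d[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * S[:rank]) / d[:, None]            # undo the row scaling
    B = Vt[:rank]
    return A, B

# Toy example with random stand-ins for per-sample gradients of a 4x6 linear head.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
fisher_f = diagonal_fisher(rng.normal(size=(32, 4, 6)))  # estimated on one group
fisher_m = diagonal_fisher(rng.normal(size=(32, 4, 6)))  # estimated on the other group
importance = neutralized_importance([fisher_f, fisher_m])
A, B = weighted_low_rank(W, importance, rank=2)
```

With `rank=2`, the factors hold 4·2 + 2·6 = 20 parameters instead of the 24 in `W`; the saving grows quickly for realistically sized layers. Passing unequal `group_weights` changes how much each group's Fisher estimate contributes to the combined importance.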

Paper Structure (16 sections, 4 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: Principal component analysis on the representations from the linear layer after fine-tuning the pre-trained models. Blue points represent the "Female" group, while orange points represent the "Male" group.
  • Figure 2: The weight importance neutralization fine-tuning. Left: the original fine-tuning method, Right: our proposed fine-tuning method.
  • Figure 3: The diagonal entries of the Fisher information matrix for all parameters of the final linear layer of the model. Left: the model is trained on $\mathcal{D}_{S=\text{"Female"}}$, Right: the model is trained on $\mathcal{D}_{S=\text{"Male"}}$.
  • Figure 4: Comparison of TL, $f$+SVD, and OURS across three real-world datasets in terms of predictive performance (F1 score) and fairness violations ($\Delta_{\text{DP}}$ and $\Delta_{\text{EO}}$).
  • Figure 5: Ablation study: The impact of varying the contribution of different demographic groups in neutralized weight importance on prediction errors and fairness violations using the Adult dataset.