Reducing Bias in Pre-trained Models by Tuning while Penalizing Change

Niklas Penzel, Gideon Stein, Joachim Denzler

TL;DR

This work tackles post-hoc debiasing of pre-trained image classifiers by freezing a backbone and learning a zero-initialized change network that is added to the forward pass, with a loss $\mathcal{L}_{mc} = \mathcal{L}(f_{\theta + \theta'}(x), y) + \lambda \|\theta'\|$ to penalize parameter change. By applying either $\ell_1$, $\ell_2$, or a combination of both norms for $\|\theta'\|$, and employing an early stopping criterion based on correctly predicting a tuning batch with an additional minimum step delay $\epsilon$, the method achieves bias mitigation with very few tuning examples. Across four bias/domain-shift datasets (ISIC melanoma, CelebA hair color, Waterbirds, Camelyon17), the approach often yields improved unbiased-test performance for small tuning sets, while standard fine-tuning with early stopping can match or exceed gains for larger tuning sets. The results demonstrate a practical, data-efficient debiasing strategy that minimizes changes to the pre-trained parameters and can be integrated with existing baselines to reduce overfitting.
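A minimal sketch of the change-penalized objective $\mathcal{L}_{mc}$ described above, assuming a tiny binary logistic model in place of an image classifier so that the example stays self-contained. The function names, the `kind` and `alpha` parameters, and the weighted-sum form of the combined norm are illustrative assumptions, not taken from the paper.

```python
import math

def change_norm(delta, kind="l2", alpha=0.5):
    """Norm of the change parameters theta'.

    The mixing weight `alpha` for the combined penalty is an assumption;
    the source only says that both norms can be combined.
    """
    l1 = sum(abs(d) for d in delta)
    l2 = math.sqrt(sum(d * d for d in delta))
    if kind == "l1":
        return l1
    if kind == "l2":
        return l2
    if kind == "both":
        return alpha * l1 + (1.0 - alpha) * l2
    raise ValueError(f"unknown norm kind: {kind}")

def predict_proba(theta, delta, x):
    """Forward pass with effective parameters theta + theta'.

    theta is the frozen pre-trained part, delta is the tuned change.
    """
    z = sum((t + d) * xi for t, d, xi in zip(theta, delta, x))
    return 1.0 / (1.0 + math.exp(-z))

def change_penalized_loss(theta, delta, x, y, lam=0.1, kind="l2"):
    """Task loss (binary cross-entropy here) plus lam * ||theta'||."""
    p = predict_proba(theta, delta, x)
    tiny = 1e-12  # guards against log(0)
    task_loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1.0 - p + tiny))
    return task_loss + lam * change_norm(delta, kind)
```

With a zero-initialized `delta`, the penalty vanishes and the prediction equals that of the pre-trained model, so tuning starts exactly at the original behavior and every deviation from it is charged by $\lambda$.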

Abstract

Deep models trained on large amounts of data often incorporate implicit biases present at training time. If such a bias is later discovered during inference or deployment, it is often necessary to acquire new data and retrain the model. This is especially problematic in critical areas such as autonomous driving or medical decision-making, where new data is often expensive and hard to come by. In this work, we present a method based on change penalization that takes a pre-trained model and adapts its weights to mitigate a previously detected bias. We achieve this by tuning a zero-initialized copy of a frozen pre-trained network. Our method needs very few examples that contradict the bias, in extreme cases only a single one, to increase performance. Additionally, we propose an early stopping criterion that can be used to modify baselines and reduce overfitting. We evaluate our approach on a well-known bias in skin lesion classification and on three other datasets from the domain shift literature. We find that our approach works especially well with very few images. Simple fine-tuning combined with our early stopping also leads to performance benefits for larger numbers of tuning samples.
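The early stopping criterion can be sketched as follows, under one reading of the TL;DR: tuning stops as soon as every example in the tuning batch is predicted correctly and at least $\epsilon$ steps have been taken. The logistic model, the $\ell_1$ subgradient update, and all names and default values (`tune_with_early_stopping`, `lr`, `max_steps`) are assumptions made for illustration, not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def batch_correct(theta, delta, batch):
    """True if every (x, y) in the tuning batch is classified correctly."""
    for x, y in batch:
        p = sigmoid(sum((t + d) * xi for t, d, xi in zip(theta, delta, x)))
        if (1 if p >= 0.5 else 0) != y:
            return False
    return True

def tune_with_early_stopping(theta, batch, lam=0.01, lr=0.5, eps=3, max_steps=1000):
    """Tune only the change parameters; theta stays frozen.

    Stops once the batch is predicted correctly and step >= eps
    (an assumed interpretation of the minimum step delay).
    """
    delta = [0.0] * len(theta)  # zero-initialized change parameters
    for step in range(1, max_steps + 1):
        grad = [0.0] * len(theta)
        for x, y in batch:
            p = sigmoid(sum((t + d) * xi for t, d, xi in zip(theta, delta, x)))
            for i, xi in enumerate(x):
                grad[i] += (p - y) * xi / len(batch)  # cross-entropy gradient
        for i, d in enumerate(delta):
            sign = (d > 0) - (d < 0)  # l1 subgradient, 0 at d == 0
            delta[i] = d - lr * (grad[i] + lam * sign)
        if step >= eps and batch_correct(theta, delta, batch):
            return delta, step
    return delta, max_steps

# Usage: one example that contradicts the frozen model's prediction.
frozen = [2.0, 0.0]
tuning_batch = [([1.0, 1.0], 0)]
change, steps = tune_with_early_stopping(frozen, tuning_batch, eps=3)
```

Stopping at the first point where the tuning batch is handled correctly keeps the change parameters small, which fits the stated goal of reducing overfitting when only a handful of tuning examples are available.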

Paper Structure

This paper contains 25 sections, 5 equations, and 13 figures.

Figures (13)

  • Figure 1: Architecture of the proposed method. We tune a zero-initialized change network (light grey) that is added to a frozen pre-trained model (black).
  • Figure 2: The difference in accuracy after debiasing using our approach versus three baseline methods combined with our stopping scheme on the melanoma classification task.
  • Figure 3: Difference of the Euclidean norm of the model parameter vectors before and after the debiasing. Note that side-tuning [zhang2020side] and MAS [aljundi2018memory] lead to larger changes and overshadow the difference between our approach and fine-tuning. Hence, they are omitted here.
  • Figure 4: The difference in accuracy after debiasing using our approach versus three baseline methods combined with our early stopping scheme on the CelebA [liu2015faceattributes] classification task.
  • Figure 5: The differences in accuracy and balanced accuracy after debiasing using our approach versus three baseline methods together with our early stopping scheme on the Waterbirds dataset [sagawa2019distributionally].
  • ...and 8 more figures