Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

Kerem Zaman, Leshem Choshen, Shashank Srivastava

TL;DR

This work studies the inverse problem of model fusion, asking whether fusion can be used to reduce unwanted knowledge, and finds that knowledge shared among models is enhanced during fusion, while unshared knowledge is usually forgotten.

Abstract

Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
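Combining models by their weights can be illustrated in a few lines. The example below is a minimal sketch, assuming each model is a dict that maps parameter names to flat lists of floats, and assuming fusion means plain element-wise weight averaging (one common fusion method; this page does not state the paper's exact procedure). `interpolate` traces the linear path (1 - α)·θ_A + α·θ_B, which is assumed here to be the kind of interpolation described in the figure captions, and `fuse` averages any number of models. All names (`interpolate`, `fuse`, `Params`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical model format: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def interpolate(theta_a: Params, theta_b: Params, alpha: float) -> Params:
    """Return (1 - alpha) * theta_a + alpha * theta_b, element-wise."""
    if theta_a.keys() != theta_b.keys():
        raise ValueError("models must have the same parameter names")
    out: Params = {}
    for name, wa in theta_a.items():
        wb = theta_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        out[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(wa, wb)]
    return out


def fuse(models: List[Params]) -> Params:
    """Uniform weight averaging of models that share one architecture."""
    if not models:
        raise ValueError("need at least one model")
    n = len(models)
    return {
        name: [sum(ws) / n for ws in zip(*(m[name] for m in models))]
        for name in models[0]
    }


# Toy setup: both models start from zero weights and share an update of 1.0
# on w[0]; each also has its own update of 2.0 (A on w[1], B on w[2]).
model_a = {"w": [1.0, 2.0, 0.0]}
model_b = {"w": [1.0, 0.0, 2.0]}
fused = fuse([model_a, model_b])
print(fused)                                        # {'w': [1.0, 1.0, 1.0]}
print(interpolate(model_a, model_b, 0.5) == fused)  # True
```

In this toy case the shared update on `w[0]` survives averaging at full strength, while each model-specific update is halved. This linear picture is only an intuition consistent with the claim that shared knowledge is preserved and unshared knowledge fades; real networks are non-linear, so it is not a proof.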

Paper Structure (49 sections, 19 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Schematic illustrating our claims on a biased mask-filling scenario. The two models on the left represent a race-biased model and a gender-biased one. The colored shapes inside represent learned knowledge related to different skills, where some skills are shared across models (the triangle and the circle) and others are not (the square and the star). The fused model on the right illustrates the preservation of shared knowledge and the corruption of unshared knowledge after model fusion.
  • Figure 2: Change in accuracy on the synthetic (shortcut-) and original (orig-) validation sets during interpolation between model pairs, each model having a different shortcut. (a) Interpolation between the model with the ST shortcut and a model with random weights; (b) interpolation between the models with the OP and TiC shortcuts; (c) interpolation between the models with the OP and ST shortcuts; (d) interpolation between the models with the TiC and ST shortcuts.
  • Figure 3: TiC & OP → TiC & OR. Shared shortcuts are kept during fusion. Change in accuracy on the synthetic and original validation sets during interpolation between two models. Both models learned the TiC shortcut; one additionally learned OP and the other OR.
  • Figure 4: A fused model keeps performance and forgets shortcuts. Accuracy of the models that learned shortcuts, their fused model, and the full model on all corresponding synthetic shortcut validation sets and on the original task's validation sets. Results on the original validation sets are the average performance of each model on its corresponding set. Shortcut accuracies near chance level show that the shortcuts are substantially forgotten.
  • Figure 5: Model fusion reduces gender and age biases while maintaining accuracy. Changes in (a) DP (demographic parity), (b) TPR-GAP (true positive rate gap), and (c) accuracy during interpolation from the gender-biased model to the age-biased model.
  • ...and 7 more figures
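Figure 5 tracks two fairness metrics, DP and TPR-GAP. The example below is a minimal sketch, assuming binary predictions and a binary protected attribute, with DP measured as the absolute gap in positive-prediction rates between the two groups and TPR-GAP as the absolute gap in true positive rates (recall on the positive class). The paper may use a different variant, for example one aggregated over several classes or groups, and the function names `dp_gap` and `tpr_gap` are hypothetical.

```python
from typing import Sequence


def _rate(values: Sequence[int]) -> float:
    """Fraction of ones in a 0/1 sequence (0.0 if the sequence is empty)."""
    return sum(values) / len(values) if values else 0.0


def dp_gap(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """|P(y_hat=1 | g=0) - P(y_hat=1 | g=1)| for binary predictions and groups."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(_rate(g0) - _rate(g1))


def tpr_gap(y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]) -> float:
    """|TPR(g=0) - TPR(g=1)|, where TPR is recall on the positive class."""
    def tpr(target_group: int) -> float:
        # Predictions on the truly positive examples of one group.
        hits = [p for t, p, g in zip(y_true, y_pred, group)
                if g == target_group and t == 1]
        return _rate(hits)
    return abs(tpr(0) - tpr(1))


y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(dp_gap(y_pred, group))           # 0.25
print(tpr_gap(y_true, y_pred, group))  # 0.5
```

Under this reading, a debiased model along the interpolation path would show both gaps shrinking toward 0 while accuracy stays roughly flat.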

Theorems & Definitions (1)

  • Definition 1