
Privacy-preserving Fine-tuning of Large Language Models through Flatness

Tiejin Chen, Longchao Da, Huixue Zhou, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei

TL;DR

This paper reveals that the flatness of DP-trained models' loss landscape plays an essential role in the trade-off between their privacy and generalization, and proposes a holistic framework to enforce appropriate weight flatness, which substantially improves model generalization with competitive privacy preservation.

Abstract

Privacy concerns around Large Language Models (LLMs) have grown with the development of models such as ChatGPT. Existing work explores Differential Privacy (DP) techniques to mitigate these privacy risks, at the cost of degraded generalization. Our paper reveals that the flatness of DP-trained models' loss landscape plays an essential role in the trade-off between their privacy and generalization. We further propose a holistic framework to enforce appropriate weight flatness, which substantially improves model generalization with competitive privacy preservation. It innovates at three levels of granularity: perturbation-aware min-max optimization on model weights within a layer, flatness-guided sparse prefix-tuning on weights across layers, and weight knowledge distillation between DP and non-DP weight copies. Comprehensive experiments in both black-box and white-box scenarios demonstrate the effectiveness of our proposal in enhancing generalization while maintaining DP characteristics. For instance, on the text classification dataset QNLI, DP-Flat achieves performance similar to non-private full fine-tuning while providing a DP guarantee under privacy budget $\epsilon=3$, and even better performance given higher privacy budgets. Code is provided in the supplement.
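
To make the "perturbation-aware min-max optimization" idea more concrete, the following is a minimal sketch of a generic sharpness-aware update on a toy linear-regression loss. It is not the authors' DP-Flat implementation: the loss, the function names (`loss`, `grad`, `min_max_step`), and the values of `lr` and `rho` are assumptions chosen for illustration, and the DP-specific parts (gradient clipping and noise) are omitted.

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model (toy stand-in for an LLM loss)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def grad(w, X, y):
    """Gradient of the toy loss with respect to the weights."""
    return X.T @ (X @ w - y) / len(y)

def min_max_step(w, X, y, lr=0.1, rho=0.05):
    """One sharpness-aware step (hypothetical helper, not from the paper).

    Inner max: move to an approximate worst-case point inside an L2 ball
    of radius rho, using a first-order approximation.
    Outer min: descend using the gradient taken at that perturbed point.
    """
    g = grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # approximate worst-case perturbation
    g_adv = grad(w + eps, X, y)                   # gradient at the perturbed weights
    return w - lr * g_adv

# Toy data: assumed values, only used to exercise the update rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = X @ w_true

w = np.zeros(5)
initial_loss = loss(w, X, y)
for _ in range(200):
    w = min_max_step(w, X, y)
final_loss = loss(w, X, y)
```

Because the descent direction is evaluated at the perturbed weights, the update favors regions where the loss stays low throughout a small neighborhood, which is the intuition behind a flatter loss landscape.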

Paper Structure

This paper contains 24 sections, 9 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Left: Weight loss landscape for DP-trained LLMs and normal (non-private) training on SST-2. The DP-trained model has a sharper loss landscape. Right: The privacy-performance trade-off for DP-trained LLMs: compared with normally trained models, the DP-trained model has lower privacy risk (better privacy) under Membership Inference Attack (MIA), but lower classification accuracy (worse performance).
  • Figure 2: Our methods improve the flatness of the weight loss landscape from three aspects: (1) Within-layer flattening, where a perturbation-aware min-max optimization encourages loss flatness within the weight space of each LLM layer. (2) Cross-layer flattening, where a sparse prefix-tuning algorithm guides layer selection with a flatness-aware indicator. (3) Cross-model flattening, where non-private prefixes guide DP training through knowledge distillation regularization.
  • Figure 3: Sharpness for DP-trained prefix tuning combined with our three proposed weight-flattening methods on SST-2. Our proposed model has a flatter loss landscape.
  • Figure 4: Comparison of MIA accuracy under both white-box and black-box settings across text classification datasets. The lower the accuracy, the lower the privacy risk. The results show that our proposed method does not weaken privacy protection in either the white-box or the black-box setting.
  • Figure 5: Effect of gradually removing different flatness methods on classification accuracy on the SST-2 dataset with RoBERTa-base. Higher is better. Each aspect contributes to the final performance while maintaining the DP guarantee.
  • ...and 1 more figures
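
The captions above report MIA accuracy, where lower means less privacy leakage. As a rough illustration of how such a number can be computed, the sketch below implements a simple loss-threshold membership inference attack: a sample is guessed to be a training member when its loss falls below a threshold. This is a generic attack, not necessarily the one used in the paper, and the function name `mia_accuracy`, the threshold, and all loss values are hypothetical.

```python
import numpy as np

def mia_accuracy(member_losses, nonmember_losses, threshold):
    """Accuracy of a loss-threshold attack (hypothetical helper).

    The attack predicts "member" when a sample's loss is below `threshold`
    and "non-member" otherwise.
    """
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    correct = np.sum(member_losses < threshold) + np.sum(nonmember_losses >= threshold)
    return correct / (len(member_losses) + len(nonmember_losses))

# Made-up per-sample losses, for illustration only.
# A model that memorizes its training data: members have clearly lower loss.
leaky = mia_accuracy([0.1, 0.2, 0.3, 0.4], [0.9, 1.1, 1.3, 0.8], threshold=0.6)
# A model whose member and non-member losses look alike: the attack is at chance.
private = mia_accuracy([0.5, 0.9, 0.4, 1.0], [0.45, 0.95, 0.5, 1.1], threshold=0.6)

print(leaky, private)  # 1.0 0.5
```

On a balanced set of members and non-members, an accuracy near 0.5 means the attacker does no better than random guessing.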

Theorems & Definitions (2)

  • Definition 3.1: $(\epsilon,\delta)$-Differential Privacy
  • Definition 3.2: Prefix Sharpness
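
For reference, Definition 3.1 names $(\epsilon,\delta)$-Differential Privacy. The statement below is the standard textbook form, given here under the assumption that the paper's definition matches it; the symbols $\mathcal{M}$, $D$, $D'$, and $S$ are notation chosen for this note. A randomized mechanism $\mathcal{M}$ satisfies $(\epsilon,\delta)$-DP if, for any two adjacent datasets $D$ and $D'$ that differ in a single record and for any set of outputs $S$,

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
$$

A smaller privacy budget $\epsilon$ forces the output distributions on $D$ and $D'$ to be closer and so gives a stronger guarantee, while $\delta$ bounds the probability with which the multiplicative bound may fail.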