Differentially Private Learning Needs Better Model Initialization and Self-Distillation

Ivoline C. Ngong, Joseph P. Near, Niloofar Mireshghallah

TL;DR

DPRefine is a three-phase method that initializes a model on rigorously filtered synthetic data generated by a small pre-trained LM, applies DP fine-tuning on private data, and then performs self-distillation to refine outputs. It significantly outperforms vanilla DPSGD.

Abstract

Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP fine-tuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating the grammar and spelling errors commonly associated with DPSGD. It also reduces inconsistencies seen in non-private models, such as hallucinated details and misattributed quotes. We find that small models like GPT-2 can be effective for initialization and distillation, highlighting their potential in enabling scalable and efficient deployment of privacy-preserving language models.

Paper Structure

This paper contains 49 sections, 1 equation, 3 figures, 12 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of DPRefine's three-phase approach. Phase 1 (Data Synthesis and Model Initialization) generates synthetic training pairs using GPT-2, applies quality filtering, and performs initial fine-tuning on a pre-trained T5 (encoder-decoder model) to create an initialized paraphraser/summarizer, all without accessing private data. Phase 2 (Differentially Private Fine-tuning) applies DP-SGD on private labeled data to create a privacy-preserving domain paraphraser/summarizer. Phase 3 (Self-Distillation Refinement) uses the DP model to generate new training pairs, applies filtering, and performs final fine-tuning to produce a refined domain paraphraser/summarizer.
  • Figure 2: Comparison of DPRefine and DPSGD across multiple metrics (Preference, Coherence, Consistency, Fact Omission, Fluency, and Relevance) for the XSum, PubMed, and MRPC datasets. Error bars represent standard deviations. DPRefine performs consistently better than DPSGD, indicating that it generates more contextually aligned and factually accurate outputs.
  • Figure 3: Manual analysis of error types for three models: No DP, DPSGD, and DPRefine across XSum, MRPC, and PubMed datasets. The first row shows language errors, and the second row shows inconsistency errors. DPRefine consistently reduces both language errors and inconsistencies compared to DPSGD, leading to more accurate and fluent outputs.
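
For readers unfamiliar with the DP-SGD update used in Phase 2 of Figure 1, the following is a minimal NumPy sketch of one generic DP-SGD step: clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. It is not the authors' implementation. The function name `dp_sgd_step` and all parameter names and values are assumptions made for illustration, and the sketch omits privacy accounting entirely.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """Apply one generic DP-SGD update (illustrative sketch, hypothetical names).

    params:            array of shape (d,)
    per_example_grads: array of shape (batch_size, d), one gradient per example
    """
    batch_size = per_example_grads.shape[0]

    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise with standard deviation noise_multiplier * clip_norm
    # to the summed gradient, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size

    return params - lr * noisy_mean


# Usage with made-up values: two examples, two parameters.
rng = np.random.default_rng(0)
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
updated = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single example can move the parameters, and the noise masks what remains of that influence. This is the usual explanation for the utility loss that the abstract attributes to DPSGD.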