Cocoon: A System Architecture for Differentially Private Training with Correlated Noises

Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng

TL;DR

The paper targets privacy-preserving ML training via differential privacy, focusing on correlated noise mechanisms that improve accuracy but introduce substantial memory and compute overheads for large models. It analyzes the overheads and proposes Cocoon, a hardware-software framework that pre-computes and coalesces noise for embedding tables (Cocoon-Emb) and adds a near-memory processing (Cocoon-NMP) engine to handle large noise histories with minimal data movement. Empirical results on CPU, GPU, and FPGA-based NMP prototypes show 2.33–10.82x speedups for embedding-heavy workloads and 1.55–3.06x for large models, with memory footprints mitigated by hot/cold splitting and coalescing. These techniques enable scalable DP training for modern DLRM and large-scale models, offering practical improvements for real-world privacy-preserving ML deployments.

Abstract

Machine learning (ML) models memorize and leak training data, causing serious privacy issues for data owners. Training algorithms with differential privacy (DP), such as DP-SGD, have been gaining attention as a solution. However, DP-SGD adds noise at each training iteration, which degrades the accuracy of the trained model. To improve accuracy, a new family of approaches adds carefully designed correlated noises, so that the noises cancel each other out across iterations. We performed an extensive characterization study of these new mechanisms, for the first time to the best of our knowledge, and show that they incur non-negligible overheads when the model is large or uses large embedding tables. Motivated by the analysis, we propose Cocoon, a hardware-software co-designed framework for efficient training with correlated noises. Cocoon accelerates models with embedding tables by pre-computing and storing correlated noises in a coalesced format (Cocoon-Emb), and supports large models through a custom near-memory processing device (Cocoon-NMP). On a real system with an FPGA-based NMP device prototype, Cocoon improves performance by 2.33–10.82x (Cocoon-Emb) and 1.55–3.06x (Cocoon-NMP).
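
The cancellation idea in the abstract can be illustrated with a toy sketch. This is not the paper's mechanism: the function `cumulative_noise`, the single coefficient `lam`, and the one-step history are assumptions made for illustration only. The mechanisms the paper characterizes use carefully designed coefficients over a longer noise history and must calibrate the noise scale to preserve the privacy guarantee, and neither is modeled here. The sketch only shows why subtracting part of an earlier noise draw keeps the noise accumulated in the model from growing across iterations.

```python
import numpy as np

def cumulative_noise(steps, dim, sigma, lam, seed=0):
    """Return the running sum of injected noise after each step.

    lam = 0.0 injects independent Gaussian noise at every step (DP-SGD style).
    lam > 0.0 subtracts lam times the previous draw (toy correlated noise).
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, size=(steps, dim))   # fresh draws z_t
    prev = np.vstack([np.zeros((1, dim)), z[:-1]])  # z_{t-1}, with zeros before step 0
    injected = z - lam * prev                       # noise actually added at step t
    return np.cumsum(injected, axis=0)              # noise accumulated in the model

independent = cumulative_noise(steps=1000, dim=256, sigma=1.0, lam=0.0)
correlated = cumulative_noise(steps=1000, dim=256, sigma=1.0, lam=1.0)

# Compare the spread of the accumulated noise after the final step.
print("independent:", independent[-1].std())
print("correlated: ", correlated[-1].std())
```

With `lam = 1.0` the sum telescopes, so the accumulated noise after any step is just the most recent draw and its spread stays near `sigma`. With independent noise the spread grows roughly as `sigma * sqrt(steps)`. The cost is that earlier draws must be kept around, which is the noise-history overhead the paper analyzes.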

Paper Structure

This paper contains 41 sections, 1 equation, 20 figures.

Figures (20)

  • Figure 1: DP-SGD vs. correlated noise mechanism.
  • Figure 2: Noise history size of various ML models and $\hat{b}$.
  • Figure 3: Training time of OPT [zhang_2022_opt] on 1–4 A5000 GPUs.
  • Figure 4: Training time breakdown for DLRM.
  • Figure 5: Training time breakdown when the noise history entirely fits into main memory.
  • ...and 15 more figures