DP-Adam-AC: Privacy-preserving Fine-Tuning of Localizable Language Models Using Adam Optimization with Adaptive Clipping

Ruoxing Yang

TL;DR

DP-Adam-AC tackles privacy-preserving fine-tuning of localizable language models by enhancing the DP-AdamW optimizer with adaptive gradient clipping, EMA-based evaluation, and a clip-driven learning-rate schedule. It introduces a variable-q Rényi DP accountant to tightly track privacy loss across micro-batches, enabling practical budgets for short training epochs. Empirical results on two localizable architectures (Qwen2.5-0.5B and BitNet-b1.58-2B) with synthetic data demonstrate favorable loss reduction under privacy constraints, with SLMs offering a better privacy-utility balance than BitNet due to their reduced parameter count and full-precision weights. The work highlights the feasibility of secure, localizable LLM deployment in security-sensitive environments and provides open-source tooling for broader adoption.

Abstract

Large language models (LLMs) such as ChatGPT have evolved into powerful and ubiquitous tools. Fine-tuning on small datasets allows LLMs to acquire specialized skills for specific tasks efficiently. Although LLMs provide great utility in both general and task-specific use cases, they are limited by two security-related concerns. First, the hardware requirements of traditional LLMs make them infeasible to run locally on consumer-grade devices. A remote network connection to the LLM provider's server is usually required, which leaves the system vulnerable to network attacks. Second, fine-tuning an LLM for a sensitive task may involve sensitive data, and non-private fine-tuning algorithms produce models that are vulnerable to training-data reproduction attacks. Our work addresses these security concerns by enhancing differentially private optimization algorithms and applying them to fine-tune localizable language models. We add adaptive gradient clipping, along with other engineering enhancements, to the standard DP-Adam optimizer to create DP-Adam-AC. We use our optimizer to fine-tune one example of each of two localizable LLM designs: a small language model (Qwen2.5-0.5B) and a 1.58-bit quantized model (BitNet-b1.58-2B). We demonstrate promising improvements in loss through experiments with two synthetic datasets.
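To make the idea of a differentially private Adam step with an adaptive clipping threshold concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: the function names (`dp_adam_step`, `adapt_clip_norm`), the geometric threshold update, and the `target_fraction` and `rate` parameters are illustrative assumptions, and the sketch leaves out the weight decay, EMA-based evaluation, and clip-driven learning-rate schedule mentioned in the TL;DR.

```python
import numpy as np

def dp_adam_step(params, per_example_grads, state, clip_norm, noise_multiplier,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, rng=None):
    """One illustrative DP-Adam step (hypothetical helper, not the paper's code).

    per_example_grads has shape (batch_size, num_params).
    state is a dict with keys "t" (int), "m" and "v" (arrays shaped like params).
    """
    rng = rng if rng is not None else np.random.default_rng()

    # 1. Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale[:, None]

    # 2. Sum, add Gaussian noise with std = noise_multiplier * clip_norm, average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    g = (clipped.sum(axis=0) + noise) / len(per_example_grads)

    # 3. Standard Adam moment updates with bias correction.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * g
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * g * g
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    new_params = params - lr * m_hat / (np.sqrt(v_hat) + eps)

    # Fraction of examples whose gradient was actually clipped.
    # Assumption: used raw here for clarity; a real DP pipeline would have to
    # privatize this statistic or account for it in the privacy budget.
    clipped_fraction = float(np.mean(norms > clip_norm))
    return new_params, clipped_fraction

def adapt_clip_norm(clip_norm, clipped_fraction, target_fraction=0.5, rate=0.2):
    """Assumed adaptation rule: grow the threshold when more than
    target_fraction of gradients are clipped, shrink it otherwise."""
    return clip_norm * float(np.exp(rate * (clipped_fraction - target_fraction)))
```

In a training loop, `adapt_clip_norm` would be called after each step with the returned `clipped_fraction`, so the threshold tracks the scale of the per-example gradients instead of staying fixed at a hand-tuned value.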

Paper Structure

This paper contains 48 sections, 24 equations, 2 figures, 2 tables, 6 algorithms.

Figures (2)

  • Figure 1: Loss vs. training steps across different noise levels of DP-Adam-AC fine-tuning on Qwen2.5-0.5B.
  • Figure 2: Loss vs. training steps across different noise levels of DP-Adam-AC fine-tuning on BitNet-b1.58-2B.

Theorems & Definitions (2)

  • Definition 1: $(\varepsilon,\delta)$-Differential Privacy
  • Definition 2: $(\alpha,\varepsilon)$-RDP
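
For reference, the two definitions listed above are usually stated as follows in the differential privacy literature. The paper's own notation is not shown in this outline and may differ; the symbols $\mathcal{M}$, $D$, $D'$, and $S$ are the conventional ones and are assumed here.

A randomized mechanism $\mathcal{M}$ is $(\varepsilon,\delta)$-differentially private if, for all adjacent datasets $D, D'$ and all measurable output sets $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$

$\mathcal{M}$ is $(\alpha,\varepsilon)$-RDP for an order $\alpha > 1$ if, for all adjacent $D, D'$, the Rényi divergence of order $\alpha$ between its output distributions is bounded by $\varepsilon$:

$$D_{\alpha}\big(\mathcal{M}(D) \,\|\, \mathcal{M}(D')\big) = \frac{1}{\alpha-1} \log \mathbb{E}_{x \sim \mathcal{M}(D')}\left[\left(\frac{\Pr[\mathcal{M}(D) = x]}{\Pr[\mathcal{M}(D') = x]}\right)^{\alpha}\right] \le \varepsilon.$$

The two notions are linked by the standard conversion: an $(\alpha,\varepsilon)$-RDP mechanism is also $\left(\varepsilon + \frac{\log(1/\delta)}{\alpha-1},\, \delta\right)$-differentially private for every $\delta \in (0,1)$. RDP guarantees at a fixed order $\alpha$ add up under composition, which is what allows an RDP accountant to sum the per-step costs of training and convert the total to an $(\varepsilon,\delta)$ budget once at the end.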