ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression

Kai Yao, Zhaorui Tan, Tiandi Ye, Lichun Li, Yuan Zhao, Wenyan Liu, Wei Wang, Jianke Zhu

TL;DR

ScaleOT addresses the privacy-utility challenges of offsite-tuning large language models by avoiding uniform LayerDrop and costly distillation. It introduces two components. The first, an importance-aware Dynamic LayerReplace, uses reinforcement learning to identify which layers to replace with lightweight harmonizers. The second, Selective Rank Compression, applies rank-$r$ approximations (via SVD) to compress the emulator, focusing on MHSA layers to enhance privacy with minimal utility loss. The emulator is specified by the triplet $(N_a,\alpha,\beta)$, which sets the number of adapted layers, the fraction of layers replaced by harmonizers, and the degree of rank reduction, yielding privacy-utility-scalable emulators. Empirical results show that ScaleOT achieves nearly lossless plug-in performance compared with full fine-tuning while providing stronger model privacy across multiple model scales and tasks, demonstrating its practical value for secure, scalable offsite-tuning.
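The rank-$r$ compression step can be illustrated with a truncated SVD. The following is a minimal Python/NumPy sketch, not the authors' implementation. It assumes that the rank setting is expressed as a hypothetical `keep_ratio` (the fraction of singular values retained; the exact definition of $\beta$ is not given in this summary) and that MHSA weights can be recognized by a hypothetical `attn.` name prefix. Both helper names are illustrative.

```python
from typing import Dict, Tuple

import numpy as np


def rank_r_approx(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Truncated-SVD approximation of a 2-D weight matrix.

    Assumption: keep_ratio is the fraction of singular values kept,
    so r = max(1, floor(keep_ratio * full_rank)).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in (0, 1]")
    # Thin SVD: u is (m, k), s is (k,), vt is (k, n) with k = min(m, n).
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    r = max(1, int(keep_ratio * len(s)))
    # Rebuild from the top-r singular triplets: W_r = U_r diag(s_r) V_r^T.
    return (u[:, :r] * s[:r]) @ vt[:r, :]


def selective_rank_compress(
    layer_weights: Dict[str, np.ndarray],
    keep_ratio: float,
    prefixes: Tuple[str, ...] = ("attn.",),
) -> Dict[str, np.ndarray]:
    """Compress only weights whose names start with one of `prefixes`.

    Hypothetical naming convention: attention (MHSA) weights start with
    "attn."; every other weight (e.g. FFN) is returned unchanged.
    """
    return {
        name: rank_r_approx(w, keep_ratio) if name.startswith(prefixes) else w
        for name, w in layer_weights.items()
    }
```

For instance, compressing `{"attn.q_proj": <8x8>, "mlp.fc1": <8x32>}` with `keep_ratio=0.25` turns the attention matrix into a rank-2 matrix and leaves the FFN weight untouched. By the Eckart-Young theorem, keeping the top-$r$ singular triplets gives the best rank-$r$ approximation in Frobenius and spectral norm, which makes truncated SVD a natural choice. Storing the factors instead of their product would cut parameters from $mn$ to roughly $r(m+n)$; the sketch returns the dense product for simplicity.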

Abstract

Offsite-tuning is a privacy-preserving method for tuning large language models (LLMs): the LLM owner shares a lossy compressed emulator with data owners, who use it for downstream task tuning. This approach protects the privacy of both the model and data owners. However, current offsite-tuning methods often suffer from adaptation degradation, high computational costs, and limited protection strength because they uniformly drop LLM layers or rely on expensive knowledge distillation. To address these issues, we propose ScaleOT, a novel privacy-utility-scalable offsite-tuning framework that effectively balances privacy and utility. ScaleOT introduces a layerwise lossy compression algorithm that uses reinforcement learning to obtain the importance of each layer. It employs lightweight networks, termed harmonizers, to replace the raw LLM layers. By combining important original LLM layers and harmonizers in different ratios, ScaleOT generates emulators of various scales that are tailored for optimal performance while enhancing privacy protection. Additionally, we present a rank reduction method to further compress the original LLM layers, significantly enhancing privacy with negligible impact on utility. Comprehensive experiments show that ScaleOT can achieve nearly lossless offsite-tuning performance compared with full fine-tuning while providing stronger model privacy.
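How per-layer importance scores, $N_a$, and $\alpha$ could combine into an emulator layout is sketched below. This is a hypothetical helper, not ScaleOT's code. It takes the importance scores as given (ScaleOT estimates them with reinforcement learning, which is not reproduced here). It also assumes three things: the $N_a$ adapter layers are split between the bottom and top of the model, mirroring the adapter layout of vanilla offsite-tuning; $\alpha$ is the fraction of the remaining middle layers that are replaced; and each replaced layer gets its own harmonizer, least important first.

```python
from typing import List, Sequence


def plan_emulator(importance: Sequence[float], n_adapt: int, alpha: float) -> List[str]:
    """Assign a role ("adapter", "keep", or "harmonizer") to each layer.

    Hypothetical helper: n_adapt adapter layers are split between the bottom
    and top of the model; int(alpha * #middle) of the least important middle
    layers are marked for replacement by a harmonizer.
    """
    n_layers = len(importance)
    if not 0 <= n_adapt <= n_layers:
        raise ValueError("n_adapt must be between 0 and the number of layers")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    n_bottom = n_adapt // 2
    n_top = n_adapt - n_bottom
    roles = ["keep"] * n_layers

    # Bottom and top layers stay with the data owner as the trainable adapter.
    for i in range(n_bottom):
        roles[i] = "adapter"
    for i in range(n_layers - n_top, n_layers):
        roles[i] = "adapter"

    # The middle layers form the emulator; replace the least important ones.
    middle = list(range(n_bottom, n_layers - n_top))
    n_replace = int(alpha * len(middle))
    for i in sorted(middle, key=lambda j: importance[j])[:n_replace]:
        roles[i] = "harmonizer"
    return roles
```

For example, with scores `[0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.7, 0.6]`, `n_adapt=2`, and `alpha=0.5`, the sketch marks layers 0 and 7 as adapters, replaces layers 1, 3, and 5 (the three lowest-scoring middle layers) with harmonizers, and keeps layers 2, 4, and 6. In ScaleOT the kept emulator layers would additionally undergo rank reduction controlled by $\beta$; this sketch stops at layer assignment.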

Paper Structure

This paper contains 18 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of layerwise compression strategies. (a) Uniform LayerDrop. (b) Our Dynamic LayerDrop drops layers according to their estimated importance scores. (c) Our Dynamic LayerReplace with harmonizers. (d) Results of using different compression ratios. Our approach achieves better performance at the owner site while maintaining the performance discrepancy.
  • Figure 2: Comparison of tuning methods. (a) Fine-tuning requires access to the full model parameters and requires the data and model to be co-located. (b) Vanilla OT [OT] allows downstream users to fine-tune adapters on a lossy compressed emulator and then return the adapters. However, the knowledge distillation it relies on [sanh2019distilbert, hinton2015distilling] is very expensive, limiting its application. (c) The proposed ScaleOT introduces a layerwise importance-aware compression method, Dynamic LayerReplace, providing privacy-utility-scalable emulators for downstream task tuning.
  • Figure 3: Rank compression study with varying $\beta$ on a multi-head self-attention (MHSA) layer and a feed-forward network (FFN) layer in a Transformer block on the WikiText dataset.
  • Figure 4: Joint effect of $\alpha$ and $\beta$ in emulator generation.
  • Figure 5: Number of adapted layers in ScaleOT.