Offsite-Tuning: Transfer Learning without Full Model

Guangxuan Xiao, Ji Lin, Song Han

TL;DR

Offsite-Tuning tackles privacy and resource barriers in fine-tuning billion-parameter foundation models by splitting the model into a small trainable adapter and a lossy compressed emulator, so that the data owner can tune the adapter without accessing the full weights. The updated adapter is plugged back into the original model, achieving performance competitive with full fine-tuning while preserving data and model ownership. The framework uses a sandwich adapter, layer-drop emulator compression, and optional distillation, and is compatible with existing parameter-efficient fine-tuning methods, enabling efficient adaptation for both language and vision models. The approach yields substantial efficiency gains and broadens practical deployment in privacy-sensitive or resource-constrained settings.

Abstract

Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite-tuning, the model owner sends a lightweight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-tuning preserves both parties' privacy and is computationally more efficient than existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of offsite-tuning on various large language and vision foundation models. Offsite-tuning can achieve accuracy comparable to full model fine-tuning while being privacy-preserving and efficient, achieving a 6.5x speedup and a 5.6x memory reduction. Code is available at https://github.com/mit-han-lab/offsite-tuning.
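
The exchange described in the abstract can be sketched as a toy, framework-free walk-through. This is only an illustration under stated assumptions, not the authors' implementation: layers are represented as plain strings, the names `OffsitePackage`, `split_model`, `fine_tune_offsite`, and `plug_in` are hypothetical, the emulator is built by a placeholder "keep every other layer" rule, and a trailing `*` stands in for real gradient updates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OffsitePackage:
    """What the model owner sends to the data owner (hypothetical container)."""
    bottom_adapter: List[str]  # trainable layers nearest the input
    emulator: List[str]        # frozen, lossy stand-in for the middle layers
    top_adapter: List[str]     # trainable layers nearest the output


def split_model(layers: List[str], n_adapter: int) -> Tuple[OffsitePackage, List[str]]:
    """Model owner: carve out the adapters and compress the middle into an emulator."""
    if n_adapter < 0 or 2 * n_adapter > len(layers):
        raise ValueError("adapter layers must fit inside the model")
    bottom = layers[:n_adapter]
    middle = layers[n_adapter:len(layers) - n_adapter]
    top = layers[len(layers) - n_adapter:]
    emulator = middle[::2]  # placeholder compression (assumption): keep every other layer
    return OffsitePackage(bottom, emulator, top), middle


def fine_tune_offsite(package: OffsitePackage) -> Tuple[List[str], List[str]]:
    """Data owner: update only the adapters; the emulator is used but never changed.

    No real training happens here; a trailing '*' marks a layer as tuned.
    """
    tuned_bottom = [name + "*" for name in package.bottom_adapter]
    tuned_top = [name + "*" for name in package.top_adapter]
    return tuned_bottom, tuned_top


def plug_in(bottom: List[str], middle: List[str], top: List[str]) -> List[str]:
    """Model owner: combine the returned adapters with the private, uncompressed middle."""
    return bottom + middle + top


layers = [f"layer{i}" for i in range(12)]
package, private_middle = split_model(layers, n_adapter=2)
tuned_bottom, tuned_top = fine_tune_offsite(package)
adapted = plug_in(tuned_bottom, private_middle, tuned_top)
print(adapted)
```

In this sketch the data owner only ever sees `package`, and the model owner only ever receives the tuned adapter layers, which mirrors the ownership boundary the abstract describes.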

Paper Structure

This paper contains 27 sections, 1 equation, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Comparing existing fine-tuning approaches (top and middle) and Offsite-Tuning (bottom). (a) Traditionally, users send labeled data to model owners for fine-tuning, raising privacy concerns and incurring high computational costs. (b) Having the model owner send the full model to the data owner is not practical: it threatens the ownership of the proprietary model, and users cannot afford to fine-tune the huge foundation model due to resource constraints. (c) Offsite-tuning offers a privacy-preserving and efficient alternative to traditional fine-tuning methods that require access to full model weights.
  • Figure 2: Overview of Offsite-Tuning. Fine-tuning (left) requires access to the full model weights and needs both model and data to be in one location. In Offsite-tuning (right), the model owner sends an adapter and an emulator to the data owner, who fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned and plugged into the full model to create an adapted foundation model. As neither party needs to share full models or data and the emulator is compressed, offsite-tuning is both privacy-preserving and efficient.
  • Figure 3: Ablation study of the number and position of adapter layers. Fine-tuning both the top and bottom layers of the language model is significantly more effective than fine-tuning only the top or bottom layers, given the same number of trainable layers.
  • Figure 4: Ablation study of compression methods for creating the emulator. The layer-drop method is superior in two aspects: (1) it effectively maintains the plug-in performance while reducing the size of the emulator; (2) it creates a gap between the plug-in performance and the emulator performance, preserving the privacy of the model owner.
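
The two ablations in Figures 3 and 4 vary which layers are trainable and which middle layers survive compression. The sketch below shows one way to enumerate those choices as layer indices. It is an illustration only: the function names are hypothetical, "bottom" is assumed to mean the layers nearest the input, and the evenly spaced selection that always keeps the first and last middle layer is an assumption rather than the paper's confirmed rule.

```python
from typing import List


def adapter_indices(n_layers: int, budget: int, placement: str) -> List[int]:
    """Pick `budget` trainable layer indices for a given placement.

    Placements (assumed meanings): "bottom" = nearest the input,
    "top" = nearest the output, "sandwich" = split between both ends.
    """
    if not 0 <= budget <= n_layers:
        raise ValueError("budget must be between 0 and n_layers")
    if placement == "bottom":
        return list(range(budget))
    if placement == "top":
        return list(range(n_layers - budget, n_layers))
    if placement == "sandwich":
        n_bottom = budget // 2
        n_top = budget - n_bottom
        return list(range(n_bottom)) + list(range(n_layers - n_top, n_layers))
    raise ValueError(f"unknown placement: {placement}")


def layer_drop_indices(n_middle: int, n_keep: int) -> List[int]:
    """Choose `n_keep` evenly spaced indices out of `n_middle` frozen layers.

    Assumption: the first and last middle layers are always kept when n_keep >= 2.
    """
    if not 1 <= n_keep <= n_middle:
        raise ValueError("n_keep must be between 1 and n_middle")
    if n_keep == 1:
        return [0]
    # Integer arithmetic keeps the spacing deterministic and the indices distinct.
    return [i * (n_middle - 1) // (n_keep - 1) for i in range(n_keep)]


for placement in ("bottom", "top", "sandwich"):
    print(placement, adapter_indices(n_layers=24, budget=4, placement=placement))
print("emulator keeps", layer_drop_indices(n_middle=20, n_keep=8))
```

With the same budget, the three placements differ only in where the trainable indices sit, which is the comparison Figure 3 reports; the second function fixes how aggressively the frozen middle is thinned, which is the knob behind the layer-drop emulator in Figure 4.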