
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models

Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Pu Wang, Toshiaki Koike-Akino

TL;DR

The paper tackles the inefficiency of fine-tuning large foundation models and then compressing them in a separate post-training step. It introduces TuneComp, a unified pipeline that fine-tunes the model while distilling it into a pruned low-rank structure, using activation-aware initialization and a decaying regularization strategy. Experiments on a ViT transferred from ImageNet-1K to CIFAR-100 show that TuneComp outperforms baseline pipelines and existing joint methods across compression levels, and that it yields the compact model directly, without a large intermediate model. By integrating training and compression into a single process, the approach makes foundation models cheaper to deploy while retaining more accuracy than sequential alternatives.
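The summary mentions activation-aware initialization of the low-rank structure but gives no details. The following is a minimal sketch of one common variant of that idea, assuming the weight's input columns are scaled by the mean absolute calibration activation before a truncated SVD (in the spirit of activation-aware SVD methods). Whether TuneComp uses this exact scaling is an assumption, and the function name activation_aware_init is hypothetical.

```python
import numpy as np

def activation_aware_init(W, X, rank, eps=1e-6):
    """Hypothetical activation-aware low-rank initialization.

    W: (d_out, d_in) weight of a linear layer y = W @ x.
    X: (n, d_in) calibration inputs to that layer.
    Returns A (d_out, rank) and B (rank, d_in) with W ~= A @ B, where the
    approximation error is weighted toward high-activation input channels.
    """
    # Per-input-channel scale: mean absolute activation (assumed choice).
    s = np.abs(X).mean(axis=0) + eps          # (d_in,)
    # Scale the columns of W, i.e. W @ diag(s).
    W_scaled = W * s
    U, S, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    A = U[:, :rank] * S[:rank]                # (d_out, rank)
    B = Vt[:rank, :] / s                      # undo the column scaling
    return A, B
```

Because the truncated SVD is taken on W @ diag(s), it minimizes the reconstruction error weighted by s, so input channels that carry large activations are approximated more accurately than under a plain SVD of the same rank. At full rank the factors reproduce W exactly.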

Abstract

To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance while creating a larger-than-necessary model as an intermediate step. In this work, we aim to reduce this gap by directly constructing a smaller model guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms sequential compression methods.
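The abstract lists knowledge distillation among the compression tools and describes gradually distilling the model into the compressed structure. Below is a minimal NumPy sketch of the standard temperature-scaled logit-distillation objective (task cross-entropy plus a T²-scaled KL term from teacher to student), together with a linearly decaying coefficient as one hypothetical way to let a regularization term fade over training. The temperature T, mixing weight alpha, and the linear schedule are assumptions, not values or choices taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with temperature T (numerically stabilized)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    eps = 1e-12
    n = len(labels)
    # Task loss on the downstream dataset's hard labels.
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(n), labels] + eps).mean()
    # Distillation loss between temperature-softened distributions.
    p_t_T = softmax(teacher_logits, T)
    p_s_T = softmax(student_logits, T)
    kl = (p_t_T * (np.log(p_t_T + eps) - np.log(p_s_T + eps))).sum(axis=-1).mean()
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * T * T * kl

def decayed_weight(step, total_steps, lam0=1.0):
    """Hypothetical linear decay of a regularization coefficient from lam0 to 0."""
    return lam0 * max(0.0, 1.0 - step / total_steps)
```

In a training loop, a total objective could take the form kd_loss(...) + decayed_weight(step, total_steps) * reg, where reg stands for whatever term is being annealed; the paper's actual regularizer and decay schedule are not described in this summary.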

Paper Structure

This paper contains 18 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The proposed joint fine-tuning and compression pipeline, where compression involves low-rank approximation as well as the pruning/sparsification of the low-rank structures.
  • Figure 2: Comparison of our proposed TuneComp (joint fine-tuning and compression) with other compression strategies.
  • Figure 3: Performance of TuneComp under different pruning ratios $\rho \in \{0\%, 20\%, 40\%, 60\%, 80\%, 90\%, 95\%\}$.
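Figures 1 and 3 describe compression as a low-rank approximation whose factors are then pruned to a ratio $\rho$. The following is a minimal sketch of that structure, assuming a plain truncated SVD and unstructured magnitude pruning of each factor. Both choices, and the helper names prune_factor, pruned_low_rank, and kept_params, are hypothetical illustrations rather than the paper's procedure. The loop at the end tabulates the approximate weight count for the pruning ratios in Figure 3, for an assumed 768×768 layer (the ViT-Base hidden width) and an assumed rank of 192.

```python
import numpy as np

def prune_factor(F, rho):
    """Zero the fraction `rho` of entries of F with the smallest magnitude.

    Unstructured magnitude pruning is an assumed criterion, for illustration.
    Returns the pruned factor and its boolean keep-mask.
    """
    k = int(round(rho * F.size))              # number of entries to drop
    keep = np.ones(F.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest |F| entries (ties broken arbitrarily).
        drop = np.argpartition(np.abs(F).ravel(), k - 1)[:k]
        keep[drop] = False
    keep = keep.reshape(F.shape)
    return F * keep, keep

def pruned_low_rank(W, rank, rho):
    """Factor W (d_out, d_in) as A @ B with inner dimension `rank`,
    then prune a fraction `rho` of the entries of each factor."""
    # Plain truncated SVD, used here for brevity.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]                # (d_out, rank)
    B = Vt[:rank, :]                          # (rank, d_in)
    A_p, _ = prune_factor(A, rho)
    B_p, _ = prune_factor(B, rho)
    return A_p, B_p

def kept_params(d_out, d_in, rank, rho):
    """Approximate number of weights kept after factorization and pruning
    (storage overhead for masks or sparse indices is ignored)."""
    return int(round((1.0 - rho) * rank * (d_out + d_in)))

# Assumed sizes for illustration: a 768x768 layer, rank 192.
d_out = d_in = 768
rank = 192
dense = d_out * d_in
for rho in [0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95]:   # ratios from Figure 3
    kept = kept_params(d_out, d_in, rank, rho)
    print(f"rho={rho:.0%}: {kept} weights, {kept / dense:.1%} of dense")
```

With these assumed sizes, the unpruned factors already hold exactly half the dense count (192 × 1536 = 294,912 versus 768 × 768 = 589,824), so each pruning ratio scales down from 50% of the original layer.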