Parameter-Efficient Fine-Tuning of Large Language Models via Deconvolution in Subspace

Jia-Chen Zhang, Yu-Jie Xiong, Chun-Ming Xia, Dong-Hai Zhu, Xi-He Qiu

TL;DR

This paper addresses the high parameter cost of fine-tuning large language models for downstream tasks. It introduces Deconvolution Fine-Tuning (DCFT), which learns low-rank increments in a subspace and uses deconvolution (transposed convolution) to upscale and refine them, thereby bypassing the rank-one bottleneck inherent to LoRA. The method enforces orthogonal projections and improves efficiency through a low-rank factorization, a stride equal to the kernel size, and larger convolution kernels. It performs strongly on GLUE and SQuAD with roughly 8× fewer trainable parameters than LoRA-based approaches. The results indicate that DCFT is a practical, scalable PEFT alternative for large models, offering meaningful parameter savings without sacrificing accuracy across NLP tasks.

Abstract

Large language models (LLMs) are considered a milestone towards achieving Artificial General Intelligence (AGI). With their advanced emergent capabilities, they adapt to a wide range of specific applications. Fine-tuning LLMs for various downstream tasks has become a new paradigm. Low-Rank Adaptation (LoRA) is well known for its parameter efficiency: it can reduce the number of parameters needed to fine-tune LLMs by several orders of magnitude. However, LoRA-based approaches encounter a significant limitation due to the bottleneck imposed by rank-one decomposition. As the parameter count of LLMs increases, even rank-one decomposition might surpass the number of parameters truly necessary for handling more downstream tasks. In this paper, we propose a new method for Parameter-Efficient Fine-Tuning (PEFT) via deconvolution in subspace, dubbed DCFT. We use deconvolution to complete details and enhance knowledge in subspace incremental matrices, and dynamically control the number of parameters by adjusting the kernel size, unconstrained by rank-one decomposition. Extensive experiments are conducted to validate the effectiveness of DCFT. Results show that, compared to LoRA, DCFT achieves an 8$\times$ reduction in parameters while still achieving highly competitive performance. Our code is available here: https://github.com/Godz-z/DCFT.
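
To make the idea of "deconvolution in subspace" more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single-channel transposed convolution with no padding whose stride equals the kernel size; in that setting every entry of the small subspace matrix is expanded into one non-overlapping block, which is exactly a Kronecker product. The names `deconv_upscale`, `A`, `B`, `kernel`, and the toy sizes `D`, `d`, `r` are hypothetical and chosen only for illustration.

```python
import numpy as np

def deconv_upscale(subspace, kernel):
    """Transposed convolution with stride == kernel size (one channel, no padding).

    Each entry subspace[i, j] is replaced by the block subspace[i, j] * kernel,
    and the blocks tile the output without overlap. That is the Kronecker product.
    """
    return np.kron(subspace, kernel)

rng = np.random.default_rng(0)
D, d, r = 16, 4, 1                      # toy sizes (assumed); D must be divisible by d

A = rng.standard_normal((D // d, r))    # low-rank factors living in the subspace
B = rng.standard_normal((r, D // d))
kernel = rng.standard_normal((d, d))    # learnable d x d deconvolution kernel

subspace = A @ B                              # rank-r increment of shape (D/d, D/d)
delta_w = deconv_upscale(subspace, kernel)    # full-size increment of shape (D, D)

print(delta_w.shape)
print(np.linalg.matrix_rank(subspace), np.linalg.matrix_rank(delta_w))
```

Because the rank of a Kronecker product is the product of the ranks of its factors, the upscaled increment in this sketch can reach rank `r * d` even though the subspace matrix itself has rank `r`. This is one way to read the abstract's claim that the update is not constrained by a rank-one decomposition.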

Paper Structure

This paper contains 27 sections, 14 equations, 3 figures, 10 tables, 1 algorithm.

Figures (3)

  • Figure 1: An illustration of the differences between LoRA and DCFT. The parameter calculation results represent the model's parameters when $r = 1$. $D$ represents the dimension of the pretrained weights, and $d$ represents the dimension of the convolution kernel.
  • Figure 2: Illustration of the total training time for DCFT and LoRA on four datasets: CoLA, RTE, MRPC, and QNLI.
  • Figure 3: Accuracy and loss results of DCFT and LoRA on the SST-2 dataset.
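
The parameter comparison mentioned in the Figure 1 caption can be illustrated with a rough count. The shapes used here are assumptions for illustration and are not read off the figure itself: a square $D \times D$ pretrained weight, LoRA factors of shape $D \times r$ and $r \times D$, and a DCFT increment built from two subspace factors of shape $(D/d) \times r$ and $r \times (D/d)$ plus a single $d \times d$ kernel.

$$
P_{\text{LoRA}} = 2Dr, \qquad P_{\text{DCFT}} = \frac{2Dr}{d} + d^{2}
$$

With $r = 1$ and the hypothetical values $D = 768$, $d = 8$:

$$
P_{\text{LoRA}} = 2 \cdot 768 = 1536, \qquad P_{\text{DCFT}} = \frac{2 \cdot 768}{8} + 8^{2} = 192 + 64 = 256
$$

Under these assumptions the per-matrix count drops by a factor of 6. The exact ratio depends on $D$, $d$, and which layers are adapted, so this toy calculation is not expected to reproduce the reported 8$\times$ figure.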