Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li
TL;DR
This work introduces FourierFT, a parameter-efficient fine-tuning method that represents weight updates in the Fourier domain by learning a small set of spectral coefficients at the positions given by a shared spectral-entry matrix. The spatial-domain weight change is recovered with an inverse discrete Fourier transform. FourierFT trains far fewer parameters than LoRA, often by orders of magnitude, while maintaining or improving performance across NLP and CV tasks. It reports strong results on GLUE, E2E, instruction tuning of LLaMA-family models, and ViT image classification; for example, instruction tuning LLaMA2-7B uses only $0.064\mathrm{M}$ trainable parameters compared to LoRA's $33.5\mathrm{M}$. Overall, FourierFT offers a scalable, memory-efficient alternative for adapting large foundation models, enabling broader on-device customization and multi-task specialization with minimal storage overhead.
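The parameter counts quoted above can be reproduced with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in this summary: that the query and value projections are adapted in each of LLaMA2-7B's 32 layers (hidden size 4096), that LoRA uses rank 64, and that FourierFT learns 1000 spectral coefficients per adapted matrix. The helper names `lora_params` and `fourierft_params` are hypothetical.

```python
def lora_params(d_in: int, d_out: int, rank: int, num_matrices: int) -> int:
    """Trainable parameters for LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return num_matrices * rank * (d_in + d_out)


def fourierft_params(n_coeffs: int, num_matrices: int) -> int:
    """Trainable parameters when each adapted matrix learns n_coeffs coefficients."""
    return num_matrices * n_coeffs


# Assumed setup (not given in the text): 32 layers, 2 adapted matrices per layer.
hidden_size = 4096
num_matrices = 32 * 2

lora_total = lora_params(hidden_size, hidden_size, rank=64, num_matrices=num_matrices)
fourier_total = fourierft_params(n_coeffs=1000, num_matrices=num_matrices)

print(f"LoRA:      {lora_total / 1e6:.1f}M")
print(f"FourierFT: {fourier_total / 1e6:.3f}M")
print(f"ratio:     {lora_total / fourier_total:.0f}x")
```

Under these assumptions the LoRA count grows with the hidden size, while the FourierFT count depends only on the number of learned coefficients per matrix.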
Abstract
Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W = BA$. Despite LoRA's progress, it faces storage challenges when handling many customized adaptations or larger base models. In this work, we aim to further compress trainable parameters by exploiting the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we apply the inverse discrete Fourier transform to recover $\Delta W$. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at https://github.com/Chaos96/fourierft.
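The recovery step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the function name `fourierft_delta`, the random choice of spectral entries, the scaling factor `alpha`, and the use of the real part of the inverse transform are all assumptions made for the example.

```python
import numpy as np


def fourierft_delta(coeffs, entries, shape, alpha=1.0):
    """Recover a dense weight update from a few spectral coefficients.

    coeffs  : (n,) array of trainable spectral coefficients
    entries : (rows, cols) index arrays giving the n fixed spectral positions
    shape   : (d1, d2) shape of the weight matrix being adapted
    alpha   : scaling factor (an assumed hyperparameter for this sketch)
    """
    spectrum = np.zeros(shape, dtype=np.float64)
    rows, cols = entries
    spectrum[rows, cols] = coeffs  # all other spectral entries stay zero
    # Inverse 2D DFT maps the sparse spectrum back to the spatial domain;
    # only the real part is kept so that the update is a real matrix.
    return alpha * np.fft.ifft2(spectrum).real


rng = np.random.default_rng(0)
d1, d2, n = 64, 64, 16

# Pick n distinct spectral positions once; they are fixed, not trained.
flat = rng.choice(d1 * d2, size=n, replace=False)
entries = np.unravel_index(flat, (d1, d2))

coeffs = rng.standard_normal(n)  # the only trainable values in this sketch
delta_w = fourierft_delta(coeffs, entries, (d1, d2))

print(delta_w.shape, "dense update from", n, "coefficients")
```

In this toy setting the dense $64 \times 64$ update is determined by 16 numbers, whereas a rank-1 LoRA update of the same matrix would already need $64 + 64 = 128$.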
