SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation

Yixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania

TL;DR

SSH tackles the cost of fine-tuning billion-parameter models by moving weight updates into a real-valued spectral domain via the discrete Hartley transform (DHT). It selects the most informative frequency components using energy-based criteria and learns only a sparse set of Hartley coefficients, with updates recovered through the symmetric inverse transform. This yields substantial reductions in trainable parameters and GFLOPs while maintaining or surpassing performance on diverse natural language understanding, natural language generation, and vision-language tasks. The approach outperforms existing parameter-efficient fine-tuning (PEFT) methods across single- and multi-modal benchmarks, offering a scalable, numerically stable alternative for fine-tuning large foundation models.
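The "symmetric inverse transform" relies on a standard property of the DHT: it maps real inputs to real outputs and is its own inverse up to a scale factor. The following minimal sketch assumes a 2-D DHT with kernel cas(θ) = cos θ + sin θ, computed from NumPy's FFT. The helper names dht2 and idht2 are hypothetical and do not come from the authors' code.

```python
import numpy as np

def dht2(x: np.ndarray) -> np.ndarray:
    """2-D discrete Hartley transform with kernel cas(2*pi*(u*a/M + v*b/N)).

    NumPy's FFT uses exp(-i*theta), so Re(F) = sum x*cos(theta) and
    Im(F) = -sum x*sin(theta); cas = cos + sin therefore equals Re(F) - Im(F).
    """
    f = np.fft.fft2(x)
    return f.real - f.imag

def idht2(h: np.ndarray) -> np.ndarray:
    """Inverse 2-D DHT: the same real transform, scaled by 1/(M*N)."""
    m, n = h.shape
    return dht2(h) / (m * n)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))      # stand-in for a small weight matrix
spec = dht2(w)
print(np.isrealobj(spec))            # True: no complex numbers are involved
print(np.allclose(idht2(spec), w))   # True: the inverse is the forward DHT rescaled
```

Because forward and inverse share one real kernel, no complex parameters or complex arithmetic are needed in the update path. This is the practical contrast with Fourier-based spectral adapters.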

Abstract

Low-rank adaptation (LoRA) has proven effective at reducing the number of trainable parameters when fine-tuning large foundation models such as large language models (LLMs). However, it still faces computational and memory challenges when scaling to larger models or adapting to more complex tasks. In this work, we introduce Sparse Spectrum Adaptation via Discrete Hartley Transformation (SSH), a novel approach that significantly reduces the number of trainable parameters while enhancing model performance. Guided by the discrete Hartley transformation (DHT) of the initial weights, SSH selects the most informative spectral components across all layers. A lightweight inverse DHT then projects the spectrum back into the spatial domain to update the weights. Extensive experiments on single-modality tasks, such as language understanding and generation, and multi-modality tasks, such as video-text understanding, demonstrate that SSH outperforms existing parameter-efficient fine-tuning (PEFT) methods while substantially reducing computational cost and memory requirements.
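The pipeline in the abstract can be sketched as follows: transform the pretrained weight, fix a few spectral positions, learn coefficients only at those positions, and map back with the inverse DHT. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that "most informative" means the largest squared DHT coefficients of the pretrained weight. SpectralDelta, select_top_energy, dht2, and idht2 are hypothetical names. Training (gradients, optimizer, any scaling factor) is omitted.

```python
import numpy as np

def dht2(x):
    # 2-D DHT via FFT: the cas kernel equals Re(F) - Im(F) under NumPy's sign convention
    f = np.fft.fft2(x)
    return f.real - f.imag

def idht2(h):
    # The 2-D DHT is self-inverse up to a 1/(M*N) factor
    return dht2(h) / h.size

def select_top_energy(w0, n):
    """Hypothetical criterion: flat indices of the n largest-energy DHT coefficients of w0."""
    energy = dht2(w0) ** 2                     # energy of each real-valued coefficient
    return np.argsort(energy, axis=None)[-n:]  # axis=None flattens before sorting

class SpectralDelta:
    """Hypothetical SSH-style adapter: only n spectral coefficients are trainable."""

    def __init__(self, w0, n):
        self.w0 = w0                          # frozen pretrained weight, shape (d_out, d_in)
        self.idx = select_top_energy(w0, n)   # spectral positions, fixed before training
        self.coef = np.zeros(n)               # trainable; zero init makes delta W = 0 at start

    def delta_w(self):
        spec = np.zeros(self.w0.size)
        spec[self.idx] = self.coef                 # scatter into a sparse spectrum
        return idht2(spec.reshape(self.w0.shape))  # back to the spatial domain

    def forward(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ (self.w0 + self.delta_w()).T

rng = np.random.default_rng(0)
w0 = rng.standard_normal((16, 32))
x = rng.standard_normal((4, 32))
layer = SpectralDelta(w0, n=20)
print(np.allclose(layer.forward(x), x @ w0.T))  # True: zero coefficients leave W0 unchanged
print(layer.coef.size, w0.size)                 # 20 512
layer.coef[:] = 0.01                            # pretend an optimizer updated the coefficients
print(np.allclose(dht2(layer.delta_w()).ravel()[layer.idx], 0.01))  # True
```

Initializing the coefficients to zero makes the adapted layer identical to the pretrained one at the start. Each adapted matrix then stores only n numbers, compared with r·(d_out + d_in) for a rank-r LoRA update.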

Paper Structure

This paper contains 27 sections, 7 equations, 4 figures, 12 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Performance and computation comparison of fine-tuning methods on NLP and CV tasks. (a) For NLP on LLaMA3.1-8B, SSH achieves a GPT-4 score of 7.93, closely matching the 7.95 of full fine-tuning while using less than 0.1% of the parameters. (b) On CV tasks, SSH achieves 77.4% accuracy, matching full fine-tuning with significantly fewer parameters. (c) & (d) SSH reduces GFLOPs by up to 55% compared to FourierFT on both NLP and CV tasks, yielding significant computational savings.
  • Figure 2: Overview of Sparse Spectrum Adaptation via Discrete Hartley Transform (SSH). First, the discrete Hartley transform (DHT) is applied to the pretrained weights to extract and retain the most important frequency components. Then, a selection process identifies the specific spectral coefficients to be learned as trainable parameters, which are organized into a spectral matrix. Finally, the modified spectral matrix is transformed back to the spatial domain through the symmetric application of the inverse discrete Hartley transform (iDHT), ensuring accurate reconstruction and efficient updates to the model's weights.
  • Figure 3: Visual representation of the key and value matrices of RoBERTa's attention mechanism before and after the discrete Hartley transform (DHT). (a) and (b) show the original weight distributions of the key and value matrices, respectively. (d) and (e) show the corresponding DHT values, demonstrating effective spectral compression. Heatmaps (c) and (f) show the output weights before and after the DHT, highlighting the resulting sparsity and compact representation.
  • Figure 4: Ablation study of SSH on GLUE tasks showing the effect of the energy ratio ($\delta$) on performance with RoBERTa-base ($n = 750$). Performance is normalized to $\delta = 0.5$; $\delta = 0.7$ gives the best balance and diversity in the spectral representation.
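Figure 4 varies an energy ratio δ but does not define it here. This sketch assumes one plausible reading, which this page does not confirm. A fraction δ of the n trainable spectral positions comes from the highest-energy DHT coefficients of the pretrained weight, and the remaining positions are sampled at random for diversity. Treating n as the number of coefficients per adapted matrix is also an assumption. The function name select_with_energy_ratio is hypothetical, and the 768 × 768 shape is only RoBERTa-base's hidden size used for illustration.

```python
import numpy as np

def dht2(x):
    # 2-D DHT via FFT: cas kernel = Re(F) - Im(F) under NumPy's sign convention
    f = np.fft.fft2(x)
    return f.real - f.imag

def select_with_energy_ratio(w0, n, delta, seed=0):
    """Hypothetical reading of delta: round(delta * n) top-energy DHT positions,
    plus n - k positions drawn uniformly at random from the remainder."""
    energy = (dht2(w0) ** 2).ravel()
    order = np.argsort(energy)[::-1]           # flat indices, highest energy first
    k = int(round(delta * n))
    top = order[:k]                            # energy-guided part
    rng = np.random.default_rng(seed)
    rest = rng.choice(order[k:], size=n - k, replace=False)  # diversity part, disjoint from top
    return np.concatenate([top, rest])

rng = np.random.default_rng(1)
w0 = rng.standard_normal((768, 768))           # illustrative RoBERTa-base-sized matrix
idx = select_with_energy_ratio(w0, n=750, delta=0.7)
print(idx.size, np.unique(idx).size)           # 750 750
```

Under this reading, δ = 1 reduces to pure top-energy selection and δ = 0 to uniformly random selection. An intermediate value such as 0.7 would trade concentration on dominant components against coverage of the rest of the spectrum.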