SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation
Yixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D. Pimentel, Anuj Pathania
TL;DR
SSH tackles the expensive fine-tuning of billion-parameter models by shifting updates to a real-valued spectral domain via the Discrete Hartley Transform. It selects the most informative frequency components using energy-based criteria and learns only a sparse set of Hartley coefficients, with updates recovered through the symmetric inverse transform. This yields substantial reductions in trainable parameters and GFLOPs while maintaining or surpassing performance on diverse NLP, NLG, and vision-language tasks. The approach outperforms existing PEFT methods across single- and multi-modal benchmarks, offering a scalable, numerically stable alternative for fine-tuning large foundation models.
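For reference, the standard one-dimensional discrete Hartley transform and its inverse are written out below. This is the textbook definition, not notation taken from the paper; the symbols x, H, N, and cas are assumptions made here for illustration. The "symmetric inverse" property is visible in the fact that the forward and inverse transforms share the same real-valued kernel and differ only by the factor 1/N.

```latex
\begin{aligned}
H(k) &= \sum_{n=0}^{N-1} x(n)\,\operatorname{cas}\!\left(\frac{2\pi nk}{N}\right),
\qquad \operatorname{cas}\theta = \cos\theta + \sin\theta, \\
x(n) &= \frac{1}{N}\sum_{k=0}^{N-1} H(k)\,\operatorname{cas}\!\left(\frac{2\pi nk}{N}\right).
\end{aligned}
```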
Abstract
Low-rank adaptation (LoRA) has proven effective at reducing the number of trainable parameters when fine-tuning large foundation models. However, it still faces computational and memory challenges when scaling to larger models or adapting to more complex tasks. In this work, we introduce Sparse Spectrum Adaptation via Discrete Hartley Transformation (SSH), a novel approach that significantly reduces the number of trainable parameters while enhancing model performance. SSH applies a discrete Hartley transformation (DHT) to the initial weights and uses the resulting spectrum to select the most informative spectral components across all layers. A lightweight inverse DHT then projects the learned spectrum back into the spatial domain to form the weight updates. Extensive experiments on single-modality tasks, such as language understanding and generation, and on multi-modality tasks, such as video-text understanding, demonstrate that SSH outperforms existing parameter-efficient fine-tuning (PEFT) methods while achieving substantial reductions in computational cost and memory requirements.
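The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`dht2`, `idht2`, `top_energy_indices`, `sparse_update`), the use of squared spectral magnitude as the "energy" score, the plain top-n selection rule, and the toy matrix size are all assumptions made for illustration. The 2D DHT is computed through its known relation to the FFT, H = Re(F) - Im(F), and the inverse reuses the same transform with a 1/(MN) scale.

```python
import numpy as np

def dht2(x):
    """2D discrete Hartley transform of a real array via the FFT: H = Re(F) - Im(F)."""
    f = np.fft.fft2(x)
    return f.real - f.imag

def idht2(h):
    """Inverse 2D DHT: the same transform, scaled by 1 / (number of entries)."""
    return dht2(h) / h.size

def top_energy_indices(w0, n):
    """Hypothetical selection rule: keep the n positions where the DHT of the
    initial weights has the largest squared magnitude."""
    energy = dht2(w0) ** 2
    flat = np.argsort(energy, axis=None)[-n:]  # indices into the flattened array
    return np.unravel_index(flat, w0.shape)

def sparse_update(shape, indices, coeffs):
    """Place the trainable coefficients into an otherwise zero spectrum and
    map it back to the spatial domain to obtain the weight update."""
    spectrum = np.zeros(shape)
    spectrum[indices] = coeffs
    return idht2(spectrum)

# Toy example: an 8x6 "pretrained" weight matrix with 5 trainable coefficients.
rng = np.random.default_rng(0)
w0 = rng.standard_normal((8, 6))
idx = top_energy_indices(w0, n=5)
coeffs = rng.standard_normal(5)  # stand-in for learned values
delta_w = sparse_update(w0.shape, idx, coeffs)
w_adapted = w0 + delta_w
```

In this sketch only `coeffs` would be trained, so the trainable parameter count is n per adapted matrix regardless of its dimensions, and every intermediate stays real-valued because the Hartley kernel has no imaginary part.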
