Spectral-Aware Low-Rank Adaptation for Speaker Verification
Zhe Li, Man-wai Mak, Mert Pilanci, Hung-yi Lee, Helen Meng
TL;DR
Conventional PEFT methods such as LoRA do not exploit the spectral structure of pre-trained weights. This paper introduces SpectralFT, a spectral-aware fine-tuning scheme that decomposes each weight matrix via SVD into a principal subspace $\mathbf{W}_p$ and a minor subspace $\mathbf{W}_m$, keeps the pre-trained $\mathbf{W}_p$ frozen, and adds trainable LoRA-style adapters $\Delta_U$ and $\Delta_V$ to its top spectral components. Experiments on VoxCeleb1 and CN-Celeb1, using HuBERT-Large or WavLM-Large as the pre-trained model and ECAPA-TDNN as the speaker encoder, show that SpectralFT outperforms Adapter, static prompt tuning, LoRA, and other baselines, especially when tuning $\mathbf{W}_q$ and $\mathbf{W}_k$ with a moderate rank $r$ and a moderate number of top components $k$. The findings indicate that confining adaptation to the top spectral space preserves essential pre-trained knowledge while still allowing task-specific refinement, improving speaker verification performance with modest computational overhead. This spectral-guided PEFT approach offers a practical path to efficient, high-capacity fine-tuning for speech applications and potentially beyond.
Abstract
Previous research has shown that the principal singular vectors of a pre-trained model's weight matrices capture critical knowledge, whereas those associated with small singular values may contain noise or less reliable information. As a result, the LoRA-based parameter-efficient fine-tuning (PEFT) approach, which does not constrain the use of the spectral space, may not be effective for tasks that demand high representation capacity. In this study, we enhance existing PEFT techniques by incorporating the spectral information of pre-trained weight matrices into the fine-tuning process. We investigate spectral adaptation strategies with a particular focus on the additive adjustment of top singular vectors. This is accomplished by applying singular value decomposition (SVD) to the pre-trained weight matrices and restricting fine-tuning to the top spectral space. Extensive speaker verification experiments on VoxCeleb1 and CN-Celeb1 demonstrate enhanced tuning performance with the proposed approach. Code is released at https://github.com/lizhepolyu/SpectralFT.
