Adaptive Principal Components Allocation with the $\ell_{2,g}$-regularized Gaussian Graphical Model for Efficient Fine-Tuning Large Models
Jingjing Zheng, Yankai Cao
TL;DR
This paper tackles parameter-efficient fine-tuning (PEFT) of very large models by modeling inter-parameter interactions with a Gaussian Graphical Model (GGM). It introduces a novel $\ell_{2,g}$-regularized GGM and an SVD-based node construction to selectively train principal components per layer, capturing global dependencies that local low-rank methods miss. A Block Coordinate Descent (BCD) framework solves the non-convex $\ell_{2,g}$ objective via a coupled $(\Omega,\Delta)$ formulation, enabling node selection through structural sparsity and important-node metrics. Empirical results on the GLUE benchmark with RoBERTa-Base demonstrate competitive performance with significantly fewer trainable parameters, and ablations show the value of the important-nodes mechanism. Overall, the work advances PEFT by combining global dependency modeling with non-convex sparsity to achieve both efficiency and effectiveness in fine-tuning.
Abstract
In this work, we propose a novel Parameter-Efficient Fine-Tuning (PEFT) approach based on Gaussian Graphical Models (GGMs), which is, to the best of our knowledge, the first application of GGMs to PEFT tasks. The proposed method uses the $\ell_{2,g}$-norm to select critical parameters and capture global dependencies. The resulting non-convex optimization problem is solved efficiently with a Block Coordinate Descent (BCD) algorithm. Experimental results on the GLUE benchmark [24] for fine-tuning RoBERTa-Base [18] demonstrate the effectiveness of the proposed approach, which achieves competitive performance with significantly fewer trainable parameters. The code for this work is available at: https://github.com/jzheng20/Course projects.git.
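
To make the idea of allocating principal components more concrete, the following minimal NumPy sketch decomposes a single weight matrix into rank-1 principal components via SVD and applies an update to a chosen subset only. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice to update singular values, and the hard-coded `trainable_idx` are all hypothetical. In the proposed method the selection would come from the $\ell_{2,g}$-regularized GGM, which is not reproduced here.

```python
import numpy as np


def principal_components(weight):
    """Factor a weight matrix as U diag(s) V^T (thin SVD).

    Component i is s[i] * outer(u[:, i], vt[i, :]); the sum over all
    components recovers the original matrix.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def rebuild(u, s, vt, delta_s, trainable_idx):
    """Rebuild the weight after updating only the selected singular values.

    Assumption for illustration: the trainable quantity is an additive
    offset on each selected singular value; all other components are frozen.
    """
    s_new = s.copy()
    s_new[trainable_idx] += delta_s
    return (u * s_new) @ vt  # scales column i of u by s_new[i]


rng = np.random.default_rng(0)
weight = rng.standard_normal((6, 4))  # stand-in for one layer's weights
u, s, vt = principal_components(weight)

# Hypothetical selection; a GGM-based criterion would supply these indices.
trainable_idx = np.array([0, 2])
delta_s = np.array([0.1, -0.05])

new_weight = rebuild(u, s, vt, delta_s, trainable_idx)
```

In this sketch only `len(trainable_idx)` scalars per layer change, so the difference `new_weight - weight` has rank at most the number of selected components.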
