HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation

Geyuan Zhang; Xiaofei Zhou; Chuheng Chen

TL;DR

This work tackles the high computational cost of fine-tuning large pre-trained language models by introducing a direct Updated Transformation (UT) paradigm, which preserves a strong correlation between original and updated weights. Building on UT, the Hadamard Updated Transformation (HUT) uses a Hadamard product with two low-rank matrices to update weight matrices in transformers, achieving a richer yet more efficient parameter update. The authors demonstrate, through experiments on RoBERTa-large (GLUE) and GPT-2 (E2E NLG), that HUT attains competitive or state-of-the-art performance while significantly reducing training FLOPs and adding zero inference latency. This approach offers a principled, computation-efficient alternative to conventional PEFT methods and highlights the practical impact of maintaining original–updated parameter correlations during fine-tuning.

Abstract

Fine-tuning pre-trained language models for downstream tasks has achieved impressive results in NLP. However, fine-tuning all parameters becomes impractical due to the rapidly increasing size of model parameters. To address this, Parameter Efficient Fine-Tuning (PEFT) methods update only a subset of parameters. Most PEFT methods, such as LoRA, use incremental updates, which involve adding learned weight matrix increments to the original parameters. Although effective, these methods face limitations in capturing complex parameter dynamics and do not maintain a strong correlation between the original and updated parameters. To overcome these challenges, we propose the direct Updated Transformation (UT) paradigm, which constructs a transformation directly from the original to the updated parameters. This approach ensures that the correlation between the original and updated parameters is preserved, leveraging the semantic features learned during pre-training. Building on this paradigm, we present the Hadamard Updated Transformation (HUT) method. HUT efficiently updates the original weight matrix using the Hadamard transformation with two low-rank matrices, offering a more expressive and flexible update mechanism. This allows HUT to capture richer parameter features through functional transformations, reducing computational complexity while maintaining or improving model quality. Theoretical analysis and extensive experiments on RoBERTa and GPT-2 validate the effectiveness of HUT. Results show that HUT performs on par with or better than other PEFT methods in terms of model quality, while significantly reducing computational complexity.
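The contrast between an incremental update and a direct transformation can be illustrated with a toy example. The sketch below is not the authors' implementation: the exact HUT formula does not appear in this summary, so the form `W_new = W0 * (1 + A @ B)` (elementwise product with a low-rank term) is an assumption chosen only to show the idea. The function names, matrix shapes, and toy values are likewise hypothetical.

```python
# Illustrative sketch only. The exact HUT formula is not given in this summary;
# the Hadamard form below, W_new = W0 * (1 + A @ B), is an ASSUMPTION.

def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def incremental_update(w0, a, b):
    """LoRA-style incremental update: W_new = W0 + A @ B."""
    delta = matmul(a, b)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]

def hadamard_update(w0, a, b):
    """Assumed direct transformation: W_new = W0 * (1 + A @ B), elementwise."""
    low_rank = matmul(a, b)
    return [[w * (1.0 + m) for w, m in zip(w_row, m_row)] for w_row, m_row in zip(w0, low_rank)]

w0 = [[2.0, 0.0], [-1.0, 4.0]]  # frozen pre-trained weights (toy values)
a = [[0.5], [0.25]]             # trainable, shape d x r with rank r = 1
b = [[1.0, 2.0]]                # trainable, shape r x k

w_inc = incremental_update(w0, a, b)  # the increment does not depend on W0
w_had = hadamard_update(w0, a, b)     # every new entry is a rescaling of W0's entry

# With B initialised to zero, the assumed transformation starts at the identity.
w_init = hadamard_update(w0, a, [[0.0, 0.0]])
```

In this assumed form each updated entry is a scaled copy of the corresponding pre-trained entry, which is one simple way the updated weights could stay tied to the original ones. Both variants also yield a single merged matrix with the same shape as `W0`, so neither would add layers at inference time.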

Paper Structure (29 sections, 11 equations, 4 figures, 7 tables)

This paper contains 29 sections, 11 equations, 4 figures, 7 tables.

Introduction
Related Work
Method
Direct Updated Transformation (UT) paradigm
Hadamard Updated Transformation (HUT)
Computation Complexity Analysis
Experiments
Experimental Settings
Natural Language Understanding
Models and Datasets.
Implementation Details.
Main Results.
Natural Language Generation
Models and Datasets.
Implementation Details.
...and 14 more sections

Figures (4)

Figure 1: Parameter updating procedure under the Incremental Update and our Transformation Update. Most existing PEFT methods learn an incremental update by adding $\Delta W$ to the original weight matrix $W_0$, while we propose a direct update method that uses an update transformation to obtain $W_{new}$.
Figure 2: (a) Our proposed HUT maintains a strong correlation between $W_0$ and $U'(W)$, so that the learned $U'(W)$ can leverage the semantic features learned during training. (b) The design of the HUT module.
Figure 3: Average scores on the GLUE benchmark for RoBERTa with different PEFT methods. The x-axis is the number of GFLOPs, which indicates the computational complexity, and the y-axis is the average score.
Figure 4: Visualization of some results. The shades of red indicate the degree of emphasis that the fine-tuned model places on different words.
