Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation

Kai Huang; Hanyun Yin; Heng Huang; Wei Gao

TL;DR

The paper tackles the environmental footprint of fine-tuning large language models by proposing GreenTrainer, an adaptive backpropagation method that selectively involves tensors in training to reduce FLOPs. It introduces a tensor-level backpropagation FLOPs model with components $t_{dy}$ and $t_{dw}$, and uses a first-order importance criterion $I_k= -\sum_i \Delta w_i^{(k)} \frac{\partial L}{\partial w_i^{(k)}}$ to quantify tensor contributions. A dynamic programming-based tensor selector ensures the FLOPs budget, $T_{fp} + \bm{m} \cdot \bm{t}_{dw} + \sigma(\bm{m}) \bm{t}_{dy} \le \rho T_{full}$, is met while maximizing loss reduction. Across OPT, BLOOMZ, and FLAN-T5 models on SciTLDR and DialogSum, GreenTrainer achieves up to 64% FLOPs savings with minimal accuracy loss, and can even improve accuracy relative to full fine-tuning or LoRA at the same FLOPs. The approach provides flexible tradeoffs for Green AI by adjusting the FLOPs-reduction objective and scales across model sizes, offering practical impact for energy-aware fine-tuning of LLMs.
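To make the two quantities above concrete, here is a minimal Python sketch of the importance score $I_k$ and of the FLOPs budget check. It is an illustration, not the authors' implementation. The function names, the toy FLOPs values, and in particular the definition of $\sigma(\bm{m})$ (taken here to mark every tensor at or after the first selected one, since activation gradients would have to pass through them) are assumptions that the summary itself does not spell out. $T_{full}$ is likewise assumed to be the forward cost plus the backward cost of all tensors.

```python
import numpy as np

def tensor_importance(delta_w, grad):
    """First-order importance of one tensor: I_k = -sum_i dw_i * dL/dw_i."""
    return -float(np.sum(np.asarray(delta_w) * np.asarray(grad)))

def sigma(m):
    """ASSUMED definition: 1 for every tensor at or after the first selected one
    (tensors are ordered from input to output)."""
    return (np.cumsum(np.asarray(m)) > 0).astype(int)

def within_budget(m, t_dw, t_dy, t_fp, t_full, rho):
    """Check T_fp + m . t_dw + sigma(m) . t_dy <= rho * T_full."""
    m = np.asarray(m)
    cost = t_fp + m @ np.asarray(t_dw) + sigma(m) @ np.asarray(t_dy)
    return bool(cost <= rho * t_full), float(cost)

# Toy numbers (hypothetical): 4 tensors with identical per-tensor costs.
t_dw = [2, 2, 2, 2]   # FLOPs to compute each weight gradient
t_dy = [1, 1, 1, 1]   # FLOPs to pass the activation gradient through each tensor
t_fp = 6              # forward-pass FLOPs
t_full = t_fp + sum(t_dw) + sum(t_dy)  # assumed cost of full fine-tuning: 18

print(within_budget([0, 0, 0, 1], t_dw, t_dy, t_fp, t_full, rho=0.6))  # (True, 9.0)
print(within_budget([0, 0, 1, 1], t_dw, t_dy, t_fp, t_full, rho=0.6))  # (False, 12.0)
```

With a budget of $0.6 \times 18 = 10.8$ FLOPs, training only the last tensor fits, while training the last two does not.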

Abstract

Fine-tuning is the most effective way of adapting pre-trained large language models (LLMs) to downstream applications. With the fast growth of LLM-enabled AI applications and the democratization of open-sourced LLMs, fine-tuning has become possible for non-expert individuals, but LLM fine-tuning performed intensively worldwide could result in significant energy consumption and carbon footprint, which may bring large environmental impact. Mitigating such environmental impact towards Green AI directly correlates to reducing the FLOPs of fine-tuning, but existing techniques for efficient LLM fine-tuning can only achieve limited reduction of such FLOPs, because they neglect the backpropagation cost in fine-tuning. To address this limitation, in this paper we present GreenTrainer, a new LLM fine-tuning technique that adaptively evaluates different tensors' backpropagation costs and contributions to the fine-tuned model accuracy, to minimize the fine-tuning cost by selecting the most appropriate set of tensors in training. Such selection in GreenTrainer is made based on a given objective of FLOPs reduction, which can flexibly adapt to the carbon footprint in energy supply and the need in Green AI. Experimental results over multiple open-sourced LLMs and abstractive summarization datasets show that, compared to fine-tuning the whole LLM, GreenTrainer can save up to 64% FLOPs in fine-tuning without any noticeable model accuracy loss. Compared to existing fine-tuning techniques such as LoRA, GreenTrainer can achieve up to 4% improvement in model accuracy with on-par FLOPs reduction.
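As a rough illustration of selecting tensors under a FLOPs-reduction objective, the sketch below exhaustively searches all subsets of a tiny model and keeps the one with the highest total importance that fits the budget. This is deliberately not the dynamic-programming selector used by GreenTrainer. It is a brute-force stand-in that only makes the optimization problem concrete. The function name, the toy costs and importances, and the rule that activation gradients must flow from the output back to the first selected tensor are all assumptions.

```python
from itertools import product

def select_tensors(importance, t_dw, t_dy, t_fp, rho):
    """Brute-force search (NOT the paper's DP): maximize summed importance
    subject to forward + backward FLOPs <= rho * full fine-tuning FLOPs."""
    n = len(importance)
    t_full = t_fp + sum(t_dw) + sum(t_dy)  # assumed cost of full fine-tuning
    best_mask, best_score = (0,) * n, 0.0
    for mask in product((0, 1), repeat=n):
        if 1 not in mask:
            continue  # nothing trained, nothing to score
        first = mask.index(1)
        # Weight-gradient cost for selected tensors, plus activation-gradient
        # cost for every tensor from the first selected one to the output.
        cost = t_fp + sum(c for c, s in zip(t_dw, mask) if s) + sum(t_dy[first:])
        score = sum(i for i, s in zip(importance, mask) if s)
        if cost <= rho * t_full and score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score

# Hypothetical 4-tensor model, tensors ordered from input to output.
mask, score = select_tensors(
    importance=[0.5, 0.1, 0.3, 0.4],
    t_dw=[2, 2, 2, 2],
    t_dy=[1, 1, 1, 1],
    t_fp=6,
    rho=0.7,
)
print(mask)  # (0, 0, 1, 1)
```

Although the first tensor has the highest individual importance in this toy setup, selecting it forces activation gradients through the whole network, so the two tensors nearest the output give more total importance within the same budget. Exhaustive search costs $2^n$ evaluations, which is why a real selector needs something like dynamic programming.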

Paper Structure (22 sections, 9 equations, 5 figures, 11 tables)

Introduction
Background & Motivation
Transformer Architectures for Text Generation
The Need for Adaptive Backpropagation
FLOPs Model of Backpropagation
Method
Tensor FLOPs Profiling
Tensor Importance Evaluation
Tensor Selection
Experiments
Training Cost & Accuracy
The Impact of FLOPs Reduction Objective
Efficacy of Tensor Importance Metrics
Impact of LLM Size
Conclusion
...and 7 more sections

Figures (5)

Figure 1: GreenTrainer adaptively selects the most appropriate portion of the LLM for fine-tuning
Figure 2: Backpropagation of a 4-layer dense NN
Figure 3: A sample workflow of tensor FLOPs profiling
Figure 4: Solving the tensor selection problem using DP
Figure 5: An example of tensor FLOPs profiling in the OPT-2.7B model
