STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

Linhai Zhang, Jialong Wu, Deyu Zhou, Guoqiang Xu

TL;DR

Addressing data-efficient fine-tuning of large language models, the paper reveals that naively combining LoRA with uncertainty-based active learning leads to an adverse uncertainty gap and calibration issues. It introduces STAR, which blends a dynamic uncertainty measure that fuses base-model and full-model uncertainty across AL iterations with a hybrid regularization approach and MC dropout to improve calibration. Empirical results across GSM8K, BoolQ, and OpenBookQA show STAR substantially outperforms baselines and mitigates the failure modes observed in prior AL+LoRA setups, with ablations confirming the contributions of dynamic uncertainty and calibration. This work advances practical data-efficient fine-tuning for reasoning tasks under tight annotation budgets and resource constraints, while noting limitations and avenues for future exploration with larger models and other PEFT-method pairings.

Abstract

Although Large Language Models (LLMs) have demonstrated powerful few-shot learning capabilities through prompting, supervised training is still necessary for complex reasoning tasks. Because of their large parameter counts and memory consumption, both Parameter-Efficient Fine-Tuning (PEFT) and Memory-Efficient Fine-Tuning methods have been proposed for LLMs. Nevertheless, the heavy consumption of annotated data, which Data-Efficient Fine-Tuning aims to reduce, remains unexplored. One obvious approach is to combine a PEFT method with active learning. However, experimental results show that such a combination is not trivial and yields inferior results. Probe experiments suggest two main reasons for this observation: an uncertainty gap and poor model calibration. Therefore, in this paper, we propose a novel approach that effectively integrates uncertainty-based active learning with LoRA. Specifically, to address the uncertainty gap, we introduce a dynamic uncertainty measurement that combines the uncertainty of the base model and the uncertainty of the full model across the iterations of active learning. To address poor model calibration, we incorporate regularization during LoRA training to keep the model from becoming over-confident, and we employ Monte-Carlo dropout to improve uncertainty estimation. Experimental results show that the proposed approach outperforms existing baseline models on three complex reasoning tasks.
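The two uncertainty ingredients named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the combination rule, so the linear weight schedule in `dynamic_uncertainty` is an assumption made only for illustration, and all function names here are hypothetical rather than taken from the paper.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of a (n_samples, n_classes) probability array."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

def mc_dropout_entropy(stochastic_probs):
    """Entropy of the mean prediction over T dropout-enabled forward passes.

    stochastic_probs has shape (T, n_samples, n_classes).
    """
    return predictive_entropy(np.mean(stochastic_probs, axis=0))

def dynamic_uncertainty(base_unc, full_unc, iteration, total_iterations):
    """Blend base-model and full-model (base + LoRA) uncertainty.

    ASSUMPTION: the weight on the base model decays linearly from 1 to 0
    over the active learning iterations; the paper may use another schedule.
    """
    lam = 1.0 - iteration / max(total_iterations - 1, 1)
    return lam * base_unc + (1.0 - lam) * full_unc
```

Under this assumed schedule, the first iteration ranks samples purely by base-model uncertainty and the last purely by full-model uncertainty, with a smooth hand-over in between.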

Paper Structure (26 sections, 8 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: (a) Active learning combined with LoRA compared to passive learning. (b) Active learning combined with full parameter tuning compared to passive learning.
  • Figure 2: Density plot of confidence for wrong predictions.
  • Figure 3: (a) Heatmap of correlation between prediction entropy across different iterations; (b) Scatter plot for prediction entropy between base model (Iter0) and model after first iteration (Iter1); (c) Same as (b), except values are taken from Iter5 and Iter6.
  • Figure 4: The framework of STAR. It primarily consists of five steps: Model Inference, Data Querying, Data Labeling, Dataset Updating, and Model Training.
  • Figure 5: Learning curves comparing the Predictive Entropy and Maximum Entropy methods, each with and without STAR, against the RANDOM baseline. The first column corresponds to the GSM8K dataset, the second column to the BoolQ dataset, and the third column to the OpenBookQA dataset.
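
The five steps listed in the Figure 4 caption (Model Inference, Data Querying, Data Labeling, Dataset Updating, Model Training) follow the shape of a generic pool-based active learning loop. The sketch below is a minimal illustration of that loop, not the authors' implementation: `infer`, `label`, and `train` are hypothetical callables supplied by the caller, and top-k selection by uncertainty score is an assumed query strategy.

```python
import numpy as np

def active_learning_loop(pool, infer, label, train, rounds, budget):
    """Generic pool-based active learning loop (illustrative only).

    infer(model, samples, t) -> uncertainty score per sample (hypothetical)
    label(sample)            -> annotation for one sample (hypothetical)
    train(labeled)           -> model fitted on the labeled set (hypothetical)
    """
    unlabeled = list(range(len(pool)))
    labeled = []
    model = None  # stands for the untuned base model before any training
    for t in range(rounds):
        if not unlabeled:
            break
        # 1. Model inference: score every unlabeled sample.
        scores = np.asarray(infer(model, [pool[i] for i in unlabeled], t))
        # 2. Data querying: take the `budget` most uncertain samples.
        top = np.argsort(-scores)[:budget]
        picked = [unlabeled[j] for j in top]
        # 3. Data labeling: obtain annotations for the queried samples.
        new_pairs = [(pool[i], label(pool[i])) for i in picked]
        # 4. Dataset updating: move queried samples into the labeled set.
        labeled.extend(new_pairs)
        picked_set = set(picked)
        unlabeled = [i for i in unlabeled if i not in picked_set]
        # 5. Model training: refit on all labeled data gathered so far.
        model = train(labeled)
    return model, labeled
```

Passing the iteration index `t` to `infer` leaves room for an iteration-dependent uncertainty measure of the kind the abstract describes.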