Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning

Bo Yuan, Yulin Chen, Yin Zhang

TL;DR

The paper proposes Delora, a novel framework that decouples sample selection from model training, and demonstrates its effectiveness in noisy label detection and text classification.

Abstract

Parameter-efficient fine-tuning (PEFT) of large language models (LLMs) has shown impressive performance in various downstream tasks. However, in many real-world scenarios, the collected training data inevitably contains noisy labels. To learn from noisy labels, most solutions select samples with small losses for model training. However, the selected samples, in turn, impact the loss computation in the next iteration. An inaccurate initial selection can create a vicious cycle, leading to suboptimal performance. To break this cycle, we propose Delora, a novel framework that decouples the sample selection from model training. For sample selection, Delora establishes a noisy label detector by introducing clean and noisy LoRA. Benefiting from the memory effect, the clean LoRA is encouraged to memorize clean data, while the noisy LoRA is constrained to memorize mislabeled data, which serves as a learnable threshold for selecting clean and noisy samples. For model training, Delora can use carefully selected samples to fine-tune language models seamlessly. Experimental results on synthetic and real-world noisy datasets demonstrate the effectiveness of Delora in noisy label detection and text classification.
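
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. `small_loss_selection` illustrates the conventional small-loss heuristic with a fixed keep ratio. `dual_detector_selection` is a hypothetical decision rule, assumed here for illustration only: it treats a sample as clean when its loss under the clean adapter is lower than its loss under the noisy adapter, so the noisy adapter's loss plays the role of a per-sample threshold. The function names, the `keep_ratio` parameter, and the comparison rule are all assumptions; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def small_loss_selection(losses: Sequence[float], keep_ratio: float) -> List[int]:
    """Conventional baseline: keep the keep_ratio fraction of samples with the smallest loss."""
    n_keep = int(len(losses) * keep_ratio)
    # Sort sample indices by ascending loss and keep the first n_keep.
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def dual_detector_selection(
    clean_losses: Sequence[float], noisy_losses: Sequence[float]
) -> Tuple[List[int], List[int]]:
    """Hypothetical rule (an assumption, not taken from the paper):
    a sample is treated as clean when the clean adapter fits it better
    than the noisy adapter does, and as noisy otherwise."""
    clean_idx: List[int] = []
    noisy_idx: List[int] = []
    for i, (loss_c, loss_n) in enumerate(zip(clean_losses, noisy_losses)):
        if loss_c < loss_n:
            clean_idx.append(i)
        else:
            noisy_idx.append(i)
    return clean_idx, noisy_idx


if __name__ == "__main__":
    # Made-up per-sample losses, for illustration only.
    clean_losses = [0.1, 2.0, 0.3, 1.5]
    noisy_losses = [1.0, 0.4, 0.9, 1.6]
    print(small_loss_selection(clean_losses, keep_ratio=0.5))
    print(dual_detector_selection(clean_losses, noisy_losses))
```

In this sketch the small-loss baseline needs a hand-set `keep_ratio`, whereas the dual rule compares two losses per sample and needs no global cutoff, which is one way to read the abstract's phrase "learnable threshold".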

Paper Structure

This paper contains 37 sections, 10 equations, 6 figures, 15 tables.

Figures (6)

  • Figure 1: A comparison between other sample selection methods (left) and our method (right) for LNL tasks. Our method decouples sample selection (stage 1) from model training (stage 2) by training a noisy label detector and classifier model separately.
  • Figure 2: The architecture of our proposed framework Delora. Stage 1: We introduce two separate LoRAs (clean LoRA $\Delta w_c$ and noisy LoRA $\Delta w_n$) to construct the noisy label detector. Stage 2: We leverage the selected clean samples and relabeled noisy samples to train the classifier model.
  • Figure 3: Memorization performance of different LoRAs during fine-tuning on Trec under 20%A. The green line refers to the base model without noise handling.
  • Figure 4: Precision and recall of noisy label detection with BERT as the backbone on the Trec dataset.
  • Figure 5: Analysis of the choice of hyper-parameter $h_2$ under different noise ratios on the Trec dataset.
  • ...and 1 more figure