CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning

Yeachan Kim, Junho Kim, SangKeun Lee

TL;DR

This work proposes Clean Routing (CleaR), a novel routing-based PEFT approach that adaptively activates PEFT modules, leading to substantially improved performance in noisy environments.

Abstract

Parameter-efficient fine-tuning (PEFT) has enabled the efficient optimization of cumbersome language models in real-world settings. However, as datasets in such environments often contain noisy labels that adversely affect performance, PEFT methods are inevitably exposed to noisy labels. Despite this challenge, the adaptability of PEFT to noisy environments remains underexplored. To bridge this gap, we investigate various PEFT methods under noisy labels. Interestingly, our findings reveal that PEFT has difficulty in memorizing noisy labels due to its inherently limited capacity, resulting in robustness. However, we also find that such limited capacity simultaneously makes PEFT more vulnerable to interference of noisy labels, impeding the learning of clean samples. To address this issue, we propose Clean Routing (CleaR), a novel routing-based PEFT approach that adaptively activates PEFT modules. In CleaR, PEFT modules are preferentially exposed to clean data while bypassing the noisy ones, thereby minimizing the noisy influence. To verify the efficacy of CleaR, we perform extensive experiments on diverse configurations of noisy labels. The results convincingly demonstrate that CleaR leads to substantially improved performance in noisy environments.
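
The routing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-component Gaussian mixture used to turn training losses into clean probabilities, the single Bernoulli draw per sample, and the names `clean_probability`, `route`, and `routed_layer` are all assumptions made for illustration. The sketch only shows the general pattern of estimating how likely each sample is to be clean from its loss, then letting likely-clean samples pass through the PEFT module while the others bypass it.

```python
import numpy as np


def clean_probability(losses, n_iter=50):
    """Estimate P(clean) per sample from training losses.

    Assumption: losses follow a two-component 1-D Gaussian mixture,
    and the lower-mean component corresponds to clean samples.
    """
    x = np.asarray(losses, dtype=float)
    # Initialise the two components at the extremes of the loss range.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
        dens = dens / np.sqrt(2.0 * np.pi * var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / x.size
    # Posterior of the low-loss component is taken as P(clean).
    return resp[:, np.argmin(mu)]


def route(p_clean, rng):
    """Draw one Bernoulli decision per sample: True means 'use the PEFT module'."""
    return rng.random(p_clean.shape) < p_clean


def routed_layer(h, adapter, use_peft):
    """Add the adapter update only for routed samples; others bypass it.

    h: (batch, dim) hidden states; adapter: callable returning an update of
    the same shape (hypothetical stand-in for any PEFT module);
    use_peft: (batch,) boolean routing mask.
    """
    return h + use_peft[:, None] * adapter(h)
```

In this sketch, samples with small losses receive a clean probability near 1 and are almost always routed through the adapter, whereas high-loss samples mostly see only the frozen backbone output `h`.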

Paper Structure

This paper contains 50 sections, 5 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Comparison between PEFT methods and full fine-tuning on SST-5 with symmetric noise (60%). Dashed lines represent the training accuracy and loss of clean samples on uncorrupted datasets (i.e. only clean samples).
  • Figure 2: Overview of Clean Routing. CleaR first estimates the probability of each sample being clean based on the training losses. Based on the estimated probability, CleaR adaptively activates PEFT modules by favoring the potentially clean samples.
  • Figure 3: Ratios of memorizing clean (Left, larger is better) and noisy samples (Right, smaller is better) on different routing methods. Dashed lines and solid lines indicate Deterministic Routing and Clean Routing (ours), respectively.
  • Figure 4: Impact on two memorizations when applying CleaR to PEFT methods. Best viewed in color.
  • Figure 5: Comparison between PEFT methods and full fine-tuning on BANKING77 with symmetric noise (60%). Dashed lines represent the training accuracy and loss of clean samples on uncorrupted datasets (i.e. only clean samples).
  • ...and 1 more figures