Self Iterative Label Refinement via Robust Unlabeled Learning

Hikaru Asano, Tadashi Kozuno, Yukino Baba

TL;DR

The paper introduces an iterative self-refinement framework based on robust Unlabeled-Unlabeled (UU) learning to refine LLM-generated pseudo-labels for binary classification with minimal supervision. By leveraging two unlabeled datasets with different positive priors, the method trains a robust classifier that re-labels the data, iterating toward higher accuracy than direct LLM labeling and existing self-refinement approaches. It demonstrates strong performance across easy and hard NLP tasks, including low-resource languages, patents, and protein structures, and extends to safety alignment in generative tasks via RLHF. The work highlights practical benefits for scalable, annotation-light LLM enhancement and opens avenues for broader self-refinement in generative AI systems.

Abstract

Recent advances in large language models (LLMs) have yielded impressive performance on various tasks, yet they often depend on high-quality feedback that can be costly. Self-refinement methods attempt to leverage LLMs' internal evaluation mechanisms with minimal human supervision; however, these approaches frequently suffer from inherent biases and overconfidence, especially in domains where the models lack sufficient internal knowledge, resulting in performance degradation. As an initial step toward enhancing self-refinement for broader applications, we introduce an iterative refinement pipeline that employs the Unlabeled-Unlabeled learning framework to improve LLM-generated pseudo-labels for classification tasks. By exploiting two unlabeled datasets with differing positive class ratios, our approach iteratively denoises and refines the initial pseudo-labels, thereby mitigating the adverse effects of internal biases with minimal human supervision. Evaluations on diverse datasets, including low-resource language corpora, patent classifications, and protein structure categorizations, demonstrate that our method consistently outperforms both the initial LLM's classification performance and self-refinement approaches based on cutting-edge models (e.g., GPT-4o and DeepSeek-R1). Moreover, we experimentally confirm that our refined classifier facilitates effective post-training alignment for safety in LLMs and demonstrate successful self-refinement in generative tasks as well. Our code is available at https://github.com/HikaruAsano/self-iterative-label-refinement.
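
To make concrete how two unlabeled datasets with differing positive class ratios can substitute for labels, the following is a sketch of the standard unbiased risk rewrite from the UU learning literature. The symbols $\theta$, $\theta'$, $\pi$, $\ell$, $g$, $p_{\mathrm{tr}}$ and $p_{\mathrm{tr}'}$ are notation assumed here for illustration; the robust variant used in the paper may differ in detail.

```latex
% Assumed setup: the pseudo-positive and pseudo-negative corpora are mixtures
% of the true class-conditional densities p_+ and p_-, with theta > theta'.
\begin{align*}
p_{\mathrm{tr}}(x)  &= \theta\, p_+(x) + (1-\theta)\, p_-(x), &
p_{\mathrm{tr}'}(x) &= \theta'\, p_+(x) + (1-\theta')\, p_-(x).
\end{align*}
% Solving the two mixture equations for p_+ and p_-:
\begin{align*}
p_+ &= \frac{(1-\theta')\, p_{\mathrm{tr}} - (1-\theta)\, p_{\mathrm{tr}'}}{\theta-\theta'}, &
p_- &= \frac{\theta\, p_{\mathrm{tr}'} - \theta'\, p_{\mathrm{tr}}}{\theta-\theta'}.
\end{align*}
% Substituting into the supervised risk
% R(g) = \pi E_{p_+}[\ell(g(x),+1)] + (1-\pi) E_{p_-}[\ell(g(x),-1)]:
\begin{align*}
R(g) ={}& \mathbb{E}_{p_{\mathrm{tr}}}\!\left[
  \frac{\pi(1-\theta')}{\theta-\theta'}\,\ell(g(x),+1)
  - \frac{(1-\pi)\,\theta'}{\theta-\theta'}\,\ell(g(x),-1)\right] \\
&+ \mathbb{E}_{p_{\mathrm{tr}'}}\!\left[
  -\frac{\pi(1-\theta)}{\theta-\theta'}\,\ell(g(x),+1)
  + \frac{(1-\pi)\,\theta}{\theta-\theta'}\,\ell(g(x),-1)\right].
\end{align*}
```

When $\theta = 1$ and $\theta' = 0$ (perfectly clean pseudo-labels), the expression reduces to the ordinary supervised risk. The rewrite requires $\theta \neq \theta'$, which is why the two corpora must have different positive class ratios.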

Paper Structure

This paper contains 49 sections, 8 equations, 7 figures, 12 tables.

Figures (7)

  • Figure 1: Overview of our iterative refinement pipeline. First, an LLM annotator generates initial pseudo-labels for an unlabeled corpus, dividing it into pseudo-positive and pseudo-negative corpora. Next, we train a classifier using robust UU learning on these pseudo corpora, yielding a model that outperforms the initial LLM annotations. Finally, the classifier re-labels the entire dataset, updating the pseudo-labels for the next iteration. Repeating this cycle gradually refines the pseudo-labels, leading to increasingly reliable labels.
  • Figure 2: Classification accuracy over five iterations for three datasets. The solid lines represent the mean values, and the shaded areas show the mean $\pm$ standard deviation. Both variants of our approach, Ours (Oracle) and Ours (few-labeled), demonstrate steady improvements as the iterations progress. Notably, even in scenarios where the baselines fail to learn the classification task, our method continues to exhibit iterative performance gains. This robustness highlights the strength of our iterative refinement strategy, even under minimal-supervision settings such as only 50 labeled examples. Detailed numerical results are provided in the appendix on experimental results.
  • Figure 3: Classification accuracy curves over five iterations on three challenging datasets. Ours (Oracle) uses the exact class priors for UU learning, while Ours (few-labeled) estimates these priors from only 50 labeled examples. Our method consistently improves accuracy and outperforms both LLM self-refinement with the GPT-4o series and the advanced reasoning model DeepSeek-R1. Detailed numerical results are provided in the appendix on experimental results.
  • Figure 4: Reward distribution of generated answers on the Safety dataset after alignment. Both variants of our method, Ours (Oracle) and Ours (50-labeled), shift the distribution toward higher (safer) rewards compared with the SFT baseline and Vanilla RLAIF. Legend values denote the mean reward across three random seeds.
  • Figure 5: Example prompt for safety evaluation, which follows a similar format to prompts used for other datasets. The examples illustrate both safe (true) and unsafe (false) outcomes.
  • ...and 2 more figures
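
The cycle described in the Figure 1 caption (pseudo-label, train with robust UU learning, re-label, repeat) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM annotator is simulated by flipping 30% of the true labels on synthetic 2-D Gaussian data, the classifier is a plain logistic model, the class priors are computed from the true labels (loosely mirroring the Oracle setting), and the risk correction is a generalized leaky ReLU applied to each partial risk. All function names (`uu_weights`, `train_robust_uu`, `refine`) and hyperparameters are hypothetical.

```python
import numpy as np

def uu_weights(theta_p, theta_n, pi):
    """Coefficients of the unbiased UU risk (requires theta_p != theta_n)."""
    d = theta_p - theta_n
    a = pi * (1 - theta_n) / d      # weight on loss(g, +1) over pseudo-positives
    b = -(1 - pi) * theta_n / d     # weight on loss(g, -1) over pseudo-positives
    c = -pi * (1 - theta_p) / d     # weight on loss(g, +1) over pseudo-negatives
    e = (1 - pi) * theta_p / d      # weight on loss(g, -1) over pseudo-negatives
    return a, b, c, e

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def softplus(z):
    return np.logaddexp(0.0, z)     # numerically stable log(1 + exp(z))

def train_robust_uu(x_p, x_n, theta_p, theta_n, pi, lam=-1.0, lr=0.1, steps=300):
    """Gradient descent on a corrected UU risk with a linear logistic model."""
    a, b, c, e = uu_weights(theta_p, theta_n, pi)
    w, bias = np.zeros(x_p.shape[1]), 0.0
    for _ in range(steps):
        z_p, z_n = x_p @ w + bias, x_n @ w + bias
        # Partial risk estimates for the positive and the negative class.
        r_pos = a * softplus(-z_p).mean() + c * softplus(-z_n).mean()
        r_neg = b * softplus(z_p).mean() + e * softplus(z_n).mean()
        # Assumed correction: slope 1 if the partial risk is >= 0, else lam.
        s_pos = 1.0 if r_pos >= 0 else lam
        s_neg = 1.0 if r_neg >= 0 else lam
        # d softplus(-z)/dz = sigmoid(z) - 1, d softplus(z)/dz = sigmoid(z)
        g_p = (s_pos * a * (sigmoid(z_p) - 1) + s_neg * b * sigmoid(z_p)) / len(z_p)
        g_n = (s_pos * c * (sigmoid(z_n) - 1) + s_neg * e * sigmoid(z_n)) / len(z_n)
        w -= lr * (x_p.T @ g_p + x_n.T @ g_n)
        bias -= lr * (g_p.sum() + g_n.sum())
    return w, bias

def refine(x, y_true, pseudo, n_iters=5):
    """Iteratively re-label the corpus; returns final labels and accuracy history."""
    pi = np.mean(y_true == 1)
    history = [np.mean(pseudo == y_true)]   # accuracy of the initial pseudo-labels
    for _ in range(n_iters):
        pos, neg = pseudo == 1, pseudo == -1
        # Oracle-style priors: fraction of true positives in each pseudo corpus.
        theta_p = np.mean(y_true[pos] == 1)
        theta_n = np.mean(y_true[neg] == 1)
        w, bias = train_robust_uu(x[pos], x[neg], theta_p, theta_n, pi)
        pseudo = np.where(x @ w + bias >= 0, 1, -1)   # re-label the whole corpus
        history.append(np.mean(pseudo == y_true))
    return pseudo, history

# Synthetic stand-in for an unlabeled corpus and a noisy LLM annotator.
rng = np.random.default_rng(0)
n = 2000
y_true = rng.choice([-1, 1], size=n)
x = rng.normal(size=(n, 2)) + 1.5 * y_true[:, None]
flip = rng.random(n) < 0.3
pseudo0 = np.where(flip, -y_true, y_true)

labels, history = refine(x, y_true, pseudo0)
print([round(float(h), 3) for h in history])
```

On this toy problem the first refinement round already recovers most of the label noise, so later rounds change little; the harder datasets in the figures are where repeated iterations are reported to matter. In a few-labeled setting, `theta_p`, `theta_n` and `pi` would be estimated from a small labeled sample rather than from `y_true`.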