
Salutary Labeling with Zero Human Annotation

Wenxiao Xiao, Hongfu Liu

TL;DR

Salutary labeling addresses annotation cost and label quality in active learning by automatically selecting unlabeled samples and assigning each a salutary label: the label that maximizes the sample's estimated influence $\mathcal{I}(x_j,y_j)$ on a validation loss $\mathcal{L}_v$. For each unlabeled sample $x_j$, the method evaluates $\mathcal{I}(x_j,c)$ for every candidate label $c \in \mathcal{C}$ and chooses $\hat{c}=\operatorname{arg\,max}_c \mathcal{I}(x_j,c)$, which merges querying and labeling into one autonomous step; the top $b$ samples by estimated influence are then added in each round. Experiments on nine datasets and in LLM fine-tuning scenarios show consistently better performance than traditional active-learning baselines, and in some settings salutary labels even outperform ground-truth labels, all without human annotation. For non-convex models, the approach applies the influence function to a convex surrogate trained on embeddings, which points to cost-effective, data-efficient learning in practice.
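
The selection step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a softmax-regression model on fixed embeddings as the convex surrogate, a damped Hessian, and the sign convention that a larger score means a larger estimated reduction in validation loss, i.e. $\mathcal{I}(x,c) \approx \nabla\mathcal{L}_v^\top H^{-1} \nabla\ell(x,c)$. The function names, the `damping` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def label_gradients(W, X):
    """Cross-entropy gradient w.r.t. W for every sample and every candidate label.

    Returns shape (n, C, d*C); entry [j, c] is the flattened outer(x_j, p_j - e_c).
    """
    n, C = X.shape[0], W.shape[1]
    resid = softmax(X @ W)[:, None, :] - np.eye(C)[None, :, :]  # (n, C, C)
    G = X[:, None, :, None] * resid[:, :, None, :]              # (n, C, d, C)
    return G.reshape(n, C, -1)

def damped_hessian(W, X, damping=1e-3):
    """Average cross-entropy Hessian over X plus damping (assumed for invertibility)."""
    P = softmax(X @ W)
    dim = W.size
    H = np.zeros((dim, dim))
    for x, p in zip(X, P):
        H += np.kron(np.outer(x, x), np.diag(p) - np.outer(p, p))
    return H / len(X) + damping * np.eye(dim)

def salutary_select(W, X_train, X_val, y_val, X_pool, b, damping=1e-3):
    """Score every (pool sample, label) pair, pick the best label, keep the top b."""
    H = damped_hessian(W, X_train, damping)
    g_val = label_gradients(W, X_val)[np.arange(len(y_val)), y_val].mean(axis=0)
    s = np.linalg.solve(H, g_val)             # H^{-1} grad L_v
    scores = label_gradients(W, X_pool) @ s   # (n_pool, C), larger is better
    labels = scores.argmax(axis=1)            # salutary label per pool sample
    chosen = np.argsort(-scores.max(axis=1))[:b]
    return chosen, labels[chosen], scores

# Toy data (hypothetical): 3-dimensional embeddings, 3 classes.
rng = np.random.default_rng(0)
d, C, b = 3, 3, 5
W = 0.1 * rng.normal(size=(d, C))
X_train = rng.normal(size=(30, d))
X_val = rng.normal(size=(20, d))
y_val = rng.integers(0, C, size=20)
X_pool = rng.normal(size=(40, d))

chosen, labels, scores = salutary_select(W, X_train, X_val, y_val, X_pool, b)
```

Here `chosen` holds the indices of the `b` pool samples to add and `labels` their assigned labels. In a full active-learning loop, the model would be retrained on the enlarged labeled set before the next round.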

Abstract

Active learning strategically selects informative unlabeled data points and queries their ground truth labels for model training. The prevailing assumption underlying this machine learning paradigm is that acquiring these ground truth labels will optimally enhance model performance. However, this assumption may not always hold true or maximize learning capacity, particularly considering the costly labor annotations required for ground truth labels. In contrast to traditional ground truth labeling, this paper proposes salutary labeling, which automatically assigns the most beneficial labels to the most informative samples without human annotation. Specifically, we utilize the influence function, a tool for estimating sample influence, to select newly added samples and assign their salutary labels by choosing the category that maximizes their positive influence. This process eliminates the need for human annotation. Extensive experiments conducted on nine benchmark datasets demonstrate the superior performance of our salutary labeling approach over traditional active learning strategies. Additionally, we provide several in-depth explorations and practical applications of large language model (LLM) fine-tuning.

Paper Structure

This paper contains 20 sections, 2 equations, 7 figures, 5 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Experimental results on the Diabetes dataset (Decencière, Zhang, et al.) with ground truth and salutary labels. We select 300 labeled samples for traditional classification training and leave the remaining samples as unlabeled data for active learning. In both figures, the X-axis represents the sample influence with salutary labels. According to this measurement, we divide both labeled and unlabeled data into 20 equal-sized bins. The red and dark blue solid lines denote the performance after adding each bin to the labeled data with ground truth and salutary labels, respectively, and the dashed blue line denotes the performance when training with only the original labeled data.
  • Figure 2: Comparison of salutary labeling and baseline active learning methods on nine datasets over 10 rounds of the active learning cycle. In all figures, the X-axis represents the learning round, where round 0 is the initial training. The shaded area is the standard deviation across 5 different random runs. Note that entropy, margin, and uncertainty sampling yield the same results on binary datasets.
  • Figure 3: Influence estimation vs. actual loss difference under add-one-in retraining on the Diabetic (left), CelebA (middle), and Bank (right) datasets. In all plots, the horizontal axes represent the estimated influence on validation loss, while the vertical axes show the actual loss change. Their correlation is quantified with Spearman's rank correlation coefficient (Spearman-r). We randomly select 300 samples for each plot to keep the visualization clear.
  • Figure 4: Accuracy of the final model after 10 rounds of active learning for LLM fine-tuning on the WNLI (left), MRPC (middle), and RTE (right) datasets of the GLUE repository.
  • Figure 5: Prediction accuracy of salutary labeling and baseline methods with $b$ set at 1% of the pool samples on CelebA (left), Waveform (middle), and Electrical (right) datasets.
  • ...and 2 more figures