Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, Peter Brusilovsky

TL;DR

This work compares sample selection strategies from the few-shot learning literature and investigates their effects on LLM-based text augmentation. Some "informed" selection strategies improve classifier performance, especially on out-of-distribution data, but such gains are rare and marginal.

Abstract

Generative large language models (LLMs) are increasingly used for data augmentation, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing work on augmentation relies on few-shot scenarios, in which samples are given to the LLM as part of the prompt, leading to better augmentations. Yet these samples are mostly selected randomly, and a comprehensive overview of the effects of other (more "informed") sample selection strategies is lacking. In this work, we compare sample selection strategies from the few-shot learning literature and investigate their effects on LLM-based text augmentation. We evaluate these effects on both in-distribution and out-of-distribution classifier performance. Results indicate that while some "informed" selection strategies improve model performance, especially on out-of-distribution data, this happens only seldom and the improvements are marginal. Unless further advances are made, random sample selection remains a good default for augmentation practitioners.

Paper Structure

This paper contains 17 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of our methodology. For each dataset, we randomly sample 20 seed samples per label and collect up to 5 augmented samples for each seed sample. The seeds are then used together with the augmented samples for fine-tuning to evaluate each sample selection strategy. The entire process is repeated 3 times with different random seeds. Similar sample selection strategies share the same colour.
  • Figure 2: Aggregated difference across all LLMs and random seeds in mean F1-Macro for classifiers trained on various sample selection strategies against the best performing baseline of either random few-shot or zero-shot. While some strategies perform well in certain cases as per Table \ref{tab:best_perf_strategies}, they fail to make a positive impact on classifier performance against baseline strategies in general.
  • Figure 3: Aggregated performance across all LLMs and random seeds in F1-Macro for classifiers trained on various sample selection strategies together with the baselines of either random few-shot or zero-shot on in-distribution data.
  • Figure 4: Aggregated performance across all LLMs and random seeds in F1-Macro for classifiers trained on various sample selection strategies together with the baselines of either random few-shot or zero-shot on out-of-distribution data.
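
The seed sampling described for Figure 1 and the baseline comparison described for Figure 2 can be sketched roughly as follows. This is a minimal illustration written for this summary, not the authors' code: the function names, the `(text, label)` data layout, and the strategy keys `random_few_shot` and `zero_shot` are all assumptions, and the F1-Macro values are expected to come from an evaluation step that is not shown here.

```python
import random
from collections import defaultdict
from statistics import mean


def sample_seeds(dataset, per_label=20, rng_seed=0):
    """Randomly pick up to `per_label` seed texts for each label.

    `dataset` is assumed to be an iterable of (text, label) pairs.
    """
    rng = random.Random(rng_seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append(text)
    return {
        label: rng.sample(texts, min(per_label, len(texts)))
        for label, texts in by_label.items()
    }


def delta_vs_best_baseline(scores, baselines=("random_few_shot", "zero_shot")):
    """Mean F1-Macro of each strategy minus the better baseline's mean.

    `scores` is assumed to map a strategy name to a list of F1-Macro
    values, one per run (for example per LLM and random seed).
    """
    means = {name: mean(runs) for name, runs in scores.items()}
    best_baseline = max(means[name] for name in baselines)
    return {
        name: value - best_baseline
        for name, value in means.items()
        if name not in baselines
    }
```

Under these assumptions, a negative value returned by `delta_vs_best_baseline` means the strategy did worse on average than the better of the two baselines, and a positive value means it did better.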