TARDiS: Text Augmentation for Refining Diversity and Separability
Kyungmin Kim, SangHun Im, GiBaeg Kim, Heung-Seon Oh
TL;DR
TARDiS addresses limitations of two-stage LLM-based text augmentation (TA) for few-shot classification. In the generation stage, two processes, SEG and CEG, use class-specific prompts to produce diverse and discriminative samples. In the alignment stage, class adaptation (CA) verifies generated samples and modifies them so that they align with their target classes. Results on BANKING77, CLINC150, HWU64, and TREC6 show state-of-the-art performance with a relatively small LLM, and ablations show that SEG, CEG, and CA contribute complementary benefits. The method preserves overall class distributions while increasing within-class diversity and between-class separability, addressing misalignment, out-of-distribution (OOD), and FN issues in the few-shot regime. The work offers practical gains for few-shot text classification and lays groundwork for extending TA to more complex tasks and to more efficient pipelines.
Abstract
Text augmentation (TA) is a critical technique for text classification, especially in few-shot settings. This paper introduces a novel LLM-based TA method, TARDiS, to address challenges inherent in the generation and alignment stages of two-stage TA methods. For the generation stage, we propose two generation processes, SEG and CEG, incorporating multiple class-specific prompts to enhance diversity and separability. For the alignment stage, we introduce a class adaptation (CA) method to ensure that generated examples align with their target classes through verification and modification. Experimental results demonstrate TARDiS's effectiveness, outperforming state-of-the-art LLM-based TA methods in various few-shot text classification tasks. An in-depth analysis confirms the detailed behaviors at each stage.
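The two-stage structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `llm` callable, the prompt wording, and the function names `generate`, `class_adapt`, and `augment` are all assumptions made for illustration, and the actual SEG and CEG prompts are not reproduced here. The sketch only shows the control flow: several class-specific prompts produce candidates, then each candidate is verified against its target class and modified if it fails.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def generate(llm: LLM, label: str, seeds: List[str],
             prompts: List[str], n_per_prompt: int = 1) -> List[str]:
    """Generation stage: query the LLM once per (prompt template, repetition)."""
    seed_block = "\n".join(seeds)
    candidates = []
    for template in prompts:  # multiple class-specific prompt templates
        for _ in range(n_per_prompt):
            candidates.append(llm(template.format(label=label, seeds=seed_block)))
    return candidates


def class_adapt(llm: LLM, label: str, candidates: List[str]) -> List[str]:
    """Alignment stage: verify each candidate, modify the ones that fail."""
    aligned = []
    for text in candidates:
        # Verification prompt (wording is an assumption, not from the paper).
        verdict = llm(f"Does this text belong to class '{label}'? "
                      f"Answer yes or no.\nText: {text}")
        if verdict.strip().lower().startswith("yes"):
            aligned.append(text)
        else:
            # Modification prompt (wording is an assumption, not from the paper).
            aligned.append(llm(f"Rewrite this text so that it belongs to class "
                               f"'{label}'.\nText: {text}"))
    return aligned


def augment(llm: LLM, few_shot: Dict[str, List[str]],
            prompts: List[str], n_per_prompt: int = 1) -> Dict[str, List[str]]:
    """Run generation followed by alignment for every class in the few-shot set."""
    return {
        label: class_adapt(llm, label,
                           generate(llm, label, seeds, prompts, n_per_prompt))
        for label, seeds in few_shot.items()
    }
```

In this sketch a failed verification always triggers a rewrite rather than discarding the sample, which mirrors the "verification and modification" wording of the abstract; a real pipeline might also re-verify the rewritten text.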
