DS$^2$-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis
Hongling Xu, Yice Zhang, Qianlong Wang, Ruifeng Xu
TL;DR
Few-shot ABSA suffers from data scarcity and limited diversity in training samples. DS$^2$-ABSA introduces a dual-stream data synthesis framework that combines key-point-driven brainstorming with instance-driven transformations, coupled with a label refinement module that re-estimates synthetic labels using normalization and noisy self-training. Empirical results on four ABSA datasets show that this approach consistently outperforms existing low-resource and LLM-based ABSA methods, thanks to improved data diversity and label quality. The method is cost-efficient (it requires no extra corpora) and adaptable to other domains, offering a practical way to leverage LLMs for few-shot ABSA and potentially other NLP tasks.
Abstract
Recently developed large language models (LLMs) have presented promising new avenues to address data scarcity in low-resource scenarios. In few-shot aspect-based sentiment analysis (ABSA), previous efforts have explored data augmentation techniques, which prompt LLMs to generate new samples by modifying existing ones. However, these methods fail to produce adequately diverse data, impairing their effectiveness. In addition, some studies apply in-context learning to ABSA by using specific instructions and a few selected examples as prompts. Though this approach is promising, LLMs often yield labels that deviate from task requirements. To overcome these limitations, we propose DS$^2$-ABSA, a dual-stream data synthesis framework tailored to few-shot ABSA. It leverages LLMs to synthesize data from two complementary perspectives, \textit{key-point-driven} and \textit{instance-driven}, which together generate diverse and high-quality ABSA samples in low-resource settings. Furthermore, a \textit{label refinement} module is integrated to improve the synthetic labels. Extensive experiments demonstrate that DS$^2$-ABSA significantly outperforms previous few-shot ABSA solutions and other LLM-oriented data generation methods.
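To make the idea of labels that "deviate from task requirements" concrete, the following minimal Python sketch shows one way LLM-produced aspect labels could be normalized before training. It is an illustration only, not the paper's label refinement module: the abstract does not specify the normalization rules, so the `AspectLabel` type, the `normalize_labels` function, the alias table, and the three-way polarity set are all assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed label space: the common three-way ABSA polarity set.
VALID_POLARITIES = {"positive", "negative", "neutral"}

# Hypothetical shorthand an LLM might emit instead of the full label.
POLARITY_ALIASES = {"pos": "positive", "neg": "negative", "neu": "neutral"}


@dataclass(frozen=True)
class AspectLabel:
    term: str
    polarity: str


def normalize_labels(sentence, raw_labels):
    """Keep labels whose term occurs in the sentence and whose polarity is valid.

    raw_labels is a list of (term, polarity) string pairs as produced by an LLM.
    """
    cleaned = []
    seen = set()
    lowered = sentence.lower()
    for term, polarity in raw_labels:
        term = term.strip()
        pol = polarity.strip().lower()
        pol = POLARITY_ALIASES.get(pol, pol)
        if pol not in VALID_POLARITIES:
            continue  # polarity outside the assumed label space
        start = lowered.find(term.lower())
        if not term or start == -1:
            continue  # aspect term is empty or absent from the sentence
        # Recover the surface form exactly as written in the sentence.
        surface = sentence[start:start + len(term)]
        key = (surface.lower(), pol)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append(AspectLabel(surface, pol))
    return cleaned


sentence = "The Battery life is great but the screen is dim."
raw = [
    (" battery life ", "Positive"),
    ("screen", "neg"),
    ("keyboard", "negative"),
    ("screen", "negative"),
    ("screen", "angry"),
]
labels = normalize_labels(sentence, raw)
```

In this sketch, labels are filtered rather than corrected. A refinement step based on self-training, as the TL;DR mentions, would instead re-estimate labels with a trained model, which this example does not attempt.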
