DS$^2$-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis

Hongling Xu, Yice Zhang, Qianlong Wang, Ruifeng Xu

TL;DR

Few-shot ABSA suffers from data scarcity and limited diversity in training samples. DS$^2$-ABSA introduces a dual-stream data synthesis framework that combines key-point-driven brainstorming with instance-driven transformations, coupled with a label refinement module that re-estimates synthetic labels through normalization and noisy self-training. Empirical results on four ABSA datasets show that this approach consistently outperforms existing low-resource and LLM-based ABSA methods, which is attributed to improved data diversity and label quality. The method is cost-efficient (it requires no extra corpora) and adaptable to other domains, offering a practical way to leverage LLMs for few-shot ABSA and potentially other NLP tasks.

Abstract

Recently developed large language models (LLMs) have presented promising new avenues to address data scarcity in low-resource scenarios. In few-shot aspect-based sentiment analysis (ABSA), previous efforts have explored data augmentation techniques, which prompt LLMs to generate new samples by modifying existing ones. However, these methods fail to produce adequately diverse data, impairing their effectiveness. Besides, some studies apply in-context learning for ABSA by using specific instructions and a few selected examples as prompts. Though promising, LLMs often yield labels that deviate from task requirements. To overcome these limitations, we propose DS$^2$-ABSA, a dual-stream data synthesis framework targeted for few-shot ABSA. It leverages LLMs to synthesize data from two complementary perspectives: \textit{key-point-driven} and \textit{instance-driven}, which effectively generate diverse and high-quality ABSA samples in low-resource settings. Furthermore, a \textit{label refinement} module is integrated to improve the synthetic labels. Extensive experiments demonstrate that DS$^2$-ABSA significantly outperforms previous few-shot ABSA solutions and other LLM-oriented data generation methods.

Paper Structure

This paper contains 44 sections, 6 figures, 14 tables.

Figures (6)

  • Figure 1: Overview of the proposed DS$^2$-ABSA. The process begins with parallel dual-stream data synthesis: the key-point-driven stream leverages LLMs to brainstorm a set of critical ABSA attributes for conditional generation, while the instance-driven stream applies a small seed dataset to perform multi-level transformations. The resulting data are then combined and processed through normalization and self-training for noise handling.
  • Figure 2: Effect of noisy self-training over iterations. Iteration 0 means noisy self-training is not conducted.
  • Figure 3: Data diversity comparison on Res14 under the 5%-shot setting, including (a) few-shot gold data; (b) data augmentation (MELM, AugGPT, CoTAM); (c) generic data synthesis (ZeroGen, Self-Instruct); (d) instance-driven synthesis; and (e) key-point-driven synthesis. We use Instructor (Su et al., 2023) for text embedding and t-SNE for visualization, displaying at most 5k samples for clarity. Results on Lap14 are shown in a separate figure.
  • Figure 4: Effect of dual-stream synthesis methods using different training data proportions.
  • Figure 5: Impact of the number of synthetic samples.
  • ...and 1 more figure