Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents

Mohammad H. A. Monfared, Lucie Flek, Akbar Karimi

TL;DR

An agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that uses iterative generation and verification to produce high-quality synthetic training examples. It consistently outperforms raw prompting at preserving the labels of the augmented data.

Abstract

We propose an agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that uses iterative generation and verification to produce high-quality synthetic training examples. To isolate the effect of the agentic structure, we also develop a closely matched prompting-based baseline that uses the same model and instructions. Both methods are evaluated across three ABSA subtasks (Aspect Term Extraction (ATE), Aspect Term Sentiment Classification (ATSC), and Aspect Sentiment Pair Extraction (ASPE)), four SemEval datasets, and two encoder-decoder models: T5-Base and Tk-Instruct. Our results show that agentic augmentation outperforms raw prompting at preserving the labels of the augmented data, especially when the task requires generating aspect terms. In addition, when combined with real data, agentic augmentation provides higher gains, consistently outperforming prompting-based generation. These benefits are most pronounced for T5-Base, while the more heavily pretrained Tk-Instruct shows smaller improvements. As a result, augmented data helps T5-Base reach performance comparable to that of Tk-Instruct.
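
The generate-then-verify loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: in the paper both roles are LLM agents, whereas here `toy_generate` and `toy_verify` are hypothetical rule-based stand-ins, and the names `Example` and `augment` are invented for this sketch. The style-policy extraction step is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    sentence: str
    aspect: str      # aspect term the label refers to
    polarity: str    # "positive", "negative", or "neutral"


def augment(
    seed: Example,
    generate: Callable[[Example], List[Example]],
    verify: Callable[[Example], bool],
    n_wanted: int = 2,
    max_rounds: int = 5,
) -> List[Example]:
    """Keep asking the generator for candidates; save only verified ones."""
    accepted: List[Example] = []
    for _ in range(max_rounds):
        if len(accepted) >= n_wanted:
            break
        for candidate in generate(seed):
            # Reject candidates whose label no longer matches the text.
            if verify(candidate) and candidate not in accepted:
                accepted.append(candidate)
    return accepted[:n_wanted]


def toy_generate(seed: Example) -> List[Example]:
    """Hypothetical stand-in for the generator agent (template rewrites)."""
    templates = [
        "Honestly, the {a} was excellent.",
        "It was excellent overall.",  # aspect term lost: should be rejected
        "We thought the {a} was really good.",
    ]
    return [
        Example(t.format(a=seed.aspect), seed.aspect, seed.polarity)
        for t in templates
    ]


def toy_verify(ex: Example) -> bool:
    """Hypothetical stand-in for the evaluator agent (label consistency)."""
    has_aspect = ex.aspect.lower() in ex.sentence.lower()
    valid_polarity = ex.polarity in {"positive", "negative", "neutral"}
    return has_aspect and valid_polarity


if __name__ == "__main__":
    seed = Example("The pasta was great.", "pasta", "positive")
    for ex in augment(seed, toy_generate, toy_verify):
        print(ex)
```

The point of the structure is that a candidate that drops or alters the labeled aspect term never reaches the saved set, which is the label-preservation property the abstract emphasizes. A single-pass prompting baseline would correspond to calling the generator once and keeping everything it returns.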

Paper Structure (23 sections, 7 figures, 10 tables)

Figures (7)

  • Figure 1: Overview of the agentic data augmentation workflow. A generator agent first extracts a style policy and produces candidate sentences, which are then evaluated by an evaluator agent. Only validated examples are saved, forming a high-quality synthetic dataset.
  • Figure 2: Average F1 score across all ABSA tasks for each dataset, comparing training on original data with training on generated data only. The results show a clear quality gap between real and synthetic data.
  • Figure 3: $\Delta$F1 between the baseline and training with added agentic data, for both T5-Base and Tk-Instruct. Each bar is the mean over the ATE, ATSC, and ASPE tasks for one dataset. The plot shows a clear difference in augmentation effectiveness between the two models.
  • Figure 4: Agentic data augmentation narrows the F1 gap between T5-Base and Tk-Instruct. Grey bars indicate the original model gap; blue bars show the performance gain of T5-Base after agentic augmentation (Mixed x1).
  • Figure 5: Average F1 score change ($\Delta F_1$) for agentic and prompting augmentation, computed as the difference between the Mixed x1 setup and the original-only baseline. Scores are averaged across three ABSA subtasks and two model architectures for each dataset.
  • ...and 2 more figures
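
The averaged score change described in the Figure 5 caption can be written out as below. The notation is an assumption introduced here for illustration: $d$ is a dataset, $T$ is the set of three subtasks (ATE, ATSC, ASPE), $M$ is the set of two models (T5-Base, Tk-Instruct), and the superscripts name the two training setups.

```latex
\Delta F_1(d) \;=\; \frac{1}{|T|\,|M|}
  \sum_{t \in T} \sum_{m \in M}
  \Bigl( F_1^{\text{Mixed x1}}(d, t, m) \;-\; F_1^{\text{original}}(d, t, m) \Bigr),
\qquad |T| = 3,\; |M| = 2 .
```

Under this reading, one value is computed per dataset and per augmentation method (agentic or prompting), and a positive value means that adding synthetic data improved on the original-only baseline.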