Iterative Data Generation with Large Language Models for Aspect-based Sentiment Analysis

Qihuang Zhong, Haiyun Li, Luyao Zhuang, Juhua Liu, Bo Du

TL;DR

IDG introduces a novel iterative data generation mechanism and a self-reflection data filtering module to counter the unexpected outputs caused by LLM hallucinations, and it shows consistent and significant performance gains across five baseline ABSA models.

Abstract

Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task that aims to determine the sentiment polarity towards an aspect in a sentence. Because labeled data is expensive and limited, data generation (DG) has become the standard way to improve ABSA performance. However, current DG methods usually have three shortcomings: 1) poor fluency and coherence, 2) a lack of diversity in the generated data, and 3) reliance on some existing labeled data, which hinders their application in real-world scenarios. With the advancement of large language models (LLMs), LLM-based DG has the potential to solve these issues. Unfortunately, directly prompting LLMs often fails to produce the desired pseudo-label ABSA data, because LLMs are prone to hallucinations that lead to undesired generations. To this end, we propose a systematic Iterative Data Generation framework, namely IDG, to boost the performance of ABSA. The core of IDG is to make full use of the powerful abilities of LLMs (i.e., instruction-following, in-context learning and self-reflection) to iteratively generate more fluent and diverse pseudo-label data, starting from an unsupervised sentence corpus. Specifically, IDG designs a novel iterative data generation mechanism and a self-reflection data filtering module to tackle the unexpected data generation caused by hallucinations. Extensive experiments on four widely-used ABSA benchmarks show that IDG brings consistent and significant performance gains across five baseline ABSA models. More encouragingly, the synthetic data generated by IDG can achieve performance comparable to, or even better than, manually annotated data.
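
The loop described in the abstract (extract aspects from unlabeled sentences, generate pseudo-labeled sentences, then filter them by self-reflection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `LLM` callable, the prompt wording, the function names, the two-polarity label set and the number of rounds are all assumptions made for illustration, and the aspect extension step is omitted for brevity.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]
# (sentence, aspect, polarity)
Sample = Tuple[str, str, str]


def extract_aspects(llm: LLM, corpus: List[str]) -> List[str]:
    """Stage 1 (sketch): collect aspects from unlabeled sentences."""
    aspects: List[str] = []
    for sentence in corpus:
        # Assumed prompt wording; the reply is expected to be comma-separated.
        reply = llm(f"EXTRACT aspects from: {sentence}")
        for aspect in reply.split(","):
            aspect = aspect.strip().lower()
            if aspect and aspect not in aspects:
                aspects.append(aspect)
    return aspects


def generate(llm: LLM, aspects: List[str], examples: List[Sample]) -> List[Sample]:
    """Stage 2 (sketch): one sentence per (aspect, polarity) pair.

    Previously kept samples are passed back as in-context examples,
    which is what makes the procedure iterative in this sketch.
    """
    shots = " | ".join(sentence for sentence, _, _ in examples[-2:])
    generated: List[Sample] = []
    for aspect in aspects:
        for polarity in ("positive", "negative"):  # assumed label set
            sentence = llm(
                f"GENERATE a {polarity} sentence about {aspect}. Examples: {shots}"
            )
            generated.append((sentence.strip(), aspect, polarity))
    return generated


def keep(llm: LLM, sample: Sample) -> bool:
    """Stage 3 (sketch): ask the LLM to judge its own output."""
    sentence, aspect, polarity = sample
    verdict = llm(
        f"JUDGE: does '{sentence}' express {polarity} sentiment "
        f"toward {aspect}? Answer yes or no."
    )
    return verdict.strip().lower().startswith("yes")


def run_idg_sketch(llm: LLM, corpus: List[str], rounds: int = 2) -> List[Sample]:
    """Run the three stages for a fixed number of rounds, dropping duplicates."""
    aspects = extract_aspects(llm, corpus)
    kept: List[Sample] = []
    for _ in range(rounds):
        for sample in generate(llm, aspects, kept):
            if sample not in kept and keep(llm, sample):
                kept.append(sample)
    return kept
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so it can be exercised with a deterministic stub before a real model is plugged in.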

Paper Structure (30 sections, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Comparison between our LLM-based data generation and prior small language model (SLM)-based methods. As seen, our method does not rely on existing labeled data and can generate higher-quality and more diverse pseudo-label ABSA data.
  • Figure 2: Overview of our IDG framework, which covers three stages: ❶ Aspect Extraction and Extension, ❷ Pseudo Data Generation and ❸ Evaluating and Filtering. Notably, "EX Prompt" and "ET Prompt" denote the aspect extraction and extension prompts, respectively. "ITAT Prompt" refers to the Iteration Teaching Analysis Prompt, which pushes the LLM to generate more diverse data.
  • Figure 3: Detailed prompts for aspect extraction and extension.
  • Figure 4: Illustration of the ITAT prompt. The slots {example-input} and {example-output} denote an example input-output pair. The slots {domain} and {length} are the given sample domain and length. The slot {input} denotes the input aspect-sentiment pair.
  • Figure 5: Illustration of single-/multi-aspect data generation. For ease of illustration, we only show some cases in the laptop domain.
  • ...and 6 more figures
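
The slot mechanism described in the Figure 4 caption can be sketched as simple template filling. Only the five slot names come from the caption. The template wording, the `fill_slots` helper and the example values below are hypothetical, and the actual ITAT prompt text is not reproduced here.

```python
import re

# Hypothetical template wording; only the slot names are taken from the caption.
TEMPLATE = (
    "Example input: {example-input}\n"
    "Example output: {example-output}\n"
    "Write a {domain} review sentence of about {length} words for: {input}"
)


def fill_slots(template: str, values: dict) -> str:
    """Replace every {slot} in a single pass; raise KeyError if a value is missing."""

    def substitute(match: re.Match) -> str:
        return str(values[match.group(1)])

    # Slot names in the caption use lowercase letters and hyphens.
    return re.sub(r"\{([a-z-]+)\}", substitute, template)


# Illustrative values only, not taken from the paper.
prompt = fill_slots(
    TEMPLATE,
    {
        "example-input": "(screen, positive)",
        "example-output": "The screen is bright and sharp.",
        "domain": "laptop",
        "length": 12,
        "input": "(battery, negative)",
    },
)
```

A single regular-expression pass is used instead of chained `str.replace` calls so that a slot value which itself contains braces is not substituted a second time.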