Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

Yice Zhang; Jie Zeng; Weiming Hu; Ziyi Wang; Shiwei Chen; Ruifeng Xu

TL;DR

This work tackles the data-scarce challenge of Aspect Sentiment Quad Prediction (ASQP) by introducing a pseudo-label scorer that measures the match between reviews and their pseudo-labels. Trained with ranking-based objectives on a human- and AI-annotated comparison dataset, the scorer is employed to filter and rerank pseudo-labels within a self-training framework, boosting ASQP performance across four public datasets. Key findings show consistent gains for GAS and MUL models (average improvements around 3–5% in F1), the effectiveness of AI-generated comparison data, and the potential of the scorer as a reranker to further enhance results. The approach highlights a practical path for scalable ASQP augmentation and suggests directions for improving data synthesis and annotation efficiency.

Abstract

Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. We highlight two critical aspects to ensure the scorer's effectiveness and reliability: the quality of the training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets reveal that using our scorer can greatly and consistently improve the effectiveness of self-training. Moreover, we explore the possibility of replacing humans with large language models for comparison dataset annotation, and experiments demonstrate its feasibility. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA .
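The filtering idea described in the abstract, together with the reranking use mentioned in the TL;DR, can be illustrated with a minimal sketch. It is not the authors' released implementation: the function names, the threshold value, and `toy_scorer` are hypothetical stand-ins. The paper's scorer is a trained generative model, whereas the toy version here only checks string overlap so that the example runs on its own.

```python
from typing import Callable, List, Tuple

# A quad is (aspect term, aspect category, opinion term, sentiment polarity).
Quad = Tuple[str, str, str, str]
# Hypothetical scorer interface: higher means the pseudo-label matches the review better.
Scorer = Callable[[str, List[Quad]], float]


def filter_pseudo_labels(
    samples: List[Tuple[str, List[Quad]]],
    scorer: Scorer,
    threshold: float,
) -> List[Tuple[str, List[Quad]]]:
    """Keep only pseudo-labeled samples whose match score reaches the threshold."""
    return [(review, quads) for review, quads in samples
            if scorer(review, quads) >= threshold]


def rerank(review: str, candidates: List[List[Quad]], scorer: Scorer) -> List[Quad]:
    """Return the candidate label that the scorer rates highest for this review."""
    return max(candidates, key=lambda quads: scorer(review, quads))


def toy_scorer(review: str, quads: List[Quad]) -> float:
    """Stand-in scorer (assumption): fraction of quads whose aspect and
    opinion terms both occur verbatim in the review text."""
    if not quads:
        return 0.0
    hits = sum(1 for aspect, _, opinion, _ in quads
               if aspect in review and opinion in review)
    return hits / len(quads)


pizza_label = [("pizza", "food quality", "great", "positive")]
service_label = [("service", "service general", "slow", "negative")]

pseudo_labeled = [
    ("the pizza was great", pizza_label),   # label matches the review
    ("the service was slow", pizza_label),  # label does not match the review
]

# The threshold of 0.5 is an arbitrary illustrative value.
kept = filter_pseudo_labels(pseudo_labeled, toy_scorer, threshold=0.5)
best = rerank("the service was slow", [pizza_label, service_label], toy_scorer)
```

In this sketch, `kept` would be merged with the labeled training set for the next round of training, and `best` shows the same scorer choosing among several candidate labels for a single review.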

Paper Structure (20 sections, 8 equations, 5 figures, 13 tables)

Introduction
Background
Comparison Dataset
Data Preparation
Annotation Process
Statistics
Our Approach
Pseudo-label Scorer
Self-Training with Data Filtering
Pseudo-label Scorer as Reranker
Experiments
Experiment Setup
Analysis of Pseudo-label Scorer
Analysis of Self-Training
Further Analysis
...and 5 more sections

Figures (5)

Figure 1: Illustration of our pseudo-label scorer.
Figure 2: Performance trends of comparison data with increasing data quantity (accuracy, %): (a) results on ACOS-Laptop; (b) results on ACOS-Rest.
Figure 3: Performance of GAS on the augmented dataset under different match scores ($F_1$-score, %).
Figure 4: Performance of GAS under different numbers of augmented samples ($F_1$-score, %).
Figure 5: Prompt for AI Annotation in ACOS-Rest.
