Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction
Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu
TL;DR
This work tackles the scarcity of labeled data in Aspect Sentiment Quad Prediction (ASQP) by introducing a pseudo-label scorer that measures how well a pseudo-label matches its review. The scorer is trained with ranking-based objectives on a comparison dataset annotated by humans and by AI, and is then used within a self-training framework to filter and rerank pseudo-labels, improving ASQP performance across four public datasets. Key findings are consistent gains for the GAS and MUL models (average improvements of around 3–5% in F1), the effectiveness of AI-generated comparison data, and further gains when the scorer is also used as a reranker. The approach offers a practical path toward scalable data augmentation for ASQP and suggests directions for improving data synthesis and annotation efficiency.
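The two uses of the scorer described above, filtering pseudo-labeled data and reranking candidate predictions, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_scorer`, `filter_pseudo_labels`, `rerank`, and the threshold value are hypothetical names and choices, and the toy scorer is a simple term-overlap stand-in for the trained model.

```python
from typing import Callable, List, Tuple

# A quad: (aspect term, aspect category, opinion term, sentiment polarity)
Quad = Tuple[str, str, str, str]
Scorer = Callable[[str, List[Quad]], float]


def toy_scorer(review: str, quads: List[Quad]) -> float:
    """Hypothetical stand-in for a trained scorer.

    Returns the fraction of aspect and opinion terms that occur in the review.
    """
    terms = [term for aspect, _, opinion, _ in quads for term in (aspect, opinion)]
    if not terms:
        return 0.0
    text = review.lower()
    return sum(term.lower() in text for term in terms) / len(terms)


def filter_pseudo_labels(
    samples: List[Tuple[str, List[Quad]]], scorer: Scorer, threshold: float
) -> List[Tuple[str, List[Quad]]]:
    """Keep only (review, pseudo-label) pairs whose score reaches the threshold."""
    return [(review, quads) for review, quads in samples if scorer(review, quads) >= threshold]


def rerank(review: str, candidates: List[List[Quad]], scorer: Scorer) -> List[Quad]:
    """Return the candidate pseudo-label with the highest score for the review."""
    return max(candidates, key=lambda quads: scorer(review, quads))


review = "The pizza was great but the service was slow"
good = [("pizza", "food quality", "great", "positive")]
bad = [("pasta", "food quality", "awful", "negative")]

kept = filter_pseudo_labels([(review, good), (review, bad)], toy_scorer, threshold=0.5)
best = rerank(review, [bad, good], toy_scorer)
```

In a self-training loop, the pairs in `kept` would be added to the labeled data before retraining the ASQP model, while `rerank` would choose among several candidate outputs at inference time.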
Abstract
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, and is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in ASQP is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, in which the scorer assesses the match between reviews and their pseudo-labels in order to filter out mismatches and thereby make self-training more effective. We highlight two aspects that are critical to the scorer's effectiveness and reliability: the quality of its training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets show that using our scorer greatly and consistently improves the effectiveness of self-training. Moreover, we explore replacing humans with large language models for annotating the comparison dataset, and experiments demonstrate that this is feasible. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA.
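To make the idea of a ranking-based objective concrete, the sketch below shows one common pairwise form, the logistic ranking loss, applied to the scores of a preferred and a rejected pseudo-label for the same review. The abstract does not specify which ranking objectives are used, so this particular loss and the function name `pairwise_ranking_loss` are assumptions chosen for illustration.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Logistic pairwise ranking loss: -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the preferred pseudo-label is scored well above the
    rejected one, and large when the order is reversed.
    """
    margin = score_preferred - score_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Correctly ordered pair: low loss. Reversed pair: high loss.
loss_correct = pairwise_ranking_loss(2.0, 0.0)
loss_reversed = pairwise_ranking_loss(0.0, 2.0)
```

Minimizing such a loss over annotated comparisons pushes the scorer to rank the better-matching pseudo-label above the worse one, which is the kind of ordering that filtering and reranking rely on.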
