KPC-cF: Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering
Kibeom Nam
TL;DR
The paper tackles the paucity of high-quality Korean ABSA data by proposing KPC-cF, a dual-filtered, translation-informed framework that aligns implicit features between translated benchmarks and real Korean data. It introduces a two-phase pseudo-classifier with LaBSE-based corpus filtering and MSP-based confidence filtering to mitigate linguistic shifts and label noise, enabling effective fine-tuning on real Korean reviews. Experimental results on KR3 show improved Aspect Category Detection and Polarity over translation-only baselines, with Kor-SemEval pretraining and dual filtering delivering the strongest gains. The work provides a practical methodology for deploying ABSA in Korean and other low-resource languages, and it releases Kor-SemEval/KR3 data and code to spur further research.
Abstract
Investigations into Aspect-Based Sentiment Analysis (ABSA) for Korean industrial reviews are notably lacking in the existing literature. Our research proposes an intuitive and effective framework for ABSA in low-resource languages such as Korean. It optimizes prediction labels by integrating translated benchmark data with unlabeled Korean data. Using a model fine-tuned on translated data, we pseudo-labeled the actual Korean NLI set. Subsequently, we applied LaBSE- and MSP-based filtering to this pseudo-NLI set, treating it as an implicit feature, and used it for additional training to improve Aspect Category Detection and Polarity determination. By incorporating dual filtering, the model bridges dataset gaps and facilitates feature alignment with minimal resources. By implementing alignment pipelines, our approach aims to leverage high-resource datasets to develop reliable, refined predictive models within corporate or individual communities in low-resource language countries. Compared to English ABSA, our framework showed an approximately 3% difference in F1 scores and accuracy. We will release our dataset and code for Korean ABSA.
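The dual filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the LaBSE step scores a pair of precomputed sentence embeddings by cosine similarity and that the MSP step keeps a pseudo-label only when the maximum softmax probability of the classifier exceeds a cutoff. The function names, dictionary keys, and both threshold values are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def corpus_filter(pairs, sim_threshold=0.8):
    # Assumption: each pair carries precomputed embeddings (e.g. from LaBSE)
    # under the hypothetical keys "src_emb" and "tgt_emb".
    return [p for p in pairs
            if cosine(p["src_emb"], p["tgt_emb"]) >= sim_threshold]

def msp_filter(examples, msp_threshold=0.9):
    # Keep a pseudo-labeled example only if the classifier is confident,
    # measured by the maximum softmax probability (MSP).
    kept = []
    for ex in examples:
        probs = softmax(ex["logits"])
        msp = max(probs)
        if msp >= msp_threshold:
            kept.append({"text": ex["text"],
                         "label": probs.index(msp),
                         "msp": msp})
    return kept
```

In this sketch the two filters are independent, so they can be applied in either order. Both thresholds would need tuning on held-out data.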
