KPC-cF: Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering

Kibeom Nam

TL;DR

The paper tackles the paucity of high-quality Korean ABSA data by proposing KPC-cF, a dual-filtered, translation-informed framework that aligns implicit features between translated benchmarks and real Korean data. It introduces a two-phase pseudo-classifier with LaBSE-based corpus filtering and MSP-based confidence filtering to mitigate linguistic shifts and label noise, enabling effective fine-tuning on real Korean reviews. Experimental results on KR3 show improved Aspect Category Detection and Polarity over translation-only baselines, with Kor-SemEval pretraining and dual filtering delivering the strongest gains. The work provides a practical methodology for deploying ABSA in Korean and other low-resource languages, and it releases Kor-SemEval/KR3 data and code to spur further research.

Abstract

Investigations into Aspect-Based Sentiment Analysis (ABSA) for Korean industrial reviews are notably lacking in the existing literature. Our research proposes an intuitive and effective framework for ABSA in low-resource languages such as Korean. It optimizes prediction labels by integrating translated benchmark data and unlabeled Korean data. Using a model fine-tuned on translated data, we pseudo-labeled the actual Korean NLI set. Subsequently, we applied LaBSE- and MSP-based filtering to this pseudo-NLI set as an implicit feature, enhancing Aspect Category Detection and Polarity determination through additional training. By incorporating dual filtering, this model bridges dataset gaps and facilitates feature alignment with minimal resources. By implementing alignment pipelines, our approach aims to leverage high-resource datasets to develop reliable predictive and refined models within corporate or individual communities in low-resource language countries. Compared to English ABSA, our framework showed an approximately 3% difference in F1 scores and accuracy. We will release our dataset and code for Korean ABSA at this link.
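
The dual filtering described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes MSP means the maximum softmax probability of the pseudo-labeling model, and that the LaBSE score is a cosine similarity between two precomputed sentence embeddings per example. Which two texts are compared is not stated in this section, so the field names `review_emb` and `aspect_emb`, the function `dual_filter`, the toy vectors, and both threshold values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dual_filter(examples, msp_threshold=0.5, sim_threshold=0.8):
    """Keep pseudo-labeled examples that pass BOTH filters.

    Thresholds are illustrative placeholders, not values from the paper.
    """
    kept = []
    for ex in examples:
        probs = softmax(ex["logits"])
        msp = max(probs)  # confidence of the pseudo-label
        sim = cosine(ex["review_emb"], ex["aspect_emb"])  # stand-in for a LaBSE score
        if msp >= msp_threshold and sim >= sim_threshold:
            kept.append({
                "id": ex["id"],
                "pseudo_label": probs.index(msp),
                "msp": msp,
                "similarity": sim,
            })
    return kept


# Toy inputs: real embeddings would come from a LaBSE encoder.
EXAMPLES = [
    {"id": "a", "logits": [4.0, 0.0, 0.0], "review_emb": [1.0, 0.0], "aspect_emb": [1.0, 0.0]},
    {"id": "b", "logits": [0.1, 0.0, 0.0], "review_emb": [1.0, 0.0], "aspect_emb": [1.0, 0.0]},
    {"id": "c", "logits": [0.0, 5.0, 0.0], "review_emb": [1.0, 0.0], "aspect_emb": [0.0, 1.0]},
]

if __name__ == "__main__":
    for row in dual_filter(EXAMPLES):
        print(row["id"], row["pseudo_label"])
```

In this toy run, example "b" fails the confidence filter and example "c" fails the similarity filter, so only "a" would be passed on to the additional training step.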

Paper Structure (36 sections, 50 equations, 8 figures, 12 tables, 1 algorithm)

Figures (8)

  • Figure 1: t-SNE visualization of the last [CLS] embeddings extracted from the KR3 test set by two different experimental BaselineXLM-R encoders. Our $L_\text{align}$ on filtered data encourages the encoder to produce discriminable representations of different sentiment polarities, focusing on relevant aspects.
  • Figure 2: A diagram illustrating the two phases of our method: (1) fine-tuning on Kor-SemEval and generating pseudo-labeled KR3; (2) fine-tuning on KR3 using the baseline model selected in phase 1. The filtering process for fine-tuning on KR3 data is shown on the right. Blue arrows (left and middle) indicate that this model is used to predict the best label for each review.
  • Figure 3: Top: maximum probability distribution of the fine-tuned model, KPC-cF (left) vs. BaselineXLM-R+TR (right). Bottom: maximum probability distribution of BaselineXLM-R+TR with the LaBSE score distribution, all classes (left) vs. 4 classes (right).
  • Figure 4: Left and middle: maximum probability distribution of BaselinemBERT+TR with the LaBSE score distribution, all classes (left) vs. 4 classes (middle). Right: maximum probability distribution of the fine-tuned model, BaselinemBERT+TR.
  • Figure 5: Performance of ACD and ACP on the KR3 test set during adaptive fine-tuning ($D_T$ and ${D_S}_t \rightarrow D_T$). Left: results with the addition of other fine-tuned BaselineXLM-R models; th denotes the confidence threshold for pseudo-labeling, and L denotes the threshold for LaBSE filtering. Right: the BaselineXLM-R tuning variants compared in this paper. The blue line represents KPC-cF.
  • ...and 3 more figures

Theorems & Definitions (4)

  • proof : Proof of Lemma 1
  • proof : Proof of Lemma 2
  • proof : Proof of Lemma 4
  • proof : Proof of Convergence Theorem