An Aspect Extraction Framework using Different Embedding Types, Learning Models, and Dependency Structure

Ali Erkan, Tunga Güngör

TL;DR

This paper addresses aspect extraction for Turkish, a core subtask of fine-grained aspect-based sentiment analysis (ABSA). It proposes a hybrid neural framework that combines word and part-of-speech (POS) embeddings with a dependency-tree-based positional encoding in a BiLSTM-CRF architecture, with Viterbi decoding used to select the output tag sequence. It compares random, Word2Vec, and Turkish BERT word representations on two Turkish ABSA datasets and finds that tree positional encoding and contextualized embeddings together improve performance. The method achieves about 75.7 F1 on the original Turkish SemEval dataset and 72.4 F1 on a machine-translated Turkish version, outperforming several baselines on Turkish data. A new Turkish ABSA dataset and public code are released to support further research, and planned future work includes leveraging large language models and building an end-to-end Turkish ABSA system.
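To illustrate the decoding step mentioned above, here is a minimal sketch of Viterbi decoding over BIO aspect tags. It is not the authors' implementation: the function name `viterbi`, the tag order, and all scores are made-up assumptions, and in a real BiLSTM-CRF the emission scores would come from the network and the transition scores would be learned.

```python
# Hypothetical tag inventory for aspect extraction (an assumption for this sketch).
TAGS = ["B", "I", "O"]

def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # backptr[t][j] = best previous tag for tag j
    for emit in emissions[1:]:
        step_ptr, new_score = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            step_ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backptr.append(step_ptr)
        score = new_score
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    return path[::-1]

# Toy scores: all transitions are neutral except O -> I, which is penalized
# because an "inside" tag should not follow an "outside" tag.
transitions = [[0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0],
               [0.0, -1e4, 0.0]]
emissions = [[0.1, 0.2, 2.0],   # token 0 prefers O
             [0.5, 1.0, 0.2],   # token 1 prefers I when taken in isolation
             [0.1, 1.5, 0.3]]   # token 2 prefers I

print([TAGS[k] for k in viterbi(emissions, transitions)])  # ['O', 'B', 'I']
```

Picking the best tag per token independently would give O, I, I, which is not a valid BIO sequence. Scoring whole sequences lets the transition penalty steer the second token to B instead.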

Abstract

Aspect-based sentiment analysis has gained significant attention in recent years due to its ability to provide fine-grained insights for sentiment expressions related to specific features of entities. An important component of aspect-based sentiment analysis is aspect extraction, which involves identifying and extracting aspect terms from text. Effective aspect extraction serves as the foundation for accurate sentiment analysis at the aspect level. In this paper, we propose aspect extraction models that use different types of embeddings for words and part-of-speech tags and that combine several learning models. We also propose a tree positional encoding, based on dependency parsing output, to better capture the aspect positions in sentences. In addition, a new aspect extraction dataset is built for Turkish by machine translating an English dataset in a controlled setting. The experiments conducted on two Turkish datasets showed that the proposed models mostly outperform the studies that use the same datasets, and that incorporating tree positional encoding increases the performance of the models.
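The abstract does not state how the tree positional encoding is computed. As one possible reading, the sketch below assumes a token's tree position is its depth in the dependency tree, that is, the number of head links between the token and the root. The function name `tree_depths`, the head-index input format, and the toy parse are all hypothetical and may differ from the paper's actual definition.

```python
def tree_depths(heads):
    """Return the depth of every token in a dependency tree.

    heads[i] is the 0-based index of token i's head, or -1 if token i
    is the root. This input format is an assumption for the sketch.
    """
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:      # climb towards the root
            node = heads[node]
            depth += 1
            if depth > len(heads):    # a valid tree cannot be deeper than this
                raise ValueError("cycle in head indices")
        depths.append(depth)
    return depths

# Toy five-token parse (made up): token 3 is the root, tokens 1, 2 and 4
# attach to it directly, and token 0 attaches to token 1.
heads = [1, 3, 3, -1, 3]
print(tree_depths(heads))  # [2, 1, 1, 0, 1]
```

Under this assumption, the integer depths could index a small embedding table and be combined with the word and POS embeddings, in contrast to a sequential position that only reflects surface word order.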

Paper Structure

This paper contains 11 sections, 5 equations, 2 figures, and 9 tables.

Figures (2)

  • Figure 1: System architecture with BERT, BiLSTM, and CRF models
  • Figure 2: Dependency parse tree of the sentence "Son olarak dün gittiğim her gittiğimde ayrı keyif aldığım güzel mekan."