An Aspect Extraction Framework using Different Embedding Types, Learning Models, and Dependency Structure
Ali Erkan, Tunga Güngör
TL;DR
This paper addresses aspect extraction for Turkish, a core subtask of fine-grained aspect-based sentiment analysis (ABSA). It proposes a hybrid neural framework that combines word and POS embeddings with a dependency-tree-based positional encoding in a BiLSTM-CRF architecture that uses Viterbi decoding. It compares random, Word2Vec, and Turkish BERT word representations on two Turkish ABSA datasets and shows that tree positional encoding and contextualized embeddings jointly improve performance. The method achieves an F1 score of about 75.7 on the original Turkish SemEval dataset and 72.4 on a machine-translated Turkish version, outperforming several baselines on Turkish data. A new Turkish ABSA dataset and public code are released to support further research. Future work includes leveraging large language models and building an end-to-end Turkish ABSA system.
Abstract
Aspect-based sentiment analysis has gained significant attention in recent years because it provides fine-grained insight into sentiment expressed about specific features of entities. An important component of aspect-based sentiment analysis is aspect extraction, which involves identifying and extracting aspect terms from text. Effective aspect extraction serves as the foundation for accurate sentiment analysis at the aspect level. In this paper, we propose aspect extraction models that use different types of embeddings for words and part-of-speech tags and that combine several learning models. We also propose a tree positional encoding, based on dependency parsing output, to better capture aspect positions in sentences. In addition, we build a new aspect extraction dataset for Turkish by machine-translating an English dataset in a controlled setting. Experiments on two Turkish datasets show that the proposed models mostly outperform previous studies that use the same datasets and that incorporating tree positional encoding improves model performance.
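The abstract does not spell out how the tree positional encoding is computed, so the following is only an illustrative sketch of the general idea of deriving position features from a dependency parse. It assumes (this is not taken from the paper) that a token's position is measured as its number of dependency edges from the root, or from a chosen token, rather than by its linear index in the sentence. The function names `tree_distances` and `tree_depths`, the head-index input format, and the hand-written English parse are all hypothetical.

```python
from collections import deque


def tree_distances(heads, source):
    """Count dependency edges between `source` and every token.

    `heads[i]` is the index of token i's head, or -1 for the root.
    This input format is an assumption made for the sketch.
    """
    n = len(heads)
    # Build an undirected adjacency list from the head pointers.
    neighbors = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)

    # Breadth-first search yields path lengths, which are unique in a tree.
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if dist[nxt] == -1:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def tree_depths(heads):
    """Depth of every token, i.e. its tree distance from the root."""
    root = heads.index(-1)
    return tree_distances(heads, root)


# Hand-written parse of "The battery life is great" (illustrative only):
# The -> life, battery -> life, life -> great, is -> great, great = root.
tokens = ["The", "battery", "life", "is", "great"]
heads = [2, 2, 4, 4, -1]

depths = tree_depths(heads)               # [2, 2, 1, 1, 0]
from_battery = tree_distances(heads, 1)   # [2, 0, 1, 3, 2]
```

In a model of the kind described, such integer positions could index a learned embedding table whose output is combined with the word and POS embeddings before the BiLSTM layer. That wiring is likewise an assumption here, not a description of the authors' implementation.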
