
Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews

Hamidreza Kazemi Taskooh, Taha Zare Harofte

TL;DR

This paper addresses the lack of ABSA research on Persian tourism reviews by introducing a three-stage BERT–MoE framework with Top-K routing that efficiently extracts aspect-level sentiment across six tourism-related categories. It uses a large, labeled Persian dataset from Jabama (58,473 reviews, 9,558 of which are used for initial BERT training) and achieves a weighted ABSA F1 of $90.6\%$, outperforming a dense BERT baseline ($89.25\%$) and a BERT+MoE+LoRA variant ($85.7\%$). The model is trained with a loss that combines cross-entropy, an auxiliary load-balancing term, and MSE regularization. Its routing is enhanced by Intra-GPU and Fill-in rectification, which reduces routing collapse ($\text{COV}^2$ down to $0.0109$), and the framework reduces GPU power consumption by $39\%$ relative to dense BERT. This work offers a scalable, energy-efficient approach suited to real-time tourism platforms and contributes an open Persian ABSA dataset to encourage further multilingual NLP research in tourism. The results show strong ABSA performance in a low-resource language while addressing practical constraints such as energy use and model scalability.
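The routing and loss ideas named above can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation, and several details are assumptions: the Top-K gate renormalizes the kept probabilities, the auxiliary term follows the Switch-Transformer form $N \sum_i f_i P_i$, and the weights `alpha` and `beta` in `combined_loss` are placeholders. The summary does not state what the MSE term regularizes, so it is passed in as a precomputed value. $\text{COV}^2$ is computed as the squared coefficient of variation of per-expert load (variance divided by squared mean), a common way to quantify routing imbalance. Intra-GPU and Fill-in rectification are not sketched.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k):
    """Return (expert_indices, weights) for one token.

    Assumption: keep the k most probable experts and renormalize
    their gate probabilities so the kept weights sum to 1.
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return top, [probs[i] / mass for i in top]

def expert_loads(token_logits, num_experts, k):
    """Count the token-to-expert assignments each expert receives."""
    loads = [0] * num_experts
    for logits in token_logits:
        experts, _ = top_k_route(logits, k)
        for e in experts:
            loads[e] += 1
    return loads

def cov_squared(loads):
    """Squared coefficient of variation of expert load.

    0 means perfectly balanced experts; large values indicate routing collapse.
    """
    n = len(loads)
    mean = sum(loads) / n
    var = sum((x - mean) ** 2 for x in loads) / n
    return var / mean ** 2

def switch_style_aux_loss(token_logits, num_experts, k):
    """Assumed load-balancing loss N * sum_i f_i * P_i (Switch-Transformer style).

    f_i: fraction of Top-K assignments sent to expert i.
    P_i: mean router probability of expert i over all tokens.
    """
    loads = expert_loads(token_logits, num_experts, k)
    total = sum(loads)
    f = [x / total for x in loads]
    probs = [softmax(logits) for logits in token_logits]
    p_mean = [sum(p[i] for p in probs) / len(probs) for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p_mean))

def combined_loss(ce, aux, mse, alpha=0.01, beta=0.01):
    """Hypothetical weighted sum of the three loss terms (alpha, beta are placeholders)."""
    return ce + alpha * aux + beta * mse
```

For a given batch of router logits, `cov_squared(expert_loads(...))` indicates how evenly tokens spread across experts. A collapsed router that sends every token to one expert yields a large value and a large auxiliary loss, while an even spread drives both toward their minimums.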

Abstract

This study advances aspect-based sentiment analysis (ABSA) for Persian-language user reviews in the tourism domain, addressing the challenges of a low-resource language. We propose a hybrid BERT-based model with Top-K routing and auxiliary losses to mitigate routing collapse and improve efficiency. The pipeline has three stages: (1) overall sentiment classification using BERT on 9,558 labeled reviews, (2) multi-label aspect extraction for six tourism-related aspects (host, price, location, amenities, cleanliness, connectivity), and (3) integrated ABSA with dynamic routing. The dataset consists of 58,473 preprocessed reviews from the Iranian accommodation platform Jabama, manually annotated for aspects and sentiments. The proposed model achieves a weighted F1-score of 90.6% for ABSA, outperforming baseline BERT (89.25%) and a standard hybrid approach (85.7%). Its efficiency gains include a 39% reduction in GPU power consumption compared to dense BERT, supporting sustainable AI deployment in line with UN SDGs 9 and 12. Analysis shows that cleanliness and amenities are mentioned most frequently, marking them as critical aspects. This is the first ABSA study focused on Persian tourism reviews, and we release the annotated dataset to facilitate future multilingual NLP research in tourism.
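Stage (2) and the headline metric can be illustrated with a short plain-Python sketch. The decoding rule (an independent sigmoid per aspect, kept if above a threshold of 0.5) and the label strings are assumptions for illustration, not details given in the abstract. `weighted_f1` uses the standard support-weighted definition, in which each class's F1 is weighted by its share of the true labels; this matches `average="weighted"` in scikit-learn.

```python
import math
from collections import Counter

# The six aspects listed in the abstract; the string labels are illustrative.
ASPECTS = ["host", "price", "location", "amenities", "cleanliness", "connectivity"]

def decode_aspects(logits, threshold=0.5):
    """Multi-label decoding (assumed): keep each aspect whose sigmoid score exceeds threshold."""
    return [a for a, z in zip(ASPECTS, logits) if 1.0 / (1.0 + math.exp(-z)) > threshold]

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Classes are weighted by how often they occur in y_true, so
    frequent sentiment classes dominate the average.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total
    return score
```

Because the weights follow class support, a weighted F1 such as 90.6% mostly reflects performance on the most frequent sentiment classes. Per-class scores are needed to judge the rarer ones.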

Paper Structure

This paper contains 27 sections, 7 equations, 9 figures, 1 table, and 1 algorithm.

Figures (9)

  • Figure 1: Workflow of data collection, preprocessing, model training and evaluation.
  • Figure 2: Distribution of 9,558 data points used for training the BERT base model.
  • Figure 6: Heatmap of gate-assigned expert weights across aspect types (before applying rectification techniques), illustrating emergent specialization.
  • Figure 7: Heatmap of gate-assigned expert weights across aspect types, illustrating improved specialization after Top-K routing implementation.
  • Figure 8: Validation performance metrics for the hybrid expert-enhanced BERT model with modified Top-K routing. From left to right: (a) validation accuracy, showing stable convergence; (b) validation precision, highlighting improved precision across sentiment classes; (c) validation recall, illustrating robust performance despite class imbalance; (d) weighted F1-score, reflecting superior overall performance.