Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews
Hamidreza Kazemi Taskooh, Taha Zare Harofte
TL;DR
This paper addresses the lack of Persian aspect-based sentiment analysis (ABSA) for tourism by introducing a three-stage BERT–MoE (Mixture-of-Experts) framework with Top-K routing that efficiently extracts aspect-level sentiment across six tourism-related categories. It uses a large, labeled Persian dataset from Jabama (58,473 reviews, of which 9,558 are used for initial BERT training) and achieves a weighted ABSA F1 of $90.6\%$, outperforming a dense BERT baseline ($89.25\%$) and a BERT+MoE+LoRA variant ($85.7\%$). The loss combines cross-entropy, an auxiliary load-balancing term, and MSE regularization. Routing is enhanced with Intra-GPU and Fill-in rectification, which reduces routing collapse ($\text{COV}^2$ down to $0.0109$) and yields a $39\%$ reduction in GPU power consumption relative to dense BERT. The work provides a scalable, energy-efficient approach suited to deployment on real-time tourism platforms and contributes an open Persian ABSA dataset to spur further multilingual NLP research in tourism. The results demonstrate strong ABSA performance in a low-resource language while addressing practical constraints such as energy usage and model scalability.
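To make the routing-collapse metric concrete, the following is a minimal sketch of how a squared coefficient of variation ($\text{COV}^2$) over expert load could be computed. It rests on assumptions not stated in this summary: load is taken to be the number of tokens routed to each expert, the variance is the population variance, and the helper names `cv_squared` and `expert_loads` are hypothetical. The paper's exact definition may differ.

```python
from collections import Counter


def cv_squared(loads):
    """Squared coefficient of variation: population variance / mean^2.

    Assumption: `loads` holds one non-negative number per expert.
    A value near 0 means tokens are spread evenly across experts.
    """
    n = len(loads)
    mean = sum(loads) / n
    if mean == 0:
        return 0.0  # no tokens routed at all; treat as perfectly balanced
    variance = sum((x - mean) ** 2 for x in loads) / n
    return variance / mean ** 2


def expert_loads(assignments, num_experts):
    """Count how many tokens were routed to each expert index."""
    counts = Counter(assignments)
    return [counts.get(e, 0) for e in range(num_experts)]


# Toy routing decisions for 8 tokens and 4 experts (illustrative data only).
balanced = expert_loads([0, 1, 2, 3, 0, 1, 2, 3], num_experts=4)
collapsed = expert_loads([0, 0, 0, 0, 0, 0, 0, 1], num_experts=4)

print(cv_squared(balanced))   # evenly spread load
print(cv_squared(collapsed))  # most tokens sent to a single expert
```

In this toy example the evenly spread routing gives a value of 0, while sending almost every token to one expert gives a value above 2. Under that reading, a reported $\text{COV}^2$ of $0.0109$ would indicate a nearly uniform expert load.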
Abstract
This study advances aspect-based sentiment analysis (ABSA) for Persian-language user reviews in the tourism domain, addressing the challenges posed by low-resource languages. We propose a hybrid BERT-based Mixture-of-Experts (MoE) model with Top-K routing and auxiliary losses to mitigate routing collapse and improve efficiency. The pipeline includes: (1) overall sentiment classification using BERT on 9,558 labeled reviews, (2) multi-label aspect extraction for six tourism-related aspects (host, price, location, amenities, cleanliness, connectivity), and (3) integrated ABSA with dynamic routing. The dataset consists of 58,473 preprocessed reviews from the Iranian accommodation platform Jabama, manually annotated for aspects and sentiments. The proposed model achieves a weighted F1-score of 90.6% for ABSA, outperforming baseline BERT (89.25%) and a standard hybrid approach (85.7%). Key efficiency gains include a 39% reduction in GPU power consumption compared to dense BERT, supporting sustainable AI deployment in alignment with UN SDGs 9 and 12. Analysis shows high mention rates for cleanliness and amenities, marking them as critical aspects for users. This is the first ABSA study focused on Persian tourism reviews, and we release the annotated dataset to facilitate future multilingual NLP research in tourism.
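As an illustration of the Top-K routing mentioned above, here is a minimal, framework-free sketch of how a gate might select the k highest-scoring experts for one token and mix their outputs. It is a generic formulation, not the authors' implementation: the function names `top_k_route` and `moe_output`, the choice of k = 2, the softmax over only the selected logits, and all numbers are assumptions made for this example.

```python
import math


def top_k_route(logits, k=2):
    """Return [(expert_index, weight)] for the k largest gate logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits (ties keep the lower index first).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    shift = max(logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(logits[i] - shift) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(expert_outputs, routes):
    """Weighted sum of the selected experts' output vectors."""
    dim = len(expert_outputs[0])
    out = [0.0] * dim
    for idx, weight in routes:
        for d in range(dim):
            out[d] += weight * expert_outputs[idx][d]
    return out


# Hypothetical gate logits for one token over four experts.
routes = top_k_route([2.0, 0.5, 2.0, -1.0], k=2)

# Hypothetical 2-dimensional outputs, one vector per expert.
expert_outputs = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mixed = moe_output(expert_outputs, routes)
print(routes, mixed)
```

Only the selected experts contribute to the mixed output, which is the general reason sparse routing can cost less compute than running a dense model on every token.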
