Emotion Classification In-Context in Spanish

Bipul Thapa, Gabriel Cofre

TL;DR

The paper tackles emotion classification of Spanish customer feedback and highlights the limitations of translation-based approaches that can erode semantic nuance. It introduces a hybrid feature representation combining TF-IDF and BERT embeddings with a Custom Stacking Ensemble (CSE) to classify emotions into positive, neutral, and negative, achieving a peak of 93.3% accuracy on native Spanish data and surpassing translated-data performance. The study uses a healthcare-related Spanish dataset from GreHus, compares native Spanish vs English-translated data, and demonstrates that the CSE framework outperforms individual models and transformer baselines while remaining computationally efficient. These findings have practical implications for businesses seeking accurate, language-preserving emotion analysis of customer feedback in Spanish.

Abstract

Classifying customer feedback into distinct emotion categories is essential for understanding sentiment and improving customer experience. In this paper, we classify customer feedback in Spanish into three emotion categories--positive, neutral, and negative--using advanced NLP and ML techniques. Traditional methods translate feedback from widely spoken languages to less common ones, resulting in a loss of semantic integrity and contextual nuances inherent to the original language. To address this limitation, we propose a hybrid approach that combines TF-IDF with BERT embeddings, transforming Spanish text into numerical representations that preserve the semantic depth of the original language, and classifies these representations with a Custom Stacking Ensemble (CSE). To evaluate emotion classification, we use a range of models, including Logistic Regression, KNN, a Bagging classifier with LGBM, and AdaBoost. The CSE model combines these classifiers as base models and uses a one-vs-all Logistic Regression as the meta-model. Our experimental results demonstrate that CSE significantly outperforms both the individual models and the BERT model, achieving a test accuracy of 93.3% on the native Spanish dataset--higher than the accuracy obtained on the translated version. These findings underscore the challenges of emotion classification in Spanish and highlight the advantages of combining vectorization techniques like TF-IDF with BERT for improved accuracy. Our results provide valuable insights for businesses seeking to leverage emotion classification to enhance customer feedback analysis and service improvements.
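The pipeline described in the abstract (hybrid TF-IDF plus embedding features fed to a stacking ensemble with a one-vs-all Logistic Regression meta-model) might be sketched with scikit-learn roughly as follows. This is an illustrative sketch, not the authors' implementation. Several pieces are assumptions: the toy sentences and labels are invented for the example, `placeholder_embeddings` is a hypothetical stand-in that returns random vectors where real BERT sentence embeddings would go, a decision tree replaces LGBM inside the Bagging classifier so that only scikit-learn is required, and all hyperparameters are arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Invented toy feedback (assumption): four examples per emotion class.
texts = [
    "excelente atención, muy amable", "me encantó el servicio",
    "todo perfecto, gracias", "muy buen trato del personal",
    "la cita fue a las diez", "recibí el informe ayer",
    "el servicio fue normal", "sin comentarios adicionales",
    "pésima atención, muy lenta", "no me gustó el trato",
    "tuve que esperar demasiado", "muy mala experiencia",
]
labels = ["positive"] * 4 + ["neutral"] * 4 + ["negative"] * 4


def placeholder_embeddings(n_rows, dim=8, seed=0):
    """Hypothetical stand-in for BERT sentence embeddings (random values)."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_rows, dim))


# Hybrid representation: sparse TF-IDF columns next to dense embedding columns.
tfidf = TfidfVectorizer()
X_sparse = tfidf.fit_transform(texts)
X_dense = placeholder_embeddings(len(texts))
X = hstack([X_sparse, csr_matrix(X_dense)]).tocsr()

# Base models named in the abstract; the decision tree is a swap-in for LGBM.
base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("bag", BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=25, random_state=0)),
]

# One-vs-all Logistic Regression as the meta-model over base-model outputs.
meta_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))

# cv=2 only because the toy dataset is tiny; real data would allow more folds.
cse = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=2)
cse.fit(X, labels)
predictions = cse.predict(X)
```

To move closer to the setup in the abstract, the placeholder would be replaced by embeddings from a Spanish or multilingual BERT model, and the decision tree inside `BaggingClassifier` by `lightgbm.LGBMClassifier`. Predictions on the training data, as here, say nothing about generalization; a held-out test split is needed for any accuracy figure.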

Paper Structure

This paper contains 19 sections, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Overview of the method
  • Figure 2: CSE approach
  • Figure 3: Model Accuracy Comparison for Spanish and English-translated Datasets
  • Figure 4: Training history plots for the BERT model on the Spanish and English datasets