PromotionGo at SemEval-2025 Task 11: A Feature-Centric Framework for Cross-Lingual Multi-Emotion Detection in Short Texts
Ziyi Huang, Xia Cui
TL;DR
The paper tackles cross-lingual, multi-label emotion detection in short texts by proposing a feature-centric framework that jointly considers document representations, dimensionality reduction, and model training. It modularly combines traditional lexical features, pretrained embeddings, and transformer-based encodings, using PCA to balance performance and efficiency, and evaluates across 28 languages with a variety of classifiers including MLP. Key findings show TF-IDF can outperform more complex representations in low-resource languages, while Sentence-BERT with MLP often yields the best overall results; PCA generally reduces training time with varying impact on accuracy. The framework offers a scalable approach to multilingual emotion detection, providing insights into language-specific representation choices and efficiency trade-offs for real-world deployment.
Abstract
This paper presents our system for SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection (Track A), which focuses on multi-label emotion detection in short texts. We propose a feature-centric framework that dynamically adapts document representations and learning algorithms to optimize language-specific performance. Our study evaluates three key components (document representation, dimensionality reduction, and model training) across 28 languages, five of which we analyze in detail. The results show that TF-IDF remains highly effective for low-resource languages, while pretrained word embeddings such as FastText and transformer-based document representations, such as those produced by Sentence-BERT, exhibit language-specific strengths. Principal Component Analysis (PCA) reduces training time without compromising performance, particularly benefiting FastText and neural models such as Multi-Layer Perceptrons (MLP). Computational efficiency analysis underscores the trade-off between model complexity and processing cost. Our framework provides a scalable solution for multilingual emotion detection, addressing the challenges of linguistic diversity and resource constraints.
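To make the simplest representation in the framework concrete, the following is a minimal sketch of TF-IDF features paired with multi-hot emotion targets, which is the input/output shape a multi-label classifier would consume. It is an illustration under stated assumptions, not the system's actual implementation: the whitespace tokenizer, the smoothed IDF variant, the helper names `tfidf_fit_transform` and `multi_hot`, the example texts, and the five-emotion label set are all hypothetical.

```python
import math
from collections import Counter


def tfidf_fit_transform(docs):
    """Return (vocabulary, matrix) using smoothed IDF and L2-normalised rows."""
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # One common smoothed IDF variant: ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    matrix = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            row[index[tok]] = count * idf[tok]
        # L2-normalise so that document length does not dominate the features.
        norm = math.sqrt(sum(v * v for v in row))
        matrix.append([v / norm for v in row] if norm > 0 else row)
    return vocab, matrix


def multi_hot(label_sets, classes):
    """Encode each set of labels as a 0/1 vector over `classes`."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


# Hypothetical label set and toy data, for illustration only.
EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]
texts = ["so happy today", "scared and angry", "happy but scared"]
labels = [{"joy"}, {"fear", "anger"}, {"joy", "fear"}]

vocab, X = tfidf_fit_transform(texts)
Y = multi_hot(labels, EMOTIONS)
```

Each row of `X` is a sparse lexical feature vector and each row of `Y` marks every emotion present in the text, so a single text can carry several labels at once. In a fuller pipeline, a dimensionality reduction step such as PCA would sit between `X` and the classifier.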
