Detecting Hope Across Languages: Multiclass Classification for Positive Online Discourse
T. O. Abiola, K. D. Abiodun, O. E. Olumide, O. O. Adebanji, O. Hiram Calvo, Grigori Sidorov
TL;DR
The paper tackles multilingual, multiclass detection of hopeful online discourse in English, Spanish, German, and Urdu by fine-tuning XLM-RoBERTa with a multilabel head and integrating active learning to address class imbalance. It defines four classes (Not Hope, Generalized Hope, Realistic Hope, Unrealistic Hope) and evaluates on the PolyHope dataset, showing that transformer-based multilingual models outperform logistic regression baselines and achieve competitive macro F1 scores across languages, including low-resource Urdu. Key contributions include a methodology that combines attention to the dataset's class distribution, a weighted loss, and iterative uncertainty-driven data selection, along with a comparative analysis against existing techniques. The work advances practical positive-content moderation by enabling robust cross-lingual hope speech detection with fine-grained categorization, while acknowledging limitations in data diversity and computational resources that future work can address.
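The TL;DR names two mechanisms for coping with class imbalance: a weighted loss and uncertainty-driven selection of new training examples. The sketch below illustrates both in plain Python, operating on raw classifier logits rather than on a real XLM-RoBERTa model. It is not the authors' implementation: the inverse-frequency weighting formula, the use of predictive entropy as the uncertainty score, the toy data, and all function names (`class_weights`, `weighted_cross_entropy`, `select_most_uncertain`) are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Label set from the paper's four-class scheme.
LABELS = ["Not Hope", "Generalized Hope", "Realistic Hope", "Unrealistic Hope"]


def class_weights(labels):
    """Inverse-frequency weights n / (k * count_c) (an assumed scheme).

    Classes absent from `labels` get no entry, so they must not appear as targets.
    """
    counts = Counter(labels)
    n, k = len(labels), len(LABELS)
    return {c: n / (k * counts[c]) for c in LABELS if counts[c] > 0}


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, batch_targets, weights):
    """Class-weighted cross-entropy, normalized by the summed target weights
    (the reduction PyTorch's CrossEntropyLoss applies with weight= and 'mean')."""
    loss, norm = 0.0, 0.0
    for logits, target in zip(batch_logits, batch_targets):
        p_target = softmax(logits)[LABELS.index(target)]
        w = weights[target]
        loss += -w * math.log(p_target)
        norm += w
    return loss / norm


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_logits, budget):
    """Indices of the `budget` unlabeled examples with the highest predictive entropy."""
    scores = [entropy(softmax(logits)) for logits in pool_logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy, imbalanced training labels (not PolyHope data).
train_labels = ["Not Hope"] * 6 + ["Generalized Hope"] * 2 + ["Realistic Hope", "Unrealistic Hope"]
w = class_weights(train_labels)
print({c: round(v, 3) for c, v in w.items()})

# Hypothetical logits for three unlabeled pool examples; the second is nearly uniform.
pool = [[4.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0], [2.0, 1.0, 0.0, 0.0]]
print(select_most_uncertain(pool, budget=1))  # -> [1]
```

Rare classes receive larger weights (here 2.5 for Realistic and Unrealistic Hope versus about 0.417 for Not Hope), so their errors contribute more to the loss. In a full active learning loop, the selected pool items would be annotated, added to the training set, and the model fine-tuned again before the next round; least-confidence or margin scores are common alternatives to entropy.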
Abstract
The detection of hopeful speech in social media has emerged as a critical task for promoting positive discourse and well-being. In this paper, we present a machine learning approach to multiclass hope speech detection across multiple languages, including English, Spanish, German, and Urdu. We leverage transformer-based models, specifically XLM-RoBERTa, to separate hopeful from non-hopeful (Not Hope) text and to categorize hope speech into three distinct classes: Generalized Hope, Realistic Hope, and Unrealistic Hope. We evaluate the proposed methodology on the PolyHope dataset for the PolyHope-M 2025 shared task, achieving competitive performance across all languages. We compare our results with existing models, demonstrating that our approach significantly outperforms prior state-of-the-art techniques in terms of macro F1 score. We also discuss the challenges of detecting hope speech in low-resource languages and the potential for improving generalization. This work contributes to the development of multilingual, fine-grained hope speech detection models that can be applied to enhance positive content moderation and foster supportive online communities.
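The abstract reports results as macro F1, which averages per-class F1 without weighting by class frequency, so a model that simply predicts the majority class scores poorly even when its accuracy looks high. A minimal pure-Python sketch of the metric follows; the function name `macro_f1` and the toy predictions are hypothetical, and the computation follows the standard definition (the quantity scikit-learn's `f1_score` reports with `average="macro"`, under the convention that undefined per-class scores count as 0).

```python
LABELS = ["Not Hope", "Generalized Hope", "Realistic Hope", "Unrealistic Hope"]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; undefined precision, recall, or F1 counts as 0."""
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy example: a classifier that always predicts the majority class.
y_true = ["Not Hope", "Not Hope", "Not Hope", "Realistic Hope"]
y_pred = ["Not Hope"] * 4
score = macro_f1(y_true, y_pred, LABELS)
print(round(score, 4))  # -> 0.2143
```

Accuracy here would be 0.75, yet macro F1 is 3/14 (about 0.2143): Not Hope earns an F1 of 6/7, while the other three classes each contribute 0.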
