Tackling a Challenging Corpus for Early Detection of Gambling Disorder: UNSL at MentalRiskES 2025
Horacio Thompson, Marcelo Errecalde
TL;DR
The paper addresses early risk detection of gambling disorder on the web using a modular CPI+DMC framework with three distinct CPI models (SS3, Extended BETO with domain vocabulary, and SBERT with SetFit) and history- or global-based decision policies. It reports top placements in MentalRiskES 2025 Task 1 (Macro F1 around 0.56) and analyzes the trade-off between predictive accuracy and decision speed, as well as model agreement and error patterns. The study highlights corpus challenges, such as high lexical overlap and nuanced risk signals, and discusses data interpretation, adaptive evaluation metrics, and the need for transparent ERD systems in mental health contexts. Overall, it shows that balancing classification performance with timely decisions can yield strong results, while underscoring the importance of data quality and interpretability for real-world deployment.
Abstract
Gambling disorder is a complex behavioral addiction that is challenging to understand and address, with severe physical, psychological, and social consequences. Early Risk Detection (ERD) on the Web has become a key task in the scientific community for identifying early signs of mental health problems based on social media activity. This work presents our participation in the MentalRiskES 2025 challenge, specifically in Task 1, which aims to classify users as being at high or low risk of developing a gambling-related disorder. We proposed three methods based on a CPI+DMC approach, which treats predictive effectiveness and decision-making speed as independent objectives. The components were implemented using the SS3, BERT with extended vocabulary, and SBERT models, followed by decision policies based on historical user analysis. Despite the challenging corpus, two of our proposals took the top two positions in the official results and performed notably well on decision metrics. Further analysis revealed some difficulty in distinguishing between users at high and low risk, reinforcing the need to explore strategies to improve data interpretation and quality, and to promote more transparent and reliable ERD systems for mental disorders.
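To make the CPI+DMC split concrete, the following is a minimal sketch, not the authors' implementation. It assumes a CPI component that scores the posts seen so far and a DMC policy that looks at the history of those scores to decide when to stop reading and flag a user. The keyword scorer, the function names (`keyword_cpi`, `history_policy`, `early_detect`), and the `threshold`, `window`, and `min_posts` parameters are all hypothetical placeholders; in the paper the CPI role is played by SS3, BERT with extended vocabulary, or SBERT.

```python
from typing import Callable, List, Optional


def keyword_cpi(posts: List[str]) -> float:
    """Toy CPI stand-in (assumption): share of the posts seen so far
    that contain at least one hypothetical risk keyword."""
    keywords = ("bet", "casino", "debt")
    if not posts:
        return 0.0
    hits = sum(any(k in p.lower() for k in keywords) for p in posts)
    return hits / len(posts)


def history_policy(scores: List[float], threshold: float = 0.5,
                   window: int = 3, min_posts: int = 3) -> bool:
    """Toy DMC stand-in (assumption): raise an alarm only when the last
    `window` partial scores all reach `threshold`, and at least
    `min_posts` posts have been read."""
    if len(scores) < max(window, min_posts):
        return False
    return all(s >= threshold for s in scores[-window:])


def early_detect(posts: List[str],
                 cpi: Callable[[List[str]], float] = keyword_cpi,
                 policy: Callable[[List[float]], bool] = history_policy
                 ) -> Optional[int]:
    """Read posts one at a time; return the number of posts read when the
    policy fires (high risk), or None if it never fires (low risk)."""
    scores: List[float] = []
    for i in range(1, len(posts) + 1):
        scores.append(cpi(posts[:i]))   # CPI: classify with partial information
        if policy(scores):              # DMC: decide whether to stop now
            return i
    return None


if __name__ == "__main__":
    risky = ["I lost a bet again", "casino night", "more debt", "hello"]
    calm = ["hello", "nice weather", "football", "lunch"]
    print(early_detect(risky), early_detect(calm))
```

Keeping the two components separate means the classifier and the stopping rule can be tuned independently: a stricter `window` or `threshold` delays decisions and may reduce false alarms, while a looser one decides earlier at the cost of more errors.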
