A Human Word Association based model for topic detection in social networks
Mehrdad Ranjbar Khadivi, Shahin Akbarpour, Mohammad-Reza Feizi-Derakhshi, Babak Anari
TL;DR
This work tackles topic detection in social networks by leveraging linguistic structure through Human Word Association (HWA), CIMAWA, and Association Gravity Force (AGF). It develops a five-step framework that ranks keywords, computes co-occurrence and asymmetric word associations, extracts patterns with AGF, and clusters patterns via pattern embeddings with HDBSCAN to produce topics. The approach is evaluated on English FA-Cup and Persian Telegram datasets, showing improvements in topic-recall and keyword-F1 over baselines and demonstrating cross-language applicability. The results suggest strong potential for language-agnostic topic detection in short, noisy social-media text, with future work aimed at graph-based representations and integration with large language models.
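As a rough illustration of the co-occurrence and asymmetric word-association step, the sketch below counts word and word-pair frequencies over a set of posts and then scores a directed association from a cue word x to a response word y. It is not the authors' implementation. Several choices are assumptions: each post is treated as one context window, tokenization is plain whitespace splitting, the function names are hypothetical, and the score uses the form ω(x,y)/ω(y) + ζ·ω(x,y)/ω(x), which is our reading of CIMAWA, with ζ = 0.125 as an illustrative default rather than a value taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count per-word document frequency and unordered pair co-occurrence.

    Assumption: each post is a single context window, and a word or pair
    is counted at most once per post.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for post in posts:
        # Deduplicate and sort so each pair is stored in a canonical (a, b) order.
        words = sorted(set(post.lower().split()))
        word_freq.update(words)
        for a, b in combinations(words, 2):
            pair_freq[(a, b)] += 1
    return word_freq, pair_freq


def cimawa_like(x, y, word_freq, pair_freq, zeta=0.125):
    """Directed association from cue x to response y (hypothetical helper).

    Assumed CIMAWA-style form: w(x,y)/w(y) + zeta * w(x,y)/w(x).
    Dividing by w(y) favors responses that rarely occur without the cue.
    """
    key = (x, y) if x < y else (y, x)
    joint = pair_freq.get(key, 0)
    if joint == 0:
        return 0.0
    return joint / word_freq[y] + zeta * joint / word_freq[x]
```

For example, with the toy posts `["goal chelsea penalty", "chelsea goal", "penalty saved", "goal"]`, the score for "goal" → "chelsea" (1 + 0.125·2/3) exceeds the score for "chelsea" → "goal" (2/3 + 0.125), because "chelsea" never appears without "goal" while "goal" also appears alone. This directionality is what distinguishes an asymmetric association from a plain co-occurrence count.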
Abstract
With the widespread use of social networks, detecting the topics discussed on these platforms has become a significant challenge. Current approaches rely primarily on frequent pattern mining or semantic relations and often neglect the structure of the language. Language-structure methods aim to uncover the relationships between words and how humans understand them. This paper therefore introduces a topic detection framework for social networks that imitates the human mental ability of word association. The framework employs the Human Word Association method together with a specially designed extraction algorithm. Its performance is evaluated on the FA-Cup dataset, a benchmark in topic detection. The results indicate that the proposed method significantly improves topic detection over other methods, as measured by topic recall and keyword F1. Additionally, to assess the applicability and generalizability of the method, it is evaluated on a dataset of Persian-language Telegram posts, where it also outperforms other topic detection methods.
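The two evaluation measures named in the abstract, topic recall and keyword F1, can be sketched as follows. The matching rule is a simplification assumed for illustration and may differ from the official FA-Cup protocol. Under this rule, a ground-truth topic counts as recalled when its best-overlapping detected topic shares at least `min_overlap` keywords, and keyword precision and recall are micro-averaged over the matched pairs. The function name `evaluate_topics` and the `min_overlap` parameter are hypothetical.

```python
def evaluate_topics(detected, ground_truth, min_overlap=2):
    """Simplified topic recall and keyword precision/recall/F1.

    detected, ground_truth: lists of keyword sets, one set per topic.
    Assumed matching rule: each ground-truth topic is paired with the
    detected topic sharing the most keywords; the pair counts as a match
    if the overlap is at least min_overlap.
    """
    matched = 0
    true_pos = det_total = gt_total = 0
    for gt in ground_truth:
        # Best detected topic by keyword overlap (empty set if none detected).
        best = max(detected, key=lambda d: len(d & gt), default=set())
        overlap = len(best & gt)
        if overlap >= min_overlap:
            matched += 1
            true_pos += overlap
            det_total += len(best)
            gt_total += len(gt)

    topic_recall = matched / len(ground_truth) if ground_truth else 0.0
    precision = true_pos / det_total if det_total else 0.0
    recall = true_pos / gt_total if gt_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return topic_recall, precision, recall, f1
```

With one ground-truth topic `{"goal", "chelsea", "lampard"}` matched by a detected topic `{"goal", "chelsea", "penalty"}`, and a second ground-truth topic with no overlapping detection, this sketch reports a topic recall of 0.5 and a keyword F1 of 2/3. Topic recall measures how many real topics were found at all, while keyword F1 measures how well the keywords of the found topics describe them.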
