A Human Word Association based model for topic detection in social networks

Mehrdad Ranjbar Khadivi, Shahin Akbarpour, Mohammad-Reza Feizi-Derakhshi, Babak Anari

TL;DR

This work tackles topic detection in social networks by leveraging linguistic structure through Human Word Association (HWA), CIMAWA, and Association Gravity Force (AGF). It develops a five-step framework that ranks keywords, computes co-occurrence and asymmetric word associations, extracts patterns with AGF, and clusters patterns via pattern embeddings with HDBSCAN to produce topics. The approach is evaluated on English FA-Cup and Persian Telegram datasets, showing improvements in topic-recall and keyword-F1 over baselines and demonstrating cross-language applicability. The results suggest strong potential for language-agnostic topic detection in short, noisy social-media text, with future work aimed at graph-based representations and integration with large language models.

Abstract

With the widespread use of social networks, detecting the topics discussed on these platforms has become a significant challenge. Current approaches primarily rely on frequent pattern mining or semantic relations, often neglecting the structure of the language. Language structural methods aim to discover the relationships between words and how humans understand them. Therefore, this paper introduces a topic detection framework for social networks based on the concept of imitating the mental ability of word association. This framework employs the Human Word Association method and includes a specially designed extraction algorithm. The performance of this method is evaluated using the FA-CUP dataset, a benchmark in the field of topic detection. The results indicate that the proposed method significantly improves topic detection compared to other methods, as evidenced by Topic-recall and the keyword F1 measure. Additionally, to assess the applicability and generalizability of the proposed method, a dataset of Telegram posts in the Persian language is used. The results demonstrate that this method outperforms other topic detection methods.

Paper Structure

This paper contains 26 sections, 13 equations, 3 figures, 7 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Flowchart of the proposed framework, which has five steps. In the first step, the incoming data stream is pre-processed (windowing and tokenization). In the second step, HWA values are calculated and patterns are extracted using the presented algorithm. In the third step, the word vector of each pattern is extracted. In the fourth step, the similarity (distance) between patterns is calculated and the patterns are clustered. In the final step, topics are selected and extracted based on a top-k ranking.
  • Figure 2: The number of characters per post in Telegram and Twitter. A microblog post usually contains fewer than 300 characters. The chart illustrates that both Twitter and Telegram can be considered microblogging platforms.
  • Figure 3: Word tags of the topics in windows 16 (A) and 37 (B). Window 16 includes the super topic "Death of Ayatollah Hashemi" and window 37 includes the super topic "Plasko building fire incident". Each window also contains other topics. For example, the topic "Inauguration of the President of the United States" can also be seen in window 37.
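
The first two steps in Figure 1 (tokenizing the posts in a window, then scoring word associations) can be illustrated with a minimal sketch. It assumes a simple conditional co-occurrence ratio as a stand-in for an asymmetric association measure; it is not the CIMAWA or AGF formula from the paper. The function names `tokenize` and `asymmetric_association` and the toy posts are hypothetical.

```python
from collections import Counter
from itertools import permutations

PUNCT = ".,!?:;\"'()"

def tokenize(post):
    """Lowercase whitespace tokenizer; a placeholder for real pre-processing."""
    tokens = (tok.strip(PUNCT).lower() for tok in post.split())
    return [tok for tok in tokens if tok]

def asymmetric_association(posts):
    """Map (x, y) to the share of posts containing x that also contain y.

    Assumption: this ratio only stands in for an asymmetric measure;
    it is not the association formula used in the paper.
    """
    word_freq = Counter()   # number of posts each word appears in
    pair_freq = Counter()   # number of posts each ordered pair appears in
    for post in posts:
        words = set(tokenize(post))
        word_freq.update(words)
        pair_freq.update(permutations(sorted(words), 2))
    return {(x, y): n / word_freq[x] for (x, y), n in pair_freq.items()}

# One toy window of posts (invented for illustration).
window = [
    "Chelsea win the FA Cup final",
    "Chelsea fans celebrate the final",
    "Liverpool lose the final",
]
assoc = asymmetric_association(window)
print(assoc[("chelsea", "final")], assoc[("final", "chelsea")])
```

The two printed scores differ because the measure depends on direction: every toy post that mentions "chelsea" also mentions "final", but not the other way round. That directional property is what an asymmetric association is meant to capture.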