Statistical Analysis of Risk Assessment Factors and Metrics to Evaluate Radicalisation in Twitter
Raul Lara-Cabrera, Antonio Gonzalez-Pardo, David Camacho
TL;DR
This study investigates automated risk assessment of radicalisation on Twitter by defining five indicators spanning personality, relationships, and attitudes/beliefs, and measuring them with keyword-based metrics enhanced by WordNet expansion and stemming. It evaluates these metrics across three datasets (ISIS-sympathiser tweets, Anonymous-language data, and random Twitter samples) to assess discriminatory power and language effects, finding that keyword-based signals can distinguish radicalised users but are heavily language-dependent and limited by English coverage. The work highlights that some metrics (notably ellipses) may be weak indicators, while frustration and sentiment-related metrics show stronger signals in radicalised data. It also outlines directions for future work, including language-agnostic ontologies, multilingual keyword processing, and incorporating network-structure features and ML classifiers to improve robustness and generalisation.
Abstract
Social Networks have become essential communication tools, producing a large amount of information about their users and their interactions that can be analysed with Data Mining methods. In recent years, Social Networks have also been used to radicalise people. In this paper, we study the performance of a set of indicators and their respective metrics, designed to assess the risk of radicalisation of a given individual, on three different datasets. Keyword-based metrics, although dependent on the written language, perform well when measuring frustration, perception of discrimination, and the declaration of negative and positive ideas about Western society and Jihadism, respectively. However, metrics based on frequent habits, such as writing ellipses, are not sufficient to characterise a user at risk of radicalisation. The paper presents a detailed description of both the set of indicators used to assess radicalisation in Social Networks and the set of datasets used to evaluate them. Finally, an experimental study over these datasets is carried out to evaluate the performance of the metrics considered.
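To make the two kinds of metric mentioned above concrete, the following is a minimal sketch of a keyword-based metric and an ellipsis-based metric, each computed as the fraction of a user's tweets that match. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the keyword list, the function names, and the crude suffix-stripping stemmer (a stand-in for a real stemmer and for WordNet expansion) are all hypothetical.

```python
import re

# Hypothetical keyword list for illustration only; the paper's actual
# keyword sets are not reproduced here.
DISCRIMINATION_KEYWORDS = {"oppress", "unfair", "reject"}


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Do not strip the final "s" of words such as "oppress".
        if suffix == "s" and word.endswith("ss"):
            continue
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_ratio(tweets, keywords):
    """Fraction of tweets containing at least one stemmed keyword."""
    if not tweets:
        return 0.0
    stems = {crude_stem(k.lower()) for k in keywords}
    hits = 0
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        if any(crude_stem(token) in stems for token in tokens):
            hits += 1
    return hits / len(tweets)


def ellipsis_ratio(tweets):
    """Fraction of tweets containing an ellipsis ("..." or "…")."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if "..." in t or "…" in t) / len(tweets)
```

Because the keyword list and tokenisation are English-only, a sketch like this also shows why such metrics depend on the written language: tweets in other languages would simply score zero unless the keyword set is translated or expanded.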
