Deciphering Social Behaviour: a Novel Biological Approach For Social Users Classification

Edoardo Allegrini, Edoardo Di Paolo, Marinella Petrocchi, Angelo Spognardi

TL;DR

This work addresses the detection of social bots, including those powered by Large Language Models, with a biology-inspired framework that represents user behavior as digital DNA and compares sequences by similarity rather than by exact pattern matches. It clusters accounts into macro species via Longest Common Substring analysis, seeds the bot and genuine groups with a Pareto-based approach, and iteratively labels the remaining unlabeled species using global sequence alignment and a novel weightedLCS metric. The approach reports $F1$ scores of $0.96$ on Cresci-17 and $0.83$ on fox-8 without any machine learning or deep learning, which supports its interpretability and its applicability to evolving bot strategies. Overall, the method offers a transparent, rule-based alternative to ML-based detectors, with potential for multi-class expansion and improved clustering to better differentiate diverse bot breeds.

Abstract

Social media platforms continue to struggle with the growing presence of social bots, automated accounts that can influence public opinion and facilitate the spread of disinformation. Over time, these social bots have advanced significantly, making them increasingly difficult to distinguish from genuine users. Recently, new groups of bots have emerged that use Large Language Models to generate the content they post, further complicating detection efforts. This paper proposes a novel approach that uses algorithms for measuring the similarity between DNA strings, commonly used in biological contexts, to classify social users as bots or genuine users. Our approach begins by clustering social media users into distinct macro species based on the similarities (and differences) observed in their timelines. These macro species are subsequently classified as either bots or genuine users, using a novel metric we developed that evaluates their behavioral characteristics in a way that mirrors biological comparison methods. This study extends beyond past approaches, which analyze the accounts' timelines solely for identical behaviors. By incorporating new metrics, our approach systematically classifies non-trivial accounts into appropriate categories, effectively peeling back layers to reveal non-obvious species.
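
As a rough illustration of the idea of comparing timelines as DNA-like strings, the following minimal Python sketch encodes a timeline as a string and computes the longest common substring of two such strings. The three-letter alphabet, the function names, and the normalization in `lcs_similarity` are assumptions made for this example only; they are not the paper's weightedLCS metric, which is not defined in this summary.

```python
def encode_timeline(actions):
    """Map a list of action names to a digital-DNA string.

    The alphabet below is an assumption for illustration only.
    """
    alphabet = {"tweet": "A", "reply": "C", "retweet": "T"}
    return "".join(alphabet[a] for a in actions)


def longest_common_substring(s, t):
    """Return the longest contiguous substring shared by s and t.

    Classic O(len(s) * len(t)) dynamic programming: the cell for (i, j)
    holds the length of the common suffix of s[:i] and t[:j].
    On ties, the substring that ends earliest in s is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s[best_end - best_len:best_end]


def lcs_similarity(s, t):
    """Hypothetical normalization: LCS length over the shorter sequence length."""
    if not s or not t:
        return 0.0
    return len(longest_common_substring(s, t)) / min(len(s), len(t))


if __name__ == "__main__":
    a = encode_timeline(["tweet", "tweet", "reply", "retweet", "tweet"])
    b = encode_timeline(["retweet", "tweet", "reply", "retweet", "retweet"])
    print(a, b, longest_common_substring(a, b), lcs_similarity(a, b))
```

Accounts whose strings share a long common substring behave alike over some stretch of their timelines, which is the intuition behind grouping them into the same macro species.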

Paper Structure

This paper contains 18 sections, 8 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Scheme of the procedure for social users classification
  • Figure 2: Evaluation of mismatch/open gap/extend gap scores
  • Figure 3: An example of optimal alignment between two sequences
  • Figure 4: Unlabeled species classification
  • Figure 5: Anomaly in fox-8 vector $\mathbb V$
  • ...and 1 more figure