Bot Meets Shortcut: How Can LLMs Aid in Handling Unknown Invariance OOD Scenarios?

Shiyan Zheng, Herun Wan, Minnan Luo, Junhang Huang

TL;DR

An in-depth study of how social bot detectors are influenced by potential shortcuts in textual features, which are the features most susceptible to manipulation by social bots. The paper also proposes mitigation strategies based on large language models that leverage counterfactual data augmentation.

Abstract

While existing social bot detectors perform well on benchmarks, their robustness across diverse real-world scenarios remains limited due to unclear ground truth and varied misleading cues. In particular, the impact of shortcut learning, where models rely on spurious correlations instead of capturing causal task-relevant features, has received limited attention. To address this gap, we conduct an in-depth study to assess how detectors are influenced by potential shortcuts based on textual features, which are most susceptible to manipulation by social bots. We design a series of shortcut scenarios by constructing spurious associations between user labels and superficial textual cues to evaluate model robustness. Results show that shifts in irrelevant feature distributions significantly degrade social bot detector performance, with an average relative accuracy drop of 32% in the baseline models. To tackle this challenge, we propose mitigation strategies based on large language models, leveraging counterfactual data augmentation. These methods mitigate the problem from data and model perspectives across three levels, including data distribution at both the individual user text and overall dataset levels, as well as the model's ability to extract causal information. Our strategies achieve an average relative performance improvement of 56% under shortcut scenarios.
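
The shortcut construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (`label`, `sentiment`), the function names `shortcut_split` and `relative_drop`, and the choice of "bot ↔ negative, human ↔ positive" as the spurious rule are all assumptions made for illustration. The definition of relative drop used here, (standard − shortcut) / standard, is also an assumption, since the abstract does not spell it out.

```python
from typing import Dict, List, Tuple

# Hypothetical user record: {"id": int, "label": int (1 = bot, 0 = human), "sentiment": str}
User = Dict[str, object]


def shortcut_split(
    users: List[User],
    attr: str = "sentiment",
    bot_value: str = "neg",
    human_value: str = "pos",
) -> Tuple[List[User], List[User]]:
    """Split users by whether they agree with an assumed spurious rule.

    Users that agree with the rule (bot -> bot_value, human -> human_value)
    form a training pool in which the attribute predicts the label.
    Users that contradict the rule form a test pool in which the
    correlation is reversed.
    """
    aligned: List[User] = []
    reversed_pool: List[User] = []
    for user in users:
        expected = bot_value if user["label"] == 1 else human_value
        if user[attr] == expected:
            aligned.append(user)
        elif user[attr] in (bot_value, human_value):
            reversed_pool.append(user)
        # Users with any other attribute value are left out of both pools.
    return aligned, reversed_pool


def relative_drop(acc_standard: float, acc_shortcut: float) -> float:
    """Relative accuracy drop, assumed here to be (standard - shortcut) / standard."""
    return (acc_standard - acc_shortcut) / acc_standard


# Toy data, invented purely to exercise the functions.
toy_users: List[User] = [
    {"id": 1, "label": 1, "sentiment": "neg"},  # bot, agrees with the rule
    {"id": 2, "label": 0, "sentiment": "pos"},  # human, agrees with the rule
    {"id": 3, "label": 1, "sentiment": "pos"},  # bot, contradicts the rule
    {"id": 4, "label": 0, "sentiment": "neg"},  # human, contradicts the rule
]
train_pool, shortcut_test_pool = shortcut_split(toy_users)
```

A detector trained only on `train_pool` can reach high training accuracy by reading sentiment alone, which is exactly the behavior that the reversed `shortcut_test_pool` is meant to expose.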

Paper Structure

This paper contains 31 sections, 11 equations, 10 figures, 10 tables, 2 algorithms.

Figures (10)

  • Figure 1: Schematic illustration of the shortcut scenario. As shown on the left, the causal graph depicts how spurious features (i.e., shortcuts) may interfere with inference, leading the model to learn incorrect reasoning from the training set. For instance, on the right, the users are partitioned after a task-irrelevant feature (e.g., sentiment) is associated with their labels. As a result, the detector fails to generalize and tends to make incorrect predictions when evaluated on diverse test instances.
  • Figure 2: A diagram of our shortcut settings. We focus on superficial features of the text, such as sentiment, topic, emotion, and human values, and tie these features to the labels as shortcuts in the training set. At test time, we either reverse the spurious correlation between features and labels (the shortcut test set) or eliminate the shortcuts (the standard test set).
  • Figure 3: Overview of our shortcut learning mitigation framework. The left side illustrates how LLMs are used to augment data by rewriting text from different attribute perspectives, while preserving the user’s label-related semantics. The right part demonstrates the mitigation process at three levels: balancing the semantic content of individual users’ texts at the user level, balancing feature distributions across classes at the dataset level, and enhancing the feature extractor's ability to capture causal information across different shortcut shifts at the language model embedding level by employing contrastive learning.
  • Figure 4: The topic distribution across different tags in the Cresci-2017-Data. There is no significant difference between human and bot distributions, indicating that topic features should not be exploited as cues by detectors.
  • Figure 5: Calibration of detectors in standard and shortcut settings. The sub-caption "Train $xy$" indicates that the model is trained under setting $x$ and tested under setting $y$, where 0 corresponds to the shortcut setting and 1 to the standard setting. The results demonstrate that models trained in the shortcut setting tend to exhibit higher confidence in their predictions, yet suffer from reduced accuracy.
  • ...and 5 more figures