Posts of Peril: Detecting Information About Hazards in Text
Keith Burghardt, Daniel M. T. Fessler, Chyna Tang, Anne Pisor, Kristina Lerman
TL;DR
This work introduces a multilingual hazard detector, trained on a newly annotated corpus of X posts, that identifies information about hazards in social media text. It demonstrates that hazard signals are partially independent of negative-affect indicators and shows that the detector scales to millions of posts, outperforming simple dictionary baselines and approaching LLM baselines in accuracy while offering far higher throughput. The model is applied to 3.6M Israel-Hamas war posts and 5.9M French election posts to examine hazard usage in real-world information campaigns, revealing that coordinated actors often frame hazards to support weaker sides and influence civilian perceptions. The authors release hazard-annotated data and code as an open-source Python package, enabling researchers and journalists to analyze hazard content at scale and to study hazard-based information operations in multilingual, geopolitical contexts.
Abstract
Socio-linguistic indicators of affectively relevant phenomena, such as emotion or sentiment, are often extracted from text to better understand features of human-computer interactions, including on social media. However, an indicator that is often overlooked is the presence or absence of information concerning harms or hazards. Here, we develop a new model to detect information concerning hazards, trained on a new collection of annotated X posts. We show not only that this model performs well (outperforming, e.g., dictionary approaches), but also that the hazard information it extracts is not strongly correlated with common indicators such as emotion or sentiment. To demonstrate the utility of our tool, we apply it to two datasets of X posts that discuss important geopolitical events, namely the Israel-Hamas war and the 2022 French national election. In both cases, we find that hazard information, especially information concerning conflict, is common. We extract accounts associated with information campaigns from each dataset to explore how information about hazards could be used to attempt to influence geopolitical events. We find that inorganic accounts representing the viewpoints of weaker sides in a conflict often discuss hazards to civilians, potentially as a way to elicit aid for the weaker side. Moreover, the rate at which these accounts mention hazards differs markedly from that of organic accounts, likely reflecting information operators' efforts to frame the given geopolitical event for strategic purposes. These results are first steps towards exploring hazards within an information warfare environment. The model is shared as a Python package to help researchers and journalists analyze hazard content. It is available, along with the data and annotations, in the following repository: https://github.com/KeithBurghardt/DetectHazards.
