The Schwurbelarchiv: a German Language Telegram dataset for the Study of Conspiracy Theories
Mathias Angermaier, Elisabeth Hoeldrich, Jana Lasser, Joao Pinheiro Neto
TL;DR
The Schwurbelarchiv paper presents a large-scale, multimodal German Telegram dataset focused on conspiracy discourse, created via snowball sampling and complemented by transcriptions of audio content. It documents data collection from public chats, rigorous cleaning, and two network representations to enable content- and community-level analyses, while acknowledging ethical considerations and deletions that affect completeness. The work demonstrates substantial German-language dominance, regional diversity, and event-driven activity, with transcription expanding the textual corpus by about 41% and enabling multimodal inquiry into misinformation and online social dynamics. By providing descriptive statistics, validation steps, and FAIR-aligned appendix material, the authors offer a valuable resource for researchers in linguistics, political science, and information studies to study the spread and structure of conspiracy theories on Telegram at scale.
Abstract
Sociality borne by language, the predominant digital trace on text-based social media platforms, harbours the raw material for exploring multiple social phenomena. Distinctively, the messaging service Telegram provides functionalities that allow for socially interactive as well as one-to-many communication. Our Telegram dataset contains over 6,000 groups and channels, 40 million text messages, and over 3 million transcribed audio files, originating from a data-hoarding initiative named the "Schwurbelarchiv" (from German schwurbeln: speaking nonsense). This dataset publication details the structure, scope, and methodological specifics of the Schwurbelarchiv, emphasising its relevance for further research on German-language conspiracy theory discourse. We validate its predominantly German origin using linguistic and temporal markers and situate it within the context of similar datasets. We describe the process and extent of the transcription of multimedia files. Thanks to this effort, the dataset uniquely supports multimodal analysis of online social dynamics and content dissemination. Researchers can employ this resource to explore societal dynamics in misinformation, political extremism, opinion adaptation, and social network structures on Telegram. The Schwurbelarchiv thus offers unprecedented opportunities for investigations into digital communication and its societal implications.
