pytopicgram: A library for data extraction and topic modeling from Telegram channels

J. Gómez-Romero, J. Cantón Correa, R. Pérez Mercado, F. Prados Abad, M. Molina-Solana, W. Fajardo

TL;DR

The paper addresses the challenge of analyzing public discourse on Telegram channels at scale. It introduces pytopicgram, an end-to-end Python library that combines Telegram crawling (via Telethon), data preprocessing, engagement metrics, and BERTopic-based topic extraction using contextual embeddings from Large Language Models. The work highlights an integrated, modular architecture with command-line operability, data minimization options, and optional OpenAI GPT-based topic descriptions, enabling scalable, unsupervised analysis without labeled data. The approach supports practical applications in tracking information diffusion, narrative evolution, and audience engagement while emphasizing local data processing, privacy, and potential policy-relevant insights for monitoring disinformation and online discourse.
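
As a rough illustration of two of the stages named above, data minimization and engagement metrics, the following is a minimal sketch and not pytopicgram's actual API. The field names (`views`, `forwards`, `sender_phone`), the helper functions, and the forwards-per-view ratio are all assumptions chosen for the example; the library's real message schema and metric definitions may differ.

```python
from typing import Any

# Hypothetical field names; the real message schema may differ.
KEEP_FIELDS = ("id", "date", "text", "views", "forwards")


def minimize(message: dict[str, Any],
             keep: tuple[str, ...] = KEEP_FIELDS) -> dict[str, Any]:
    """Keep only the fields needed for analysis (data minimization)."""
    return {key: message[key] for key in keep if key in message}


def forward_rate(message: dict[str, Any]) -> float:
    """Illustrative engagement metric: forwards per view.

    Returns 0.0 when the view count is missing or zero.
    """
    views = message.get("views") or 0
    forwards = message.get("forwards") or 0
    return forwards / views if views else 0.0


# A made-up message record, used only to exercise the helpers.
raw = {
    "id": 1,
    "date": "2024-01-01",
    "text": "hello",
    "views": 200,
    "forwards": 10,
    "sender_phone": "redacted",
}
clean = minimize(raw)       # drops every field not listed in KEEP_FIELDS
rate = forward_rate(clean)  # forwards divided by views
```

Dropping unneeded fields before any metric is computed keeps later stages from ever touching data they do not need, which is one way to read the "data minimization options" mentioned above.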

Abstract

Telegram is a popular platform for public communication, generating large amounts of messages through its channels. pytopicgram is a Python library that helps researchers collect, organize, and analyze these Telegram messages. The library offers key features such as easy message retrieval, detailed channel information, engagement metrics, and topic identification using advanced modeling techniques. By simplifying data extraction and analysis, pytopicgram allows users to understand how content spreads and how audiences interact on Telegram. This paper describes the design, main features, and practical uses of pytopicgram, showcasing its effectiveness for studying public conversations on Telegram.

Paper Structure

This paper contains 9 sections and 1 figure.

Figures (1)

  • Figure 1: Architecture of pytopicgram