Importance-aware Topic Modeling for Discovering Public Transit Risk from Noisy Social Media

Fatima Ashraf; Muhammad Ayub Sabir; Jiaxin Deng; Junbiao Pang; Haitao Yu

TL;DR

This work tackles weak signals in urban transit from noisy social media by introducing an influence-weighted keyword co-occurrence graph and a Poisson Deconvolution Factorization (PDF) to separate low-rank topical structure from residual interactions. The model jointly learns a nonnegative topic–word dictionary, topic strengths, and a sparse residual loading, with decorrelation to keep topics distinct and a lightweight optimization combining multiplicative updates and ADMM. Topic selection is guided by a coherence-driven sweep, yielding interpretable topics with strong NPMI and high diversity, validated on city-specific Weibo data. The approach provides actionable event keywords and a principled framework for real-time, structure-aware social signal analysis in intelligent transportation systems.
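The coherence and diversity criteria mentioned above can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' evaluation code: the function names (`npmi`, `topic_coherence`, `topic_diversity`) are hypothetical, documents are represented as sets of keywords, and probabilities are estimated from document-level co-occurrence.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, docs):
    """Normalized pointwise mutual information of two words.

    docs is a list of sets of keywords (an assumption of this sketch).
    Returns a value in [-1, 1]; -1 means the words never co-occur.
    """
    n = len(docs)
    p_a = sum(word_a in d for d in docs) / n
    p_b = sum(word_b in d for d in docs) / n
    p_ab = sum(word_a in d and word_b in d for d in docs) / n
    if p_ab == 0.0:
        return -1.0  # conventional value for pairs that never co-occur
    if p_ab == 1.0:
        return 1.0  # both words appear in every document
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(top_words, docs):
    """Mean NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


def topic_diversity(topics):
    """Fraction of unique words among the top words of all topics."""
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)


# Toy corpus, invented for illustration only.
docs = [{"subway", "delay"}, {"subway", "delay"}, {"bus", "crowding"}, {"bus"}]
coherent = topic_coherence(["subway", "delay"], docs)
incoherent = topic_coherence(["subway", "bus"], docs)
diversity = topic_diversity([["subway", "delay"], ["bus", "delay"]])
```

A sweep over candidate topic counts would fit the model once per candidate and keep the count whose topics score best on measures of this kind. How the two scores are combined is not specified in the text above.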

Abstract

Urban transit agencies increasingly turn to social media to monitor emerging service risks such as crowding, delays, and safety incidents, yet the signals of concern are sparse, short, and easily drowned out by routine chatter. We address this challenge by jointly modeling linguistic interactions and user influence. First, we construct an influence-weighted keyword co-occurrence graph from cleaned posts so that socially impactful posts contribute proportionally to the underlying evidence. The core of our framework is a Poisson Deconvolution Factorization (PDF) that decomposes this graph into a low-rank topical structure and topic-localized residual interactions, producing an interpretable topic–keyword basis together with topic importance scores. A decorrelation regularizer promotes distinct topics, and a lightweight optimization procedure ensures stable convergence under nonnegativity and normalization constraints. Finally, the number of topics is selected through a coherence-driven sweep that evaluates the quality and distinctness of the learned topics. On large-scale social streams, the proposed model achieves state-of-the-art topic coherence and strong diversity compared with leading baselines. The code and dataset are publicly available at https://github.com/pangjunbiao/Topic-Modeling_ITS.git
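The first step, building an influence-weighted keyword co-occurrence graph, can be sketched as follows. This is a minimal illustration rather than the released implementation: the abstract does not state the weighting formula, so `influence_weight` (a log-damped engagement count with a floor of 1) is an assumption, as are the function names and the toy posts.

```python
import math
from collections import defaultdict
from itertools import combinations


def influence_weight(reposts, comments, likes):
    """Hypothetical influence score: 1 plus log-damped total engagement.

    The floor of 1 means a post with no engagement still contributes.
    """
    return 1.0 + math.log1p(reposts + comments + likes)


def build_cooccurrence_graph(posts):
    """Accumulate influence-weighted co-occurrence counts.

    posts is an iterable of (keywords, reposts, comments, likes).
    Returns a dict mapping an alphabetically ordered keyword pair
    to its accumulated edge weight.
    """
    graph = defaultdict(float)
    for keywords, reposts, comments, likes in posts:
        weight = influence_weight(reposts, comments, likes)
        # Deduplicate and sort so each undirected pair has one key.
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[(a, b)] += weight
    return dict(graph)


# Toy posts, invented for illustration only.
posts = [
    (["subway", "delay", "crowding"], 0, 0, 0),
    (["subway", "delay"], 10, 5, 4),
]
graph = build_cooccurrence_graph(posts)
```

In this sketch the edge between "delay" and "subway" receives weight from both posts, with the heavily shared post contributing more, while the edges involving "crowding" receive only the unit weight of the first post. The resulting symmetric weights are the kind of input a factorization such as PDF would decompose.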
