Infrastructure Ombudsman: Mining Future Failure Concerns from Structural Disaster Response

Md Towhidul Absar Chowdhury; Soumyajit Datta; Naveen Sharma; Ashiqur R. KhudaBukhsh

TL;DR

This work introduces the infrastructure ombudsman, a novel task and system for automatically surfacing anticipatory infrastructure concerns from social media after structural failures. The authors build a dataset of 2,662 annotated instances from Reddit and YouTube through a multi-step pipeline (keyword filtering, NLI pruning, LLM-based annotation, and crowdsourced and expert human labeling) and show that both zero-shot and supervised NLP models can detect future infrastructure concerns, with supervised RoBERTa and LLAMA2 variants performing best. The study highlights how rare actionable anticipatory signals are in online discourse, evaluates model robustness under location masking, and validates practical utility through an in-the-wild evaluation that shows high precision and recall. The work has implications for urban planning and disaster prevention because it enables credible concerns to be routed automatically to authorities, and it outlines avenues for extending the approach to other domains and incorporating active learning. Overall, the infrastructure ombudsman offers a scalable way to surface latent vulnerabilities in the built environment from public discussions.
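To make two of the steps named above concrete, the following is a minimal sketch of keyword filtering and location masking. It is an illustration under stated assumptions, not the authors' implementation: the keyword lists, the function names `keyword_filter` and `mask_locations`, the `[LOC]` token, and the sample comments are all hypothetical, and the NLI pruning, LLM annotation, and human labeling stages are not reproduced.

```python
import re

# Hypothetical keyword lists; the paper's actual filter terms are not given here.
STRUCTURE_TERMS = ("bridge", "overpass", "dam", "building", "tunnel")
CONCERN_TERMS = ("collapse", "crumbling", "rusted", "cracks", "unsafe")


def _compile(terms):
    """Build a case-insensitive whole-word pattern matching any of the terms."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


STRUCTURE_RE = _compile(STRUCTURE_TERMS)
CONCERN_RE = _compile(CONCERN_TERMS)


def keyword_filter(comments):
    """Keep comments that mention both a structure and a sign of distress."""
    return [
        text
        for text in comments
        if STRUCTURE_RE.search(text) and CONCERN_RE.search(text)
    ]


def mask_locations(text, locations, token="[LOC]"):
    """Replace each known location string with a placeholder token.

    Longer names are replaced first so that a short name contained in a
    longer one does not leave a partially masked string behind.
    """
    for loc in sorted(locations, key=len, reverse=True):
        text = re.sub(re.escape(loc), token, text, flags=re.IGNORECASE)
    return text


# Hypothetical sample comments, invented for illustration only.
comments = [
    "The bridge on Route 9 has huge cracks, it will be next.",
    "Thoughts and prayers to everyone affected.",
    "That overpass in Springfield is crumbling.",
]

candidates = keyword_filter(comments)
masked = [mask_locations(c, ["Route 9", "Springfield"]) for c in candidates]
```

Here the location list is passed in by hand; in practice it would more plausibly come from a named-entity recognizer. Masking locations before classification is one way to check whether a model relies on place names rather than on the wording of the concern itself.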

Abstract

Current research concentrates on studying discussions on social media related to structural failures to improve disaster response strategies. However, detecting social web posts that discuss concerns about anticipatory failures is under-explored. If such concerns are channeled to the appropriate authorities, they can aid in the prevention and mitigation of potential infrastructural failures. In this paper, we develop an infrastructure ombudsman that automatically detects specific infrastructure concerns. Our work considers several recent structural failures in the US. We present a first-of-its-kind dataset of 2,662 social web instances for this novel task, mined from Reddit and YouTube.

Paper Structure (22 sections, 4 figures, 8 tables)

Introduction
The Needle and the Haystack
Needle
Haystack
Dataset
Platforms: Reddit and YouTube
Keyword Filtering
Harnessing LLMs and Machine Annotation
Textual Entailment
LLM Annotation
Human Annotation Process
Dataset Statistic
Infrastructure Ombudsman
Zero-Shot Classification
Supervised Classification
...and 7 more sections

Figures (4)

Figure 1: Dataset Creation Pipeline
Figure 2: Distribution of crowdsourced workers' answers to the question "How much priority should infrastructure get in the United States?" The y-axis indicates percentage.
Figure 3: A word cloud visualization of all the locations of positive classes in the corpus highlighting potential structural failures. We have removed mentions of the United States and the 50 states in the USA in order to highlight more specific locations. A word cloud with none of the states removed is available in the Appendix.
Figure 4: A word cloud visualization of all the locations of positive classes in the corpus highlighting potential structural failures.
