Invisible Walls in Cities: Designing LLM Agent to Predict Urban Segregation Experience with Social Media Content
Bingbing Fan, Lin Chen, Songwei Li, Jian Yuan, Fengli Xu, Pan Hui, Yong Li
TL;DR
This work predicts experienced urban segregation from social media content. It introduces a reflective LLM coder that extracts a nine-dimension codebook of segregation drivers, and the REasoning-and-EMbedding (RE'EM) framework, which fuses reasoning, embeddings, and population signals through a neighbor-aware multi-view predictor. The approach achieves substantial predictive gains (e.g., $R^2$ up to 0.389, along with a lower MSE) and generalizes across three cities, while also delivering codebook-guided summaries that improve human understanding of POI inclusiveness. It demonstrates the value of structured, interpretable AI pipelines for social good, helping policymakers and researchers identify and address implicit barriers in urban environments. The work is evaluated through quantitative experiments and a user study, and it discusses ethical considerations and limitations, with temporal modeling and bias mitigation as avenues for future work.
Abstract
Understanding experienced segregation in urban daily life is crucial for addressing societal inequalities and fostering inclusivity. The abundance of user-generated reviews on social media encapsulates nuanced perceptions and feelings associated with different places, offering rich insights into segregation. However, leveraging this data poses significant challenges due to its vast volume, ambiguity, and confluence of diverse perspectives. To tackle these challenges, we propose a novel Large Language Model (LLM) agent to automate online review mining for segregation prediction. Specifically, we design a reflective LLM coder that digests social media content into insights consistent with real-world feedback and eventually produces a codebook capturing key dimensions that signal segregation experience, such as cultural resonance and appeal, accessibility and convenience, and community engagement and local involvement. Guided by the codebook, LLMs can generate both informative review summaries and ratings for segregation prediction. Moreover, we introduce a REasoning-and-EMbedding (RE'EM) framework, which combines the reasoning and embedding capabilities of language models to integrate multi-channel features for segregation prediction. Experiments on real-world data demonstrate that our agent substantially improves prediction accuracy, with a 22.79% increase in $R^{2}$ and a 9.33% reduction in MSE. The derived codebook generalizes across three different cities, consistently improving prediction accuracy. Moreover, our user study confirms that the codebook-guided summaries provide cognitive gains for human participants in perceiving the social inclusiveness of places of interest (POIs). Our study marks an important step toward understanding implicit social barriers and inequalities, demonstrating the great potential of promoting social inclusiveness with Web technology.
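For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how $R^{2}$, MSE, and a relative change between a baseline and a new model are commonly computed. It assumes the standard textbook definitions; the function names and the toy numbers are hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline: float, new: float) -> float:
    """Signed percentage change of `new` relative to `baseline`."""
    return (new - baseline) / abs(baseline) * 100.0


# Toy values only (not results from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5, 2.5, 2.5, 2.5]  # always predicting the mean gives R^2 = 0
print(mse(y_true, mean_pred), r2(y_true, mean_pred))
```

Under this convention, an increase in $R^{2}$ appears as a positive relative change and a reduction in MSE as a negative one.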
