Invisible Walls in Cities: Designing LLM Agent to Predict Urban Segregation Experience with Social Media Content

Bingbing Fan, Lin Chen, Songwei Li, Jian Yuan, Fengli Xu, Pan Hui, Yong Li

TL;DR

This work predicts experienced urban segregation from social media content. It introduces a reflective LLM coder that extracts a nine-dimension codebook of segregation drivers, and the RE'EM framework, which fuses reasoning, embedding, and population signals through a neighbor-aware multi-view predictor. The approach achieves substantial predictive gains (e.g., $R^2$ up to 0.389, along with MSE reductions) and generalizes across multiple cities, while also delivering codebook-guided summaries that improve human understanding of POI inclusiveness. It demonstrates the value of structured, interpretable AI pipelines for social good, helping policymakers and researchers identify and address implicit barriers in urban environments. The evaluation combines quantitative experiments with a qualitative user study, and the paper discusses ethical considerations and limitations, with temporal modeling and bias mitigation named as avenues for future work.

Abstract

Understanding experienced segregation in urban daily life is crucial for addressing societal inequalities and fostering inclusivity. The abundance of user-generated reviews on social media encapsulates nuanced perceptions and feelings associated with different places, offering rich insights into segregation. However, leveraging this data poses significant challenges due to its vast volume, ambiguity, and confluence of diverse perspectives. To tackle these challenges, we propose a novel Large Language Model (LLM) agent to automate online review mining for segregation prediction. Specifically, we introduce a reflective LLM coder that digests social media content into insights consistent with real-world feedback and ultimately produces a codebook capturing key dimensions that signal segregation experience, such as cultural resonance and appeal, accessibility and convenience, and community engagement and local involvement. Guided by the codebook, LLMs can generate both informative review summaries and ratings for segregation prediction. Moreover, we design a REasoning-and-EMbedding (RE'EM) framework, which combines the reasoning and embedding capabilities of language models to integrate multi-channel features for segregation prediction. Experiments on real-world data demonstrate that our agent substantially improves prediction accuracy, with a 22.79% increase in $R^2$ and a 9.33% reduction in MSE. The derived codebook generalizes across three different cities, consistently improving prediction accuracy. Moreover, our user study confirms that the codebook-guided summaries provide cognitive gains for human participants in perceiving the social inclusiveness of places of interest (POIs). Our study marks an important step toward understanding implicit social barriers and inequalities, demonstrating the great potential of promoting social inclusiveness with Web technology.

Paper Structure

This paper contains 25 sections, 8 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the reflective LLM coder. It consists of a reflective attributor that integrates multi-source review and image signals and refines insights using real visitation patterns, and a code summarizer that consolidates these insights into a structured, generalizable codebook capturing key factors shaping experienced segregation.
  • Figure 2: Reasoning-and-Embedding (RE'EM) framework. RE'EM integrates three complementary channels (reasoning, embedding, and population) with a neighbor-aware multi-view fusion to predict POIs' experienced segregation.
  • Figure 3: Human prediction accuracy with different information availability. Accuracy remains near random (50%) when only POI metadata (Baseline Estimation) or raw reviews (Review-informed Prediction) are available, but increases substantially once participants receive codebook-guided LLM summaries (Summary-enhanced Prediction), demonstrating the cognitive benefit of structured review distillation.
  • Figure 4: User preference between codebook-guided summary and vanilla summary. Across all sampled POIs, a strong majority of participants favor codebook-guided summaries, indicating that structured summaries better support human understanding of POI inclusiveness.
  • Figure 5: Distributions of the coefficient of variation for POI features within the same CBG, highlighting considerable variability among POIs. Panels (a)-(c) show the distributions for stars, price range, and racial segregation, respectively.
  • ...and 2 more figures