Urban Safety Perception Through the Lens of Large Multimodal Models: A Persona-based Approach

Ciro Beneduce, Bruno Lepri, Massimiliano Luca

TL;DR

This paper investigates scalable assessment of urban safety perception by applying the Llava-1.6 7B large multimodal model to Place Pulse 2.0 street-view images in a zero-shot setting. It introduces Persona-based prompts to simulate socio-demographic perspectives and evaluates how nationality, gender, and age shape safety classifications, revealing notable biases and context sensitivity. The study finds an average F1 of 59.21% without fine-tuning, identifies three primary drivers of unsafety (isolation, decay, infrastructural challenges), and demonstrates theoretical alignment with urban criminology concepts like Broken Windows and Eyes on the Street. It also documents substantial cross-cultural and demographic variation in perceptions, underscores ethical considerations for bias and deployment, and provides a rich set of analyses (rank correlations, hierarchical clustering, keyword networks) to inform AI-assisted urban planning with careful prompt design.

Abstract

Understanding how urban environments are perceived in terms of safety is crucial for urban planning and policymaking. Traditional methods such as surveys are costly, time-consuming, and difficult to scale. To overcome these challenges, this study introduces Large Multimodal Models (LMMs), specifically Llava 1.6 7B, as a novel approach to assessing safety perceptions of urban spaces from street-view images. The research also investigates how this task is affected by different socio-demographic perspectives, simulated by the model through Persona-based prompts. Without additional fine-tuning, the model achieved an average F1-score of 59.21% in classifying urban scenarios as safe or unsafe and identified three key drivers of perceived unsafety: isolation, physical decay, and urban infrastructural challenges. Moreover, incorporating Persona-based prompts revealed significant variations in safety perceptions across age, gender, and nationality. Elder and female Personas consistently perceived higher levels of unsafety than younger or male Personas. Similarly, nationality-specific differences were evident: the proportion of unsafe classifications ranged from 19.71% for Singapore to 40.15% for Botswana. Notably, the model's default configuration aligned most closely with a middle-aged, male Persona. These findings highlight the potential of LMMs as a scalable and cost-effective alternative to traditional methods for assessing urban safety perceptions. While the sensitivity of these models to socio-demographic factors underscores the need for thoughtful deployment, their ability to provide nuanced perspectives makes them a promising tool for AI-driven urban planning.
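As a rough illustration of the headline metric, the sketch below computes an F1-score averaged over the two classes (safe and unsafe). The abstract does not say how the average is taken, so the macro average used here is an assumption, and the function names and toy labels are hypothetical rather than drawn from the paper.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("safe", "unsafe")):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Made-up labels for illustration only; not data from the study.
y_true = ["safe", "safe", "unsafe", "unsafe", "safe", "unsafe"]
y_pred = ["safe", "unsafe", "unsafe", "safe", "safe", "unsafe"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

If the authors instead averaged across runs or cities, the per-class function would stay the same and only the outer averaging would change.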

Paper Structure

This paper contains 30 sections, 18 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Panel A compares the neutral prompt (black text) with the socio-demographic prompt (red highlights), specifically focusing on a female Persona. The comparison showcases differences in classification, keywords, and reasoning for the same image. Panel B aggregates these differences to a city-level analysis in Washington, D.C., visualizing how classifications shift across demographics. The bar chart highlights variations in safety perception across genders, age groups, and nationalities, emphasizing the disparities in outcomes for Safe versus Unsafe classifications.
  • Figure 2: Co-occurrence networks of safe and unsafe keywords: The figure displays the 25 most frequently used keywords and their relationships for images classified as Safe (left) and Unsafe (right). Nodes represent keywords, with their size reflecting degree centrality, while edges represent co-occurrence relationships, with thickness corresponding to co-occurrence frequency. Distinct colours indicate keyword communities, highlighting the factors influencing each classification.
  • Figure 3: Multi-panel visualization analyzing unsafe image classifications, keyword frequency by gender, and contextual reasoning based on age-based Personas. (A) Unsafe Percentage by Nation: A horizontal bar plot showing the percentage of images classified as unsafe across nationality prompts, with error bars representing variability across runs. (B) Keyword Frequency Comparison: A grouped bar chart illustrating gender-based differences in keyword associations, such as higher frequencies of Isolated and Deserted for female prompts, and Residential and Well-maintained for male prompts. (C) Age-Prompt-Based Classification of the Same Image: this panel demonstrates how the same image (top) is classified differently based on three distinct age-based prompts. The classifications range from Safe for younger and middle-aged prompts to Unsafe for elder prompts, demonstrating the model’s sensitivity to the Persona age. The keywords and reasoning highlight the model's reliance on environmental factors such as pedestrian activity, urban maintenance, and surveillance presence. For instance, younger Personas emphasize accessibility and human activity, while elder Personas are influenced by isolation and the lack of visible security measures, resulting in differing perceptions of safety.
  • Figure 4: Clustering of 32 nationalities into seven distinct groups based on their unsafe classification percentages and standard deviations across cities, reflecting macro-regional and cultural similarities in urban safety perceptions according to Llava 1.6 7B. Cluster 1 (Taiwan, Canada, Finland, Hong Kong, Ireland) shows consistently low unsafe classifications and minimal variability, while Cluster 6 (Mexico, France, South Africa, and Botswana) shows higher unsafe classifications and greater variability. Cluster 4 (European countries and Australia) shows moderate unsafe classifications, as does Cluster 5 (Israel, Romania, Russia, Czech Republic, Brazil, Ukraine), whose unsafe classification rates are slightly higher. Notably, Singapore and the United States each formed their own cluster (Clusters 2 and 7, respectively).
  • Figure 5: Multi-panel visualization analyzing the impact of different demographic prompts (nationality, gender, and age) on unsafe classifications and accuracy compared to a neutral baseline. (A) Nationality-Based Unsafe Classification Differences: A bar chart comparing the difference in unsafe classification rates between various nationality prompts and a neutral baseline. Negative values (green bars) indicate lower unsafe classification rates than neutral, while positive values (orange bars) indicate higher unsafe classification rates, suggesting that certain nationality prompts lead to greater perceived unsafety. (B) Accuracy with Neutral Prompt: A grouped bar chart displaying the classification accuracy for gender and age prompts when compared to a neutral reference. Male and Middle-Aged prompts exhibit the highest accuracy, while Elder prompts show the lowest accuracy, suggesting potential disparities in model robustness across demographic attributes. (C) Age- and Gender-Based Unsafe Classification Distributions: Boxplots showing the distribution of unsafe classification differences for gender (top) and age (bottom) prompts. Female prompts tend to have a higher median unsafe classification difference than male prompts. Similarly, the Elder age group exhibits the widest range and highest median unsafe classification differences.
  • ...and 3 more figures
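
The network construction described in the Figure 2 caption (nodes as keywords, edge thickness as co-occurrence frequency, node size as degree centrality) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the keyword lists are made up, and two keywords are assumed to co-occur when the model returns both for the same image.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count how often each unordered keyword pair appears in the same list."""
    edges = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1  # edge weight = co-occurrence frequency
    return edges


def degree_centrality(edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {k: 0.0 for k in neighbours}
    return {k: len(v) / (n - 1) for k, v in neighbours.items()}


# Made-up keyword lists, one per image; not data from the study.
responses = [
    ["isolated", "deserted", "graffiti"],
    ["isolated", "deserted"],
    ["graffiti", "abandoned"],
]
edges = cooccurrence_edges(responses)
centrality = degree_centrality(edges)
```

Community detection, which the caption uses to colour the nodes, is left out here because the caption does not say which algorithm was applied.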