Urban Safety Perception Through the Lens of Large Multimodal Models: A Persona-based Approach
Ciro Beneduce, Bruno Lepri, Massimiliano Luca
TL;DR
This paper investigates scalable assessment of urban safety perception by applying the Llava 1.6 7B large multimodal model to Place Pulse 2.0 street-view images in a zero-shot setting. It introduces Persona-based prompts to simulate socio-demographic perspectives and evaluates how nationality, gender, and age shape safety classifications, revealing notable biases and context sensitivity. Without fine-tuning, the model reaches an average F1-score of 59.21%. The study identifies three primary drivers of perceived unsafety (isolation, physical decay, and urban infrastructural challenges) and relates them to urban criminology concepts such as Broken Windows and Eyes on the Street. It also documents substantial cross-cultural and demographic variation in perceptions, discusses ethical considerations around bias and deployment, and supports its findings with rank correlations, hierarchical clustering, and keyword networks, with the aim of informing AI-assisted urban planning through careful prompt design.
Abstract
Understanding how urban environments are perceived in terms of safety is crucial for urban planning and policymaking. Traditional methods such as surveys are costly, time-consuming, and difficult to scale. To overcome these challenges, this study introduces Large Multimodal Models (LMMs), specifically Llava 1.6 7B, as a novel approach to assessing safety perceptions of urban spaces from street-view images. It also investigates how this task is affected by different socio-demographic perspectives, which the model simulates through Persona-based prompts. Without additional fine-tuning, the model achieved an average F1-score of 59.21% in classifying urban scenes as safe or unsafe, and its outputs pointed to three key drivers of perceived unsafety: isolation, physical decay, and urban infrastructural challenges. Moreover, incorporating Persona-based prompts revealed significant variations in safety perceptions across age, gender, and nationality. Older and female Personas consistently perceived higher levels of unsafety than younger or male Personas. Similarly, the proportion of unsafe classifications differed by nationality, ranging from 19.71% for Singapore to 40.15% for Botswana. Notably, the model's default configuration aligned most closely with a middle-aged, male Persona. These findings highlight the potential of LMMs as a scalable and cost-effective alternative to traditional methods for assessing urban safety perceptions. While the sensitivity of these models to socio-demographic factors underscores the need for thoughtful deployment, their ability to provide nuanced perspectives makes them a promising tool for AI-driven urban planning.
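To make the setup described in the abstract more concrete, the following is a minimal sketch of how a Persona-based prompt might be assembled and how an average F1-score over the two classes could be computed. The prompt wording, the function names (build_prompt, f1_for_class, average_f1), and the choice to average per-class F1 equally are all assumptions made for illustration; the abstract does not specify the exact prompts or the averaging scheme used in the paper. The call to the model itself is omitted.

```python
# Illustrative sketch only: prompt wording and averaging scheme are assumptions,
# not the prompts or metric definition used in the paper.

QUESTION = "Does this street look safe or unsafe? Answer with one word."


def build_prompt(age=None, gender=None, nationality=None):
    """Return a zero-shot prompt, optionally prefixed with a hypothetical Persona."""
    traits = [t for t in (age, gender) if t]
    if not traits and not nationality:
        return QUESTION  # default configuration: no Persona
    persona = " ".join(traits + ["person"])
    if nationality:
        persona += f" from {nationality}"
    return f"Answer from the perspective of this persona: {persona}. {QUESTION}"


def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_f1(y_true, y_pred, labels=("safe", "unsafe")):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    print(build_prompt("elderly", "female", "Botswana"))
    truth = ["safe", "safe", "unsafe", "unsafe"]
    preds = ["safe", "unsafe", "unsafe", "unsafe"]
    print(round(average_f1(truth, preds), 4))
```

In this sketch, comparing the predictions obtained with build_prompt() (no Persona) against those obtained with each Persona would be one way to examine which Persona the default configuration most resembles.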
