
Human vs. AI Safety Perception? Decoding Human Safety Perception with Eye-Tracking Systems, Street View Images, and Explainable AI

Yuhao Kang, Junda Chen, Liu Liu, Kshitij Sharma, Martina Mazzarello, Simone Mora, Fabio Duarte, Carlo Ratti

TL;DR

This work examines how environmental visuals shape human safety perceptions by integrating eye-tracking heatmaps with street view imagery and eXplainable AI (XAI). It introduces a human-centered framework that uses Mean Object Ratio in Highlighted Regions (MoRH) and Mean Object Hue (MoH) to link gaze with urban features, and validates XAI explanations against real human attention on a dataset collected in Helsingborg, Sweden. The study finds that XGradCAM and EigenCAM align most closely with human safety perceptions in this Swedish context, and discusses implications for urban design, the trustworthiness of AI explanations, and ethical considerations. Overall, the approach offers a more nuanced understanding of which built-environment elements drive safety perceptions and provides practical guidance for perception-informed urban planning.
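The metric names suggest a pixel-counting computation over segmented images. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes each image has a semantic segmentation label map (one integer class per pixel) and a gaze heatmap of the same size scaled to [0, 1]; it treats "highlighted" as heat at or above a hypothetical threshold of 0.5; and it reads MoH as the mean heat over an object's pixels, which is our interpretation of "hue" rather than a definition taken from the paper. MoR corresponds to Figure 3 and MoRH to Figure 5.

```python
import numpy as np


def object_ratios(labels, mask, num_classes):
    """Share of the pixels selected by `mask` that belong to each class."""
    selected = labels[mask]
    if selected.size == 0:
        return np.zeros(num_classes)
    return np.bincount(selected, minlength=num_classes) / selected.size


def mean_object_ratio(label_maps, num_classes):
    """MoR (assumed form): per-class pixel share of the whole image, averaged over images."""
    return np.mean(
        [object_ratios(l, np.ones(l.shape, dtype=bool), num_classes) for l in label_maps],
        axis=0,
    )


def mean_object_ratio_highlighted(label_maps, heatmaps, num_classes, threshold=0.5):
    """MoRH (assumed form): per-class share of pixels whose heat >= threshold.
    The 0.5 threshold is a hypothetical choice, not from the paper."""
    return np.mean(
        [object_ratios(l, h >= threshold, num_classes) for l, h in zip(label_maps, heatmaps)],
        axis=0,
    )


def mean_object_heat(label_maps, heatmaps, num_classes):
    """Hypothetical MoH reading: mean heat over each class's pixels,
    averaged over the images in which that class appears."""
    sums = np.zeros(num_classes)
    counts = np.zeros(num_classes)
    for l, h in zip(label_maps, heatmaps):
        for c in np.unique(l):
            sums[c] += h[l == c].mean()
            counts[c] += 1
    # Classes that never appear get 0 instead of a division-by-zero NaN.
    return np.divide(sums, counts, out=np.zeros(num_classes), where=counts > 0)


# Toy example: one 2x2 image with classes 0, 1, 2 (e.g. road, building, sky).
labels = [np.array([[0, 0], [1, 2]])]
heat = [np.array([[0.9, 0.1], [0.8, 0.0]])]
print(mean_object_ratio(labels, 3))
print(mean_object_ratio_highlighted(labels, heat, 3))
print(mean_object_heat(labels, heat, 3))
```

Averaging per image, rather than pooling pixels across all images, keeps any single image from dominating the result; whether the paper averages this way is not stated in this summary.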

Abstract

The way residents perceive safety plays an important role in how they use public spaces. Studies have combined large-scale street view images and advanced computer vision techniques to measure the perceived safety of urban environments. Despite their success, such studies have often overlooked the specific environmental visual factors that draw human attention and shape people's perceptions of safety. In this study, we introduce a computational framework that enriches the existing literature on place perception by combining eye-tracking systems with street view images and deep learning approaches. Eye-tracking systems quantify not only what users are looking at but also how long they engage with specific environmental elements. This allows us to examine in detail which visual environmental factors influence human safety perceptions. We conducted our research in Helsingborg, Sweden, where we recruited volunteers outfitted with eye-tracking systems. They were asked to indicate which of two street view images appeared safer. By examining participants' focus on specific features using Mean Object Ratio in Highlighted Regions (MoRH) and Mean Object Hue (MoH), we identified key visual elements that attract human attention when perceiving safe environments. For instance, certain urban infrastructure and public space features draw more human attention, while the sky is less relevant to safety perceptions. These insights offer a more human-centered understanding of which urban features influence human safety perceptions. Furthermore, we compared the real human attention captured by eye-tracking systems with attention maps obtained from eXplainable Artificial Intelligence (XAI) methods. Among the XAI methods tested, XGradCAM and EigenCAM aligned most closely with human safety perception patterns.
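The abstract describes a forced-choice task ("which of two images appears safer") but not how the votes become per-image scores or safe/unsafe groups. One minimal assumption, sketched in Python below, is a win rate (times chosen as safer divided by times shown) with a hypothetical 0.5 cutoff for grouping; rating schemes such as TrueSkill are common alternatives in street view perception studies. The function names, the cutoff, and the toy votes are illustrative, not taken from the paper.

```python
from collections import defaultdict


def win_rate_scores(comparisons):
    """comparisons: iterable of (left_id, right_id, chosen_id) tuples, where
    chosen_id is the image the participant judged safer."""
    wins = defaultdict(int)
    shown = defaultdict(int)
    for left, right, chosen in comparisons:
        if chosen not in (left, right):
            raise ValueError(f"{chosen!r} is not in pair ({left!r}, {right!r})")
        shown[left] += 1
        shown[right] += 1
        wins[chosen] += 1
    # Score = fraction of appearances in which the image was picked as safer.
    return {img: wins[img] / shown[img] for img in shown}


def split_safe_unsafe(scores, threshold=0.5):
    """Hypothetical grouping rule: a score above the threshold counts as 'safe'."""
    safe = sorted(img for img, s in scores.items() if s > threshold)
    unsafe = sorted(img for img, s in scores.items() if s <= threshold)
    return safe, unsafe


# Toy votes from three pairwise trials (not study data).
votes = [("a", "b", "a"), ("a", "c", "a"), ("b", "c", "c")]
scores = win_rate_scores(votes)
print(scores)
print(split_safe_unsafe(scores))
```

A plain win rate ignores who an image was compared against, which is the main reason skill-rating schemes are often preferred when each image appears in only a few trials.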

Paper Structure

This paper contains 26 sections, 3 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Conceptual framework of this study. We start by collecting street view images to measure perceived safety. Participants complete a survey while wearing eye-tracking systems, which produce human attention heatmaps. We also run deep learning-based models to predict safety scores and apply Explainable AI (XAI) approaches to generate XAI-based heatmaps. By comparing the two sets of heatmaps, we aim to deepen our understanding of safe urban environments and inform decision-making in urban planning and design.
  • Figure 2: A user interface screenshot of the survey.
  • Figure 3: Top 10 objects based on Mean Object Ratio (MoR) in the two groups of street view images.
  • Figure 4: Sample street view images in safe and unsafe groups with aggregated human attention heatmaps.
  • Figure 5: Top 10 objects based on Mean Object Ratio in Highlighted Regions (MoRH) in the two groups of street view images.
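Figure 1 describes comparing human attention heatmaps with XAI-based heatmaps. This summary does not state which similarity measure the paper uses, so the Python sketch below assumes the Pearson correlation coefficient (CC), a standard saliency-comparison metric, and assumes both maps have already been resized to the same resolution. The method names are placeholders and the maps are toy arrays, not outputs of XGradCAM, EigenCAM, or any other real method.

```python
import numpy as np


def zscore(heatmap):
    """Standardize a heatmap to zero mean and unit variance (epsilon avoids /0)."""
    h = np.asarray(heatmap, dtype=float)
    return (h - h.mean()) / (h.std() + 1e-12)


def pearson_cc(human_map, xai_map):
    """Pearson correlation between two same-shape attention maps."""
    if np.shape(human_map) != np.shape(xai_map):
        raise ValueError("maps must share a shape; resize one of them first")
    return float(np.mean(zscore(human_map) * zscore(xai_map)))


def rank_xai_methods(human_maps, xai_maps_by_method):
    """Mean CC per method across images, sorted from most to least human-like."""
    scores = {
        method: float(np.mean([pearson_cc(h, x) for h, x in zip(human_maps, maps)]))
        for method, maps in xai_maps_by_method.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


human = [np.array([[0.0, 1.0], [2.0, 3.0]])]
candidates = {
    "method_a": [np.array([[0.0, 2.0], [4.0, 6.0]])],  # same pattern, rescaled
    "method_b": [np.array([[3.0, 2.0], [1.0, 0.0]])],  # reversed pattern
}
print(rank_xai_methods(human, candidates))
```

Because CC z-scores both maps first, it rewards matching spatial patterns regardless of each method's intensity scale, which matters when heatmaps come from different sources such as gaze data and gradient-based explanations.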