Bias Behind the Wheel: Fairness Testing of Autonomous Driving Systems

Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Ying Zhang, Xuanzhe Liu

TL;DR

Pedestrian detectors can achieve both better fairness and better detection performance under specific driving conditions, which challenges the fairness-performance trade-off widely assumed in the fairness literature.

Abstract

This paper conducts fairness testing of automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight state-of-the-art deep learning-based pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we provide extensive annotations for the datasets, resulting in 8,311 images with 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels. Our findings reveal significant fairness issues, particularly related to age. The proportion of undetected children is 20.14% higher than that of adults. Furthermore, we explore how various driving scenarios affect the fairness of pedestrian detectors. We find that pedestrian detectors exhibit significant gender bias at night, potentially exacerbating existing societal concerns about women's safety when out at night. Moreover, we observe that pedestrian detectors can demonstrate both enhanced fairness and superior performance under specific driving conditions, which challenges the fairness-performance trade-off theory widely acknowledged in the fairness literature. We publicly release the code, data, and results to support future research on fairness in autonomous driving.

Paper Structure

This paper contains 30 sections, 4 equations, 4 figures, and 11 tables.

Figures (4)

  • Figure 1: Overview of our experimental settings.
  • Figure 2: (RQ1) Miss rates of pedestrian detectors for females and males across datasets. Statistically significant gender biases are indicated by labeled miss rate values. In the CityPersons and EuroCity-Day datasets, which contain only daytime data, a single detector (on EuroCity-Day) exhibits significant gender bias. However, in the EuroCity-Night dataset, seven out of eight detectors show significantly higher miss rates for females, revealing bias in female detection.
  • Figure 3: (RQ1) Miss rates of pedestrian detectors for children and adults across datasets. Statistically significant age biases are labeled with miss rate values. In 30 out of 32 scenarios (comprising four datasets and eight detectors), children have significantly higher miss rates than adults.
  • Figure 4: (RQ1) Bounding box size distributions of adults and children (left) and bounding box size distributions of undetected and detected pedestrians (right). We observe that both children and undetected pedestrians tend to have smaller bounding boxes.
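
The figure captions above compare per-group miss rates and flag statistically significant gaps. As a rough illustration of that kind of comparison, the sketch below computes the miss rate (undetected pedestrians divided by annotated pedestrians) for two groups and applies a two-proportion z-test. The choice of test, the `GroupCounts` container, and all counts are assumptions made for illustration; this page does not state which significance test the paper uses, and the numbers are not taken from its results.

```python
import math
from typing import NamedTuple, Tuple


class GroupCounts(NamedTuple):
    """Hypothetical container for one demographic group's detection counts."""
    total: int   # annotated (ground-truth) pedestrians in the group
    missed: int  # pedestrians the detector failed to detect


def miss_rate(g: GroupCounts) -> float:
    """Fraction of annotated pedestrians in the group that went undetected."""
    return g.missed / g.total


def two_proportion_z_test(a: GroupCounts, b: GroupCounts) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one miss rate.

    Assumption: a pooled two-proportion z-test stands in for whatever
    significance test the paper actually applies.
    """
    pooled = (a.missed + b.missed) / (a.total + b.total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.total + 1 / b.total))
    z = (miss_rate(a) - miss_rate(b)) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Made-up counts for illustration only; not taken from the paper.
children = GroupCounts(total=200, missed=90)
adults = GroupCounts(total=2000, missed=500)

gap = miss_rate(children) - miss_rate(adults)
z, p = two_proportion_z_test(children, adults)
print(f"miss-rate gap = {gap:.3f}, z = {z:.2f}, p = {p:.2g}")
```

A positive gap means the first group is missed more often. Note that a gap can be reported as an absolute difference in miss rates (as here) or as a relative increase, so the two should not be compared directly.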