Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps

Ali Rezaei Nasab; Maedeh Dashti; Mojtaba Shahin; Mansooreh Zahedi; Hourieh Khalajzadeh; Chetan Arora; Peng Liang

TL;DR

This study conducts a large-scale, data-driven investigation of fairness concerns in AI-based mobile apps. It constructs a ground-truth dataset of 1,132 fairness and 1,473 non-fairness reviews and trains classifiers, the best of which detects fairness reviews with a precision of 0.94. The best-performing model is applied to ~9.5 million reviews from 108 AI-based apps, identifying ~92 thousand fairness reviews, which are then clustered via K-means and manually analyzed to reveal six distinct types of fairness concerns. Qualitative analysis of 2,248 app-owner responses uncovers six root causes that app owners give for fairness concerns, including copyright issues, development complexity, and external factors. The work provides a scalable methodology for detecting, summarizing, and diagnosing fairness issues in app reviews, with practical implications for developers and policymakers seeking to mitigate unfair AI-driven behaviors in mobile apps.

Abstract

Fairness is one of the socio-technical concerns that must be addressed in software systems. Considering the popularity of mobile software applications (apps) among a wide range of individuals worldwide, mobile apps with unfair behaviors and outcomes can affect a significant proportion of the global population, potentially more than any other type of software system. Users express a wide range of socio-technical concerns in mobile app reviews. This research aims to investigate fairness concerns raised in mobile app reviews. Our research focuses on reviews of AI-based mobile apps because the chance of unfair behaviors and outcomes may be higher in AI-based mobile apps than in non-AI-based apps. To this end, we first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing model can detect fairness reviews with a precision of 94%. We then applied the best-performing model to approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, the manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.
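
To make two of the quantities in the abstract concrete, the following is a minimal, standard-library-only sketch of how precision is computed for a fairness-review classifier and how K-means groups reviews into clusters. It is not the authors' implementation: the bag-of-words features, the fixed initial centroids, and the four toy reviews are all assumptions made for illustration, and the abstract does not state which features or models were actually used.

```python
from collections import Counter


def precision(y_true, y_pred, positive=1):
    """Share of reviews predicted as 'fairness' that really are fairness reviews."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def bag_of_words(texts):
    """Turn each text into a word-count vector (an assumed, simplistic feature set)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, init_indices, max_iter=100):
    """Plain Lloyd's algorithm; init_indices fixes the starting centroids
    so the toy example is reproducible (real runs usually randomize this)."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical reviews, invented for illustration only.
reviews = [
    "premium features missing on android",
    "android version lacks features iphone has",
    "my post was removed without explanation",
    "account banned and post removed unfairly",
]
_, vectors = bag_of_words(reviews)
labels, _ = kmeans(vectors, k=2, init_indices=[0, 2])
print(labels)  # [0, 0, 1, 1]

# Three correct fairness predictions and one false positive give precision 0.75.
print(precision([1, 1, 1, 0], [1, 1, 1, 1]))
```

In this toy run the two platform-related reviews fall into one cluster and the two content-moderation reviews into the other, loosely mirroring two of the concern types quoted in the abstract. As in the study, the clusters themselves carry no names; a manual pass is still needed to interpret them.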
