Auditing Yelp's Business Ranking and Review Recommendation Through the Lens of Fairness

Mohit Singhal, Javier Pacheco, Seyyed Mohammad Sadegh Moosavi Khorzooghi, Tanushree Debi, Abolfazl Asudeh, Gautam Das, Shirin Nilizadeh

TL;DR

This study tackles fairness in Yelp's black-box decision systems by empirically auditing both review filtering and business ranking using public data and a dual-data framework. It applies regression-based methods and an exposure metric to quantify bias, including a hotspot detection approach via DBSCAN and a controlled regression setup that accounts for quality signals and demographics. The findings show bias against less-established users in the review filtering process and a robust association between hotspot presence and higher exposure in rankings, with city-dependent demographic patterns that suggest potential discriminatory feedback loops. The work highlights the need for transparency, responsible data practices, and open-sourcing of platform models to enable broader auditing and policy guidance.
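To make the hotspot-detection step concrete, the following is a minimal, dependency-free sketch of density-based clustering (DBSCAN) over restaurant coordinates. It is not the authors' implementation: the haversine distance, the `eps_km` and `min_pts` values, and the sample coordinates are all assumptions chosen for illustration, and the summary above does not state which parameters or distance measure the study used.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dbscan(points, eps_km, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force region query; the result includes point i itself.
        return [j for j in range(len(points))
                if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Hypothetical coordinates: three restaurants close together, one far away.
restaurants = [
    (41.8800, -87.6300),
    (41.8805, -87.6300),
    (41.8800, -87.6305),
    (41.9500, -87.7000),
]
# Assumed parameters: 100 m radius, at least 3 restaurants per dense region.
labels = dbscan(restaurants, eps_km=0.1, min_pts=3)
hotspot_members = [p for p, lab in zip(restaurants, labels) if lab != -1]
```

Points labeled `-1` fall outside any dense region. In an audit like the one described, membership in a cluster could then serve as a binary "in hotspot" indicator in the regression, although how the study encodes this is not specified here.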

Abstract

Auditing is critical to ensuring the fairness and reliability of decision-making systems. However, auditing a black-box system for bias can be challenging because the model's internal workings are not transparent. In many web applications, such as Yelp, it is challenging, if not impossible, to manipulate the inputs systematically to identify bias in the output. Yelp connects users and businesses: users discover new businesses and share their experiences through reviews. Yelp's recommendation software moderates user-provided content by categorizing it into recommended and not-recommended sections. The recommended reviews, among other attributes, are used by Yelp's ranking algorithm to rank businesses in a neighborhood. Given Yelp's substantial popularity and its high impact on local businesses' success, understanding the bias of its algorithms is crucial. This data-driven study is the first to investigate the bias of Yelp's business ranking and review recommendation systems. We examine three hypotheses to assess whether Yelp's recommendation software shows bias against reviews from less-established users (those with fewer friends and reviews) and whether Yelp's business ranking algorithm shows bias against restaurants located in specific neighborhoods, particularly hotspot regions, with specific demographic compositions. Our findings show that reviews from less-established users are disproportionately categorized as not-recommended. We also find a positive association between restaurants' location in hotspot regions and their average exposure. Furthermore, we observe some cases of severe disparity bias in cities where the hotspots are in neighborhoods with less demographic diversity or higher affluence and education levels.

Paper Structure (15 sections, 13 figures, 21 tables)

This paper contains 15 sections, 13 figures, 21 tables.

Figures (13)

  • Figure 1: Histogram of the likelihood of reviews being fake
  • Figure 2: CDF of Reviews and Friends for Recommended and Not Recommended Users
  • Figure 3: Histogram of clusters. The x-axis shows the cluster size and the y-axis shows the frequency.
  • Figure 4: Heatmap for Chicago. The black circles represent individual restaurants' exposures, and the colored squares represent the percentage of sensitive attributes. The darker the color, the higher the percentage of the sensitive attribute, and vice versa.
  • Figure 5: Heatmap for San Jose
  • ...and 8 more figures