GLARE: Google Apps Arabic Reviews Dataset

Fatima AlGhamdi, Reem Mohammed, Hend Al-Khalifa, Areeb Alowisheq

TL;DR

GLARE addresses the scarcity of large-scale Arabic app-review data by creating a public, multi-terabyte resource drawn from the Saudi Google PlayStore, capturing 76M reviews (69M Arabic) across 9,980 apps. The authors outline a data collection pipeline built on google-play-scraper, perform a comprehensive EDA, and apply feature engineering (including a term dictionary and length statistics) to enable NLP tasks such as sentiment analysis and aspect-based sentiment analysis (ABSA). Key contributions include the dataset release with raw and engineered components, extensive metadata for apps and reviews, and actionable insights from the descriptive analyses (ratings skew, thumbs-up engagement, and developer replies). The resource supports NLP and software-engineering tasks such as ABSA, spam detection, and app ranking, and paves the way for domain-specific Arabic language modeling and benchmarking in app-review analysis.
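To make the feature-engineering step more concrete, the following is a minimal sketch of how a term dictionary and length statistics could be computed over review texts. It is not the authors' implementation: the function names, the whitespace tokenization, and the sample reviews are assumptions chosen for illustration, and a real pipeline for Arabic would likely add normalization steps.

```python
from collections import Counter
from statistics import mean


def build_term_dictionary(reviews):
    """Count how often each whitespace-separated token occurs across all reviews.

    Whitespace splitting is an assumption; it stands in for whatever
    tokenization the dataset authors actually used.
    """
    counts = Counter()
    for text in reviews:
        counts.update(text.split())
    return counts


def length_statistics(reviews):
    """Summarize review lengths in words and characters."""
    word_counts = [len(text.split()) for text in reviews]
    char_counts = [len(text) for text in reviews]
    return {
        "reviews": len(reviews),
        "mean_words": mean(word_counts),
        "max_words": max(word_counts),
        "mean_chars": mean(char_counts),
    }


# Hypothetical sample reviews ("excellent app", "very bad app", "excellent").
sample_reviews = ["تطبيق ممتاز", "تطبيق سيء جدا", "ممتاز"]
terms = build_term_dictionary(sample_reviews)
stats = length_statistics(sample_reviews)
```

Here `terms` maps each token to its frequency and `stats` holds per-corpus length summaries; both could be stored alongside the raw reviews as engineered components.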

Abstract

This paper introduces GLARE, an Arabic app reviews dataset collected from the Saudi Google PlayStore. It consists of 76M reviews of 9,980 Android applications, 69M of which are in Arabic. We present the data collection methodology, along with a detailed Exploratory Data Analysis (EDA) and feature engineering on the gathered reviews. We also highlight possible use cases and benefits of the dataset.
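As an illustration of the kind of descriptive analysis the EDA covers (thumbs-up engagement and developer replies broken down by rating), here is a small sketch. The field names `score`, `thumbsUpCount`, and `replyContent` are assumptions modeled on typical scraper output; the released dataset's columns may be named differently, and the sample records are invented.

```python
from collections import defaultdict


def engagement_by_rating(reviews):
    """Aggregate review count, mean thumbs-up, and developer reply rate per rating."""
    totals = defaultdict(lambda: {"n": 0, "thumbs": 0, "replied": 0})
    for r in reviews:
        bucket = totals[r["score"]]
        bucket["n"] += 1
        bucket["thumbs"] += r["thumbsUpCount"]
        # A review counts as "replied" when the developer reply field is non-empty.
        bucket["replied"] += 1 if r.get("replyContent") else 0
    return {
        score: {
            "reviews": b["n"],
            "mean_thumbs_up": b["thumbs"] / b["n"],
            "reply_rate": b["replied"] / b["n"],
        }
        for score, b in sorted(totals.items())
    }


# Hypothetical records; field names are assumptions, not the dataset schema.
sample = [
    {"score": 1, "thumbsUpCount": 4, "replyContent": "We are sorry to hear that."},
    {"score": 1, "thumbsUpCount": 0, "replyContent": None},
    {"score": 5, "thumbsUpCount": 2, "replyContent": None},
]
summary = engagement_by_rating(sample)
```

The resulting `summary` is keyed by rating (1 to 5), which matches the shape of a per-rating engagement chart.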

Paper Structure

This paper contains 22 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Apps Metadata
  • Figure 2: Reviews Metadata
  • Figure 3: Statistics of Thumbs-up with respect to Ratings Distribution.
  • Figure 4: Percentage of Developers Engagement with respect to Reviews Ratings (1 to 5).
  • Figure 5: Top 80% Most Frequent Characters Length per Word with respect to the Total Number of Words in the Vocabulary Dataset.
  • ...and 3 more figures