GLARE: Google Apps Arabic Reviews Dataset
Fatima AlGhamdi, Reem Mohammed, Hend Al-Khalifa, Areeb Alowisheq
TL;DR
GLARE addresses the scarcity of large-scale Arabic app-review data by creating a public, multi-terabyte resource drawn from the Saudi Google PlayStore, capturing 76M reviews (69M Arabic) across 9,980 apps. The authors outline a data collection pipeline using google-play-scraper, perform comprehensive EDA, and apply feature engineering (including a term dictionary and length statistics) to enable NLP tasks such as sentiment analysis and aspect-based analysis. Key contributions include the dataset release with raw and engineered components, extensive metadata for apps and reviews, and actionable insights from the descriptive analyses (ratings skew, thumbs-up engagement, and developer replies). This resource supports NLP and software-engineering tasks, such as ABSA, spam detection, and app ranking, and paves the way for domain-specific Arabic language modeling and benchmarking in app-reviews analysis.
Abstract
This paper introduces GLARE, an Arabic app reviews dataset collected from the Saudi Google PlayStore. It consists of 76M reviews of 9,980 Android applications, 69M of which are in Arabic. We present the data collection methodology, along with a detailed Exploratory Data Analysis (EDA) and Feature Engineering on the gathered reviews. We also highlight possible use cases and benefits of the dataset.
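To make the feature-engineering step more concrete, the following is a minimal Python sketch of how length statistics and a term dictionary could be computed over review texts. It is an illustration under stated assumptions, not the authors' implementation: the function names (`is_arabic`, `length_stats`, `term_dictionary`), the 50% Arabic-letter threshold, the restriction to the basic Arabic Unicode block, and the naive regex tokenizer are all hypothetical choices made here.

```python
import re
from collections import Counter
from statistics import mean

# Assumption: only the basic Arabic Unicode block (U+0600 to U+06FF) is checked.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")
# Naive tokenizer: runs of Unicode word characters. A real pipeline would
# likely use an Arabic-aware tokenizer and normalization instead.
TOKEN = re.compile(r"\w+")


def is_arabic(text: str) -> bool:
    """Heuristic: at least half of the alphabetic characters are Arabic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for ch in letters if ARABIC_CHAR.match(ch))
    return arabic / len(letters) >= 0.5


def length_stats(reviews: list[str]) -> dict:
    """Per-corpus word-count statistics for a list of review texts."""
    word_counts = [len(TOKEN.findall(r)) for r in reviews]
    return {
        "count": len(reviews),
        "min_words": min(word_counts, default=0),
        "max_words": max(word_counts, default=0),
        "mean_words": mean(word_counts) if word_counts else 0.0,
    }


def term_dictionary(reviews: list[str]) -> Counter:
    """Map each token to its frequency across all reviews."""
    counts = Counter()
    for r in reviews:
        counts.update(TOKEN.findall(r))
    return counts


if __name__ == "__main__":
    # Made-up sample reviews, for illustration only.
    sample = ["تطبيق ممتاز", "تطبيق سيء جدا", "Great app"]
    arabic_reviews = [r for r in sample if is_arabic(r)]
    print(length_stats(arabic_reviews))
    print(term_dictionary(arabic_reviews).most_common(3))
```

The language filter runs first so that the statistics and the dictionary cover only the Arabic subset, mirroring the 69M-of-76M split described above. The actual filtering and tokenization used for GLARE may differ.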
