Exploring Topic Modelling of User Reviews as a Monitoring Mechanism for Emergent Issues Within Social VR Communities

Angelo Singh; Joseph O'Hagan

TL;DR

This study tackles the challenge of monitoring emergent harassment in social VR by applying natural language processing to a large corpus of Rec Room reviews from Steam (roughly 40,000 items). It implements a multi-step pipeline: sentiment analysis with supervised labeling and pretrained models, followed by term-frequency investigations and BERTopic-based clustering to extract eight high-level harassment themes. The results show a downward trend in average sentiment over time and reveal recurring issues such as toxic communities, child-related harassment, racism, and moderation/technical problems. The work demonstrates that scalable, platform-agnostic monitoring of social VR communities is feasible using publicly available review data and motivates richer data collection and reporting mechanisms to support ongoing safety research and developer interventions.

Abstract

Users of social virtual reality (VR) platforms often use reviews to document harassment they have witnessed and/or experienced. However, research has yet to explore utilising this data as a monitoring mechanism to identify emergent issues within social VR communities. Such a system would benefit developers and researchers: it would enable the automatic identification of emergent issues as they occur, provide a means of longitudinally analysing harassment, and reduce the reliance on alternative, high-cost monitoring methodologies, e.g. observation or interview studies. To contribute towards the development of such a system, we collected approximately 40,000 Rec Room user reviews from the Steam storefront. We then analysed our dataset's sentiment and word/term frequencies, and conducted a topic modelling analysis of the negative reviews detected in our dataset. We report that our approach was capable of longitudinally monitoring changes in review sentiment and identifying high-level themes related to types of harassment known to occur in social VR platforms.
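To make the monitoring idea concrete, the following is a minimal sketch of the two simplest aggregation steps described above: mean sentiment per year and n-gram counts over negative reviews. It is not the authors' implementation. The toy reviews, their sentiment scores, the `score < 0` cut-off for "negative", and the helper names `mean_sentiment_by_year` and `ngram_counts` are all assumptions made for illustration; in the paper the scores come from a pretrained sentiment model and the themes from BERTopic.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy data: (year, sentiment score in [-1, +1], review text).
# Real scores would come from a sentiment model, not be hand-written.
reviews = [
    (2016, 0.8, "great game with friendly people"),
    (2016, 0.6, "fun with friends"),
    (2023, -0.7, "toxic kids screaming everywhere"),
    (2023, 0.5, "still fun sometimes"),
    (2023, -0.9, "toxic kids and no moderation"),
]

def mean_sentiment_by_year(rows):
    """Group sentiment scores by year and return the mean for each year."""
    by_year = defaultdict(list)
    for year, score, _text in rows:
        by_year[year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}

def ngram_counts(texts, n):
    """Count n-grams (as tuples of lowercase whitespace tokens) across texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zip over n shifted copies of the token list yields sliding windows.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

yearly = mean_sentiment_by_year(reviews)

# Assumed rule for this sketch: a review is "negative" if its score is below 0.
negative_texts = [text for _year, score, text in reviews if score < 0]
bigrams = ngram_counts(negative_texts, 2)

for year, value in yearly.items():
    print(year, round(value, 3))
print(bigrams.most_common(3))
```

Tracking the yearly mean over time gives the longitudinal signal, while the most frequent n-grams in negative reviews give a first, coarse view of recurring complaints before any topic modelling is applied.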

Paper Structure (29 sections, 5 figures, 1 table)

Introduction
Our Data Sources
Why Capture Rec Room Reviews?
Why Capture Steam User Reviews?
Data Capture & Pre-processing
Data Filtering
Data Cleaning & Pre-Processing
Sentiment Analysis
Manual Labelling a Subset of User Reviews
Identifying a Pre-Trained Model for Sentiment Analysis
Results: Review Sentiment Over Time
Term Frequencies and Importance
Bag-of-Words and N-Gram Representation
Results: Bag-of-Words and N-Grams Counts
TF-IDF
...and 14 more sections

Figures (5)

Figure 1: Comparison of mean sentiment over time (ranging from +1 to -1) and normalised number of reviews from 2016 to 2024. A downward trend in sentiment is seen since Rec Room's initial 2016 release, with the mean sentiment score more than halving by 2023.
Figure 2: The 10 most frequent individual words, bigrams and trigrams (from left to right) in the negative reviews dataset. References to children, verbal harassment, and racism occur across all counts.
Figure 3: The 20 highest TF-IDF scores in our negative reviews. Terms referring to children, racism, toxicity, and negative sentiment were prevalent.
Figure 4: A visualisation of the hierarchical representation of topics when clustering our dataset of negative reviews only.
Figure 5: A heatmap representation of topic similarity over identified topics. Darker patches on the heatmap indicate a higher similarity between topics. This heatmap and the hierarchical visualisation were used to identify the emergent themes from the BERTopic analysis.
