ReviewGraph: A Knowledge Graph Embedding Based Framework for Review Rating Prediction with Sentiment Features

A. J. W. de Vink, Natalia Amat-Lefort, Lifeng Han

TL;DR

ReviewGraph addresses predicting hotel review ratings by translating reviews into a knowledge graph of subject-predicate-object triples with sentiment edges. It evaluates a KG-embedding pipeline using Node2Vec against traditional NLP baselines and a GPT-4o–based LLM on HotelRec, finding performance on par with LLMs while offering better interpretability and visualization. The approach enables retrieval-augmented generation and graph-based insights, highlighting the value of structured representations for fine-grained sentiment and topic tracking. The work provides open-source code and suggests future work with advanced graph neural networks and improved triple extraction to boost predictive accuracy and explainability.

Abstract

In the hospitality industry, understanding the factors that drive customer review ratings is critical for improving guest satisfaction and business performance. This work proposes ReviewGraph for Review Rating Prediction (RRP), a novel framework that transforms textual customer reviews into knowledge graphs by extracting (subject, predicate, object) triples and associating sentiment scores with them. Using graph embeddings (Node2Vec) and sentiment features, the framework predicts review rating scores through machine learning classifiers. We compare ReviewGraph with traditional NLP baselines (such as Bag of Words, TF-IDF, and Word2Vec) and large language models (LLMs), evaluating all of them on the HotelRec dataset. Compared with the state-of-the-art literature, our proposed model performs similarly to the best-performing model reported there, but at lower computational cost (without an ensemble). While ReviewGraph achieves predictive performance comparable to LLMs and outperforms the baselines on agreement-based metrics such as Cohen's Kappa, it offers additional advantages in interpretability, visual exploration, and potential integration into Retrieval-Augmented Generation (RAG) systems. This work highlights the potential of graph-based representations for enhancing review analytics and lays the groundwork for future research integrating advanced graph neural networks and fine-tuned LLM-based extraction methods. We will open-source the ReviewGraph output and platform on our GitHub page: https://github.com/aaronlifenghan/ReviewGraph
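
To make the agreement-based metric mentioned in the abstract concrete, the following is a minimal sketch of unweighted Cohen's Kappa between gold and predicted ratings. The function name `cohen_kappa` and the example ratings are illustrative assumptions, not taken from the paper or from HotelRec, and the paper may use a different (for example weighted) variant.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement from the two marginal label distributions.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single shared label); returning 1.0 is a
        # convention chosen for this sketch only.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings on a 1-5 scale, invented purely for illustration.
gold = [5, 4, 4, 1, 3, 5, 2, 5]
pred = [5, 4, 3, 1, 3, 4, 2, 5]
kappa = cohen_kappa(gold, pred)
print(round(kappa, 3))
```

Unlike plain accuracy, the score subtracts the agreement expected by chance from the label marginals, which matters when the rating distribution is skewed toward a few classes.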

Paper Structure

This paper contains 39 sections, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Base Model Design. The green boxes represent the stored input and output data. The yellow ellipses represent processing modules, which can be interchanged with different embedding or classifier models to assess whether performance improves. The red circle represents the evaluation step, where various metrics are used to measure how well the chosen combination of processing modules (in the yellow ellipses) performed. The solid arrows represent the main data flow, the dotted arrow marks an optional sentiment analysis step, and the dotted lines indicate where the input data is split between review text and rating labels.
  • Figure 2: ReviewGraph Model Design: with KG Embedding.
  • Figure 3: Prediction score distributions for Node2Vec-10 with different sampling strategies.
  • Figure 4: Example of a hotel and two negative sentiment reviews connected to it. The blue circle is the hotel, the green ones are the reviews, and the yellow ones are subjects/objects extracted from the review text.
  • Figure 5: Example of a review and all the nodes connected to it, as well as the relationships between those nodes. The blue circle is the hotel, the green one is the review, and the yellow ones are subjects/objects extracted from the review text.
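
The hotel, review, and subject/object layout described for Figures 4 and 5 can be sketched as a small edge list. This is only an illustration under assumptions: the node identifiers, the relation names `has_review` and `mentions`, the sentiment value, and the helper `neighbourhood` are hypothetical and do not reflect the authors' actual schema.

```python
from collections import defaultdict

# Edges are stored as (source, relation, target, sentiment) tuples.
edges = []
adjacency = defaultdict(set)

def add_edge(source, relation, target, sentiment=None):
    """Record a labelled edge and index both endpoints as neighbours."""
    edges.append((source, relation, target, sentiment))
    adjacency[source].add(target)
    adjacency[target].add(source)

# Hypothetical hotel, reviews, and extracted triple (not HotelRec data).
add_edge("hotel:1", "has_review", "review:1")
add_edge("review:1", "mentions", "room")
add_edge("review:1", "mentions", "carpet")
add_edge("room", "had", "carpet", sentiment=-0.6)  # triple with sentiment
add_edge("hotel:1", "has_review", "review:2")
add_edge("review:2", "mentions", "staff")

def neighbourhood(node):
    """Return the node plus its neighbours, and the edges among them."""
    keep = {node} | adjacency[node]
    induced = [e for e in edges if e[0] in keep and e[2] in keep]
    return keep, induced

# A Figure 5 style view: one review, everything attached to it,
# and the relationships between those attached nodes.
nodes, induced = neighbourhood("review:1")
for source, relation, target, sentiment in induced:
    print(source, relation, target, sentiment)
```

Keeping the sentiment on the edge rather than on the node lets the same subject or object carry different sentiment in different reviews.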