JARVIS: An Evidence-Grounded Retrieval System for Interpretable Deceptive Reviews Adjudication

Nan Lu, Leyang Li, Yurong Hu, Rui Lin, Shaoyi Xu

TL;DR

This work tackles deceptive reviews in e-commerce, highlighting two key limitations of prior methods: poor generalization and limited interpretability. It introduces JARVIS, a training-free framework that combines hybrid dense-sparse multimodal retrieval, a heterogeneous evidence graph, and retrieval-augmented LLM reasoning to produce an interpretable adjudication with an evidence chain. Offline evaluations on a 300k-review dataset show superior precision and recall compared with baselines, and production deployment on JD.com demonstrates substantial gains in recall, a large reduction in manual inspection time, and high adoption of model-generated reasoning. The approach offers a scalable blueprint for interpretable, evidence-grounded fraud detection in real-world marketplaces with direct governance benefits.

Abstract

Deceptive reviews refer to fabricated feedback designed to artificially manipulate the perceived quality of products. Within modern e-commerce ecosystems, these reviews remain a critical governance challenge. Despite advances in review-level and graph-based detection methods, two pivotal limitations remain: inadequate generalization and lack of interpretability. To address these challenges, we propose JARVIS, a framework providing Judgment via Augmented Retrieval and eVIdence graph Structures. Starting from the review to be evaluated, it retrieves semantically similar evidence via hybrid dense-sparse multimodal retrieval, expands relational signals through shared entities, and constructs a heterogeneous evidence graph. A large language model then performs evidence-grounded adjudication to produce interpretable risk assessments. Offline experiments demonstrate that JARVIS improves performance on our constructed review dataset, raising precision from 0.953 to 0.988 and recall from 0.830 to 0.901. In the production environment, our framework achieves a 27% increase in recall volume and reduces manual inspection time by 75%. Furthermore, the adoption rate of the model-generated analysis reaches 96.4%.
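
To make the hybrid dense-sparse retrieval step more concrete, the following is a minimal sketch of one common way to fuse the two signals. It is not the paper's scoring function: the weighted-sum fusion, the `alpha` parameter, and the names `cosine`, `sparse_dot`, `hybrid_score`, and `top_k` are all assumptions made for illustration, and the embeddings are toy values rather than outputs of a multimodal encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sparse_dot(q, d):
    """Dot product between two sparse vectors stored as {term: weight}."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha=0.5):
    """Assumed fusion rule: weighted sum of dense and sparse similarity."""
    return alpha * cosine(q_dense, d_dense) + (1 - alpha) * sparse_dot(q_sparse, d_sparse)

def top_k(query, corpus, k=2, alpha=0.5):
    """Return the ids of the k highest-scoring reviews (ties broken by id)."""
    scored = [
        (hybrid_score(query["dense"], doc["dense"],
                      query["sparse"], doc["sparse"], alpha), doc_id)
        for doc_id, doc in corpus.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

# Toy data (hypothetical): two-dimensional dense vectors, one-term sparse vectors.
query = {"dense": [1.0, 0.0], "sparse": {"great": 1.0}}
corpus = {
    "r1": {"dense": [1.0, 0.0], "sparse": {"great": 1.0}},
    "r2": {"dense": [0.0, 1.0], "sparse": {"great": 0.5}},
    "r3": {"dense": [0.0, 1.0], "sparse": {}},
}
seeds = top_k(query, corpus, k=2)
```

In this sketch the retrieved ids would serve as the seed reviews for the subsequent graph expansion; a real system would likely replace the linear scan with an approximate nearest-neighbor index.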

Paper Structure (15 sections, 6 equations, 2 figures, 2 tables)

This paper contains 15 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The overall architecture of the proposed deceptive review detection framework. In Stage 1, the target review is encoded by dual multimodal encoders and scored by a hybrid scoring engine for retrieval. In Stage 2, the retrieved Top-k reviews seed a heterogeneous graph expansion. Finally, Stage 3 feeds review content, relational entities, and behavioral paths into the LLM-based reasoner for graph-grounded evidence chain generation and fraud classification.
  • Figure 2: Ablation study on JARVIS components. "Dense" and "Sparse" denote Dense Embedding and Sparse Embedding, respectively. "Review" and "Entity" denote Review Node and Entity Node in the evidence subgraph, respectively.
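
The Stage 2 expansion described in the Figure 1 caption can be sketched as a one-hop walk from the seed reviews through the entities they share with other reviews. This is only an illustration under stated assumptions: the entity types (`user`, `product`), the single-hop limit, and the function name `build_evidence_graph` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def build_evidence_graph(seed_ids, review_entities):
    """Expand seed reviews into a small heterogeneous graph.

    review_entities maps review_id -> set of (entity_type, entity_id) tuples.
    Returns (review_nodes, entity_nodes, edges), where each edge links a
    review id to an entity tuple.
    """
    # Invert the mapping so each entity points to every review that mentions it.
    entity_to_reviews = defaultdict(set)
    for rid, entities in review_entities.items():
        for entity in entities:
            entity_to_reviews[entity].add(rid)

    review_nodes = set(seed_ids)
    entity_nodes = set()
    edges = set()
    for rid in seed_ids:
        for entity in review_entities.get(rid, ()):
            entity_nodes.add(entity)
            # One hop: pull in every review that shares this entity with a seed.
            for neighbor in entity_to_reviews[entity]:
                review_nodes.add(neighbor)
                edges.add((neighbor, entity))
    return review_nodes, entity_nodes, edges

# Toy data (hypothetical): r1 and r2 share a user; r3 is unrelated.
review_entities = {
    "r1": {("user", "u1"), ("product", "p1")},
    "r2": {("user", "u1")},
    "r3": {("product", "p9")},
}
reviews, entities, edges = build_evidence_graph(["r1"], review_entities)
```

Here `r2` enters the subgraph because it shares `("user", "u1")` with the seed, while `r3` stays out. The resulting review nodes, entity nodes, and edges are the kind of structured evidence that could then be serialized into the LLM prompt.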