Unmask It! AI-Generated Product Review Detection in Dravidian Languages
Somsubhra De, Advait Vats
TL;DR
This work addresses the detection of AI-generated product reviews in low-resource Dravidian languages by evaluating a broad spectrum of methods, from traditional machine learning to state-of-the-art transformers (e.g., IndicSBERT, MuRIL, XLM-RoBERTa, MalayalamBERT). Using Tamil and Malayalam datasets, the study shows that transformer-based approaches substantially outperform both traditional methods and deep learning architectures, with IndicSBERT performing best for Tamil and MalayalamBERT for Malayalam; some transformer runs reach near-perfect precision and recall. Qualitative analyses reveal language-specific patterns that distinguish AI-generated from human-written reviews and highlight the value of human-in-the-loop insights. The findings point to the practical potential of transformer-based detectors for improving trust in e-commerce platforms that serve under-resourced languages, and the paper outlines future work on LLMs, ensemble strategies, larger and more diverse corpora, and ethical considerations. Overall, the paper advances AI-generated content detection in Dravidian languages and offers guidance for deploying detectors in real-world multilingual marketplaces.
Abstract
The rise of Generative AI has led to a surge in AI-generated reviews, which pose a serious threat to the credibility of online platforms. Reviews serve as the primary source of information about products and services, and authentic reviews play a vital role in consumer decision-making. Fabricated content misleads consumers, undermines trust and facilitates fraud in digital marketplaces. This study focuses on detecting AI-generated product reviews in Tamil and Malayalam, two low-resource languages in which this problem remains under-explored. We evaluate a range of approaches, from traditional machine learning methods to advanced transformer-based models such as Indic-BERT, IndicSBERT, MuRIL, XLM-RoBERTa and MalayalamBERT. Our findings highlight the effectiveness of state-of-the-art transformers in accurately identifying AI-generated content, demonstrating their potential to improve fake-review detection in low-resource language settings.
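To make the "traditional machine learning" end of that spectrum concrete, the following is a minimal sketch of one possible baseline: a character n-gram Naive Bayes classifier. It is not the authors' pipeline. The choice of classifier, the names `char_ngrams` and `NgramNaiveBayes`, the labels, and the toy English sentences are all assumptions made for illustration; character n-grams are used only because they need no language-specific tokenizer, so the same code would accept Tamil or Malayalam strings.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams; works on any Unicode script."""
    text = " ".join(text.split())  # collapse runs of whitespace
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative baseline)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n              # n-gram length
        self.alpha = alpha      # Laplace smoothing constant
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.docs = Counter()   # label -> number of training documents
        self.vocab = set()      # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior for each label."""
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        scores = {}
        for label, counter in self.counts.items():
            total = sum(counter.values())
            score = math.log(self.docs[label] / total_docs)  # class prior
            for gram in grams:
                score += math.log(
                    (counter[gram] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data (English placeholders, not drawn from the paper's datasets).
train_texts = [
    "this product offers exceptional quality and remarkable value overall",
    "overall this item delivers exceptional performance and remarkable durability",
    "battery died in two days lol not buying again",
    "came late and the box was torn but works ok",
]
train_labels = ["AI", "AI", "HUMAN", "HUMAN"]

model = NgramNaiveBayes(n=3).fit(train_texts, train_labels)
print(model.predict("exceptional quality and remarkable performance overall"))
print(model.predict("box was torn and battery died"))
```

A baseline of this kind relies only on surface character statistics, which is one plausible reason transformer models with contextual representations would outperform it, as the abstract reports.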
