AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment
Xiaochong Lan, Jie Feng, Yinxing Liu, Xinlei Shi, Yong Li
TL;DR
AutoQual introduces an autonomous LLM agent that discovers interpretable, high-information features for review quality assessment by maximizing $I(Y; \mathbf{F}_{\mathcal{S}})$ through multi-perspective hypothesis generation, autonomous tool implementation, and reflective search with a dual-level memory. The approach is validated on real-world Meituan and public Amazon datasets, showing that the discovered features can match or exceed pure semantic models and complement PLMs, and a large-scale industrial deployment yields measurable improvements in user engagement and conversion. Ablation studies confirm the necessity of diverse hypothesis generation and of the memory components, while case studies illustrate domain-specific interpretable features and diagnostics. The work argues for a general framework that converts tacit domain knowledge into explicit, computable features, applicable to diverse text-quality tasks beyond reviews.
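To make the objective $I(Y; \mathbf{F}_{\mathcal{S}})$ concrete, the following is a minimal sketch of greedy forward selection over discrete features using a plug-in mutual information estimate. It is an illustration only: the estimator, the greedy strategy, and the feature names (`has_detail`, `mentions_price`) are assumptions made here, not the search procedure or features described by the authors.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def greedy_select(features, labels, k):
    """Greedily add the feature that most increases I(Y; F_S), up to k features.

    `features` maps a (hypothetical) feature name to a list of discrete values.
    Stops early when no remaining feature raises the estimate.
    """
    selected = []
    current = 0.0
    while len(selected) < k:
        best_name, best_mi = None, current
        for name in features:
            if name in selected:
                continue
            # Treat the selected set plus the candidate as one joint variable.
            cols = [features[f] for f in selected + [name]]
            joint_vals = list(zip(*cols))
            mi = mutual_information(joint_vals, labels)
            if mi > best_mi + 1e-12:
                best_name, best_mi = name, mi
        if best_name is None:
            break
        selected.append(best_name)
        current = best_mi
    return selected, current


# Toy data (invented for illustration): 1 = high-quality review, 0 = low-quality.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "has_detail": [1, 1, 0, 0, 1, 0, 1, 0],      # perfectly aligned with labels
    "mentions_price": [0, 1, 0, 1, 0, 1, 0, 1],  # only partially informative
}
selected, score = greedy_select(features, labels, k=2)
print(selected, score)
```

The plug-in estimator is biased upward on small samples and the joint variable grows sparse as more features are added, so a practical system would likely need a more robust estimator or a held-out check before trusting the selected set.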
Abstract
Ranking online reviews by their intrinsic quality is a critical task for e-commerce platforms and information services, impacting user experience and business outcomes. However, quality is a domain-dependent and dynamic concept, making its assessment a formidable challenge. Traditional methods relying on hand-crafted features are unscalable across domains and fail to adapt to evolving content patterns, while modern deep learning approaches often produce black-box models that lack interpretability and may prioritize semantics over quality. To address these challenges, we propose AutoQual, an LLM-based agent framework that automates the discovery of interpretable features. While demonstrated on review quality assessment, AutoQual is designed as a general framework for transforming tacit knowledge embedded in data into explicit, computable features. It mimics a human research process, iteratively generating feature hypotheses through reflection, operationalizing them via autonomous tool implementation, and accumulating experience in a persistent memory. We deploy our method on a large-scale online platform with a billion-level user base. Large-scale A/B testing confirms its effectiveness, increasing average reviews viewed per user by 0.79% and the conversion rate of review readers by 0.27%.
