Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents
Jingying Zeng, Hui Liu, Zhenwei Dai, Xianfeng Tang, Chen Luo, Samarth Varshney, Zhen Li, Qi He
TL;DR
This work tackles the challenge of grounding and attributing product facts in e-commerce Conversational Shopping Agents (CSAs) by introducing a citation generation paradigm that appends verifiable sources to LLM-produced responses. Using in-context learning with multi-perspective evidence and a production-aware Multi-UX Inference (MUI) system, the approach maintains UX quality while enabling scalable citation generation. The authors define and validate a suite of grounding and attribution metrics (CGR, CCR, PSR, SCR, EUR) and show that citation-enabled responses improve grounding by $13.83\%$ and boost customer engagement by $3$–$10\%$ in large-scale online A/B tests. The solution is deployed across multiple UXs, showing that the citation experience can scale without sacrificing performance while reducing hallucinations and increasing trust in CSA-generated product facts.
Abstract
With the advancement of conversational large language models (LLMs), several LLM-based Conversational Shopping Agents (CSAs) have been developed to help customers shop online more smoothly. The primary objective in building an engaging and trustworthy CSA is to ensure that the agent's responses about product factoids are accurate and factually grounded. However, two challenges remain. First, LLMs can produce hallucinated or unsupported claims, and such inaccuracies risk spreading misinformation and diminishing customer trust. Second, without knowledge source attribution in CSA responses, customers struggle to verify LLM-generated information. To address both challenges, we present an easily productionized solution that provides a "citation experience" to our customers. We build auto-evaluation metrics to holistically evaluate an LLM's grounding and attribution capabilities; these metrics indicate that the citation generation paradigm substantially improves grounding performance, by 13.83%. To deploy this capability at scale, we introduce the Multi-UX-Inference system, which appends source citations to LLM outputs while preserving existing user experience features and supporting scalable inference. Large-scale online A/B tests show that grounded CSA responses improve customer engagement by 3% to 10%, depending on UX variations.
