Towards Faithful Industrial RAG: A Reinforced Co-adaptation Framework for Advertising QA

Wenwei Li, Ming Xu, Tianle Xia, Lingxiang Hu, Yiding Sun, Linfang Shang, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

TL;DR

A reinforced co-adaptation framework that jointly optimizes retrieval and generation through two components: Graph-aware Retrieval (GraphRAG), which models entity-relation structure over a high-citation knowledge subgraph for multi-hop, domain-specific evidence selection, and evidence-constrained reinforcement learning via Group Relative Policy Optimization (GRPO) with multi-dimensional rewards covering faithfulness, style compliance, safety, and URL validity.

Abstract

Industrial advertising question answering (QA) is a high-stakes task in which hallucinated content, particularly fabricated URLs, can lead to financial loss, compliance violations, and legal risk. Although Retrieval-Augmented Generation (RAG) is widely adopted, deploying it in production remains challenging because industrial knowledge is inherently relational, frequently updated, and insufficiently aligned with generation objectives. We propose a reinforced co-adaptation framework that jointly optimizes retrieval and generation through two components: (1) Graph-aware Retrieval (GraphRAG), which models entity-relation structure over a high-citation knowledge subgraph for multi-hop, domain-specific evidence selection; and (2) evidence-constrained reinforcement learning via Group Relative Policy Optimization (GRPO) with multi-dimensional rewards covering faithfulness, style compliance, safety, and URL validity. Experiments on an internal advertising QA dataset show consistent gains across expert-judged dimensions including accuracy, completeness, and safety, while reducing the hallucination rate by 72%. A two-week online A/B test demonstrates a 28.6% increase in like rate, a 46.2% decrease in dislike rate, and a 92.7% reduction in URL hallucination. The system has been running in production for over half a year and has served millions of QA interactions.
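
As an illustration of the second component, the sketch below shows how a multi-dimensional reward could be folded into one scalar and turned into group-relative advantages, which is the core idea behind GRPO. It is a minimal sketch under stated assumptions, not the authors' implementation: the reward weights, the function names, the exact-match URL check against retrieved evidence, and the use of population standard deviation are all hypothetical choices made for the example.

```python
import re
from statistics import mean, pstdev

URL_RE = re.compile(r"https?://\S+")

# Hypothetical weights; the abstract names the four dimensions but not their weights.
WEIGHTS = {"faithfulness": 0.4, "style": 0.2, "safety": 0.2, "url_validity": 0.2}


def extract_urls(text):
    """Return the set of URLs in text, with trailing punctuation stripped."""
    return {u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(text)}


def url_validity(response, evidence):
    """Assumed rule: 1.0 if every URL in the response also occurs in the evidence."""
    cited = extract_urls(response)
    allowed = set().union(*(extract_urls(doc) for doc in evidence))
    return 1.0 if cited <= allowed else 0.0


def combined_reward(scores, weights=WEIGHTS):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and std of its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, several responses would be sampled for the same query, each scored with `combined_reward`, and the resulting list passed to `group_relative_advantages`. Responses that cite a URL absent from the evidence then receive a lower advantage than their grounded siblings in the same group.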

Paper Structure (40 sections, 2 equations, 9 figures, 4 tables, 1 algorithm)

This paper contains 40 sections, 2 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Traditional QA vs. our approach over a shared knowledge base. Given the same user query and knowledge items A, B, C, D, traditional methods often yield incomplete, hallucinated, over-generated, or verbose answers. Our method produces an exact answer that remains complete, faithful, and concise.
  • Figure 2: System overview. Given a user query $q$ and a private knowledge base $K$, the retrieval system constructs an evidence set $D$ via two parallel channels: a GraphRAG channel over a high-citation knowledge base $K_h$ and a traditional RAG channel with query rewriting and BGE + BM25 hybrid retrieval. Results are merged and deduplicated. The RL-tuned generator then produces a response optimized by GRPO with multi-dimensional rewards for faithfulness, style compliance, safety, and URL validity.
  • Figure 3: Knowledge recall enhancement across Base RAG, GraphRAG, and Parallel retrieval. Effective chunks per query and recall effectiveness in percent.
  • Figure 4: Training dynamics of multi-dimensional reward components during RL.
  • Figure 5: FaithEval generalization: accuracy (%) on Inconsistent, Unanswerable, Counterfactual, and Overall.
  • ...and 4 more figures
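
The merge-and-deduplicate step described in the Figure 2 caption can be sketched as follows. This is a hedged illustration only: the `Chunk` type, the `merge_and_dedup` function, the idea of deduplicating by chunk ID, and the assumption that scores from the two channels are comparable are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # hypothetical stable identifier of a knowledge item
    text: str
    score: float   # assumed to be on a comparable scale across channels


def merge_and_dedup(graph_hits, rag_hits, top_k=5):
    """Union the two channels, keep the best-scoring copy of each chunk,
    and return the top_k chunks by score as the evidence set."""
    best = {}
    for chunk in [*graph_hits, *rag_hits]:
        seen = best.get(chunk.chunk_id)
        if seen is None or chunk.score > seen.score:
            best[chunk.chunk_id] = chunk
    ranked = sorted(best.values(), key=lambda c: c.score, reverse=True)
    return ranked[:top_k]
```

A real system would likely need score calibration or rank-based fusion before comparing results from the GraphRAG and hybrid retrieval channels; the sketch sidesteps that by assuming the scores are already comparable.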