Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs

Vedant Pandya

Abstract

Knowledge-grounded dialogue systems aim to generate informative, contextually relevant responses by conditioning on external knowledge sources. However, most existing approaches focus exclusively on English, lack explicit citation mechanisms for verifying factual claims, and offer limited transparency into model decision-making. We present XKD-Dial, a progressive four-stage training pipeline for explainable, knowledge-grounded dialogue generation in a bilingual (English-Hindi) setting, comprising: (1) multilingual adaptation, (2) English dialogue SFT with citation grounding, (3) bilingual dialogue SFT, and (4) GRPO alignment with citation-aware rewards. We evaluate six models spanning encoder-decoder (250M-3B) and decoder-only (1B-7B) architectures at every pipeline stage. Our key contributions are: (i) three post-hoc explainability analyses - cross-attention alignment, Integrated Gradients attribution, and occlusion-based causal grounding - applied systematically across the training trajectory to reveal how citation behaviour is learned, not only whether it is learned; (ii) citation-grounded SFT reduces hallucination to 0.0% for encoder-decoder models from Stage 2 onward; (iii) the progressive pipeline prevents catastrophic forgetting while improving Hindi capabilities; (iv) smaller models match larger models on English after SFT; and (v) GRPO provides marginal improvement over well-designed SFT for structured citation tasks. We evaluate across six automatic metrics (BLEU, ROUGE, BERTScore, FactScore, Citation-F1, and hallucination rate).
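The abstract names Citation-F1 as one of its metrics but does not define it here. A minimal sketch of one common set-based reading follows, assuming citations appear as bracketed indices such as `[2]` and are scored against the citations in a reference response. The marker format and the names `extract_citations` and `citation_f1` are illustrative assumptions, not the paper's implementation.

```python
from __future__ import annotations

import re

# Assumed citation marker format: a bracketed integer such as "[2]".
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def extract_citations(text: str) -> set[int]:
    """Return the set of knowledge-snippet indices cited in `text`."""
    return {int(idx) for idx in CITATION_PATTERN.findall(text)}


def citation_f1(predicted: str, reference: str) -> float:
    """Set-based F1 between cited indices in the prediction and the reference.

    Hypothetical definition: precision is the fraction of predicted citations
    that also appear in the reference, recall is the fraction of reference
    citations that the prediction recovers.
    """
    pred = extract_citations(predicted)
    gold = extract_citations(reference)
    if not pred and not gold:
        return 1.0  # nothing to cite and nothing cited
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = citation_f1(
        "Paris is the capital [1][3].",
        "Paris is the capital of France [1].",
    )
    print(round(score, 3))
```

Under this reading, a response that cites one correct and one spurious snippet gets full recall but halved precision, so over-citation is penalised as well as missing citations.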

Paper Structure (94 sections, 5 equations, 17 figures, 15 tables)

Figures (17)

  • Figure 1: Training progression across all six evaluation metrics (BLEU, ROUGE-L, BERTScore, FactScore, Citation F1, Hallucination Rate) for all models. Note the XL generation collapse at Stage 2 and subsequent recovery at Stage 3.
  • Figure 2: Model comparison across training stages for four key metrics (BLEU, FactScore, Citation F1, BERTScore). The phase transition at Stage 2 is clearly visible across all metrics and models.
  • Figure 3: Per-language BLEU progression across training stages. English BLEU (left) shows convergence of all models after Stage 2 SFT, while Hindi BLEU (right) remains near-zero for encoder-decoder models due to morphological variation (Section \ref{sec:disc_hindi_bleu}). Gemma-2-2B achieves the highest Hindi BLEU (0.155) after Stage 3, followed by Mistral-7B (0.070).
  • Figure 4: GRPO reward dynamics during Stage 4. Left: mean reward with smoothed curves showing Mistral-7B consistently achieving the highest reward. Right: reward standard deviation, indicating training stability.
  • Figure 5: Stage 4 (GRPO) final score summary across all six models, showing five metrics side by side. Gemma-2-2B and Mistral-7B lead in most metrics after GRPO alignment.
  • ...and 12 more figures