TrustRAG: An Information Assistant with Retrieval Augmented Generation

Yixing Fan, Qiang Yan, Wenshan Wang, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

TL;DR

TrustRAG tackles the trustworthiness gap in retrieval-augmented generation by integrating semantic-enhanced indexing, utility-based retrieval, and attribution-enhanced generation. The system combines a modular library with a no-code studio, enabling deployment on private corpora, and is demonstrated on Excerpt-Based Question Answering (ExQA). Key innovations include co-reference-aware semantic chunking with time normalization, usefulness discriminators with fine-grained evidence extraction, and post-generation citation grouping with cross-referencing. The open-source release aims to deliver reliable, traceable outputs for practical RAG deployments across domain-specific tasks.
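The post-generation attribution step can be illustrated with a minimal Python sketch. The matching rule (token overlap above a `min_overlap` threshold), the naive sentence splitter, and all function names (`cite_sentences`, `group_citations`, `render`) are hypothetical; the summary does not say how TrustRAG scores sentence-evidence pairs. The sketch only shows the shape of the idea: attach citations to each answer sentence individually, then merge consecutive sentences that share the same sources so each citation marker appears once per group.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric tokens: a crude, hypothetical stand-in for semantic matching.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cite_sentences(answer: str, evidence: list[str],
                   min_overlap: float = 0.5) -> list[tuple[str, list[int]]]:
    """Attach to each sentence the indices of evidence chunks that cover at
    least `min_overlap` of the sentence's tokens (hypothetical rule)."""
    ev_tokens = [tokenize(e) for e in evidence]
    cited = []
    for sent in split_sentences(answer):
        toks = tokenize(sent)
        ids = [i for i, et in enumerate(ev_tokens)
               if toks and len(toks & et) / len(toks) >= min_overlap]
        cited.append((sent, ids))
    return cited


def group_citations(cited: list[tuple[str, list[int]]]) -> list[tuple[str, list[int]]]:
    """Merge consecutive sentences that share the same non-empty citation set."""
    groups: list[tuple[str, list[int]]] = []
    for sent, ids in cited:
        if groups and ids and groups[-1][1] == ids:
            groups[-1] = (groups[-1][0] + " " + sent, ids)
        else:
            groups.append((sent, ids))
    return groups


def render(groups: list[tuple[str, list[int]]]) -> str:
    # Emit one bracketed marker per cited source (1-based), after each group.
    parts = []
    for text, ids in groups:
        marks = "".join(f"[{i + 1}]" for i in ids)
        parts.append(f"{text} {marks}" if marks else text)
    return " ".join(parts)


evidence = [
    "The Eiffel Tower was completed in 1889 and is 330 metres tall.",
    "Paris is the capital of France.",
]
answer = ("The Eiffel Tower was completed in 1889. It is 330 metres tall. "
          "Paris is the capital of France.")
groups = group_citations(cite_sentences(answer, evidence))
print(render(groups))
# prints: The Eiffel Tower was completed in 1889. It is 330 metres tall. [1] Paris is the capital of France. [2]
```

Grouping means the two sentences drawn from the first source share a single `[1]` marker instead of repeating it after each sentence.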

Abstract

Retrieval-augmented generation (RAG) has emerged as a crucial technique for enhancing large models with real-time and domain-specific knowledge. While numerous improvements and open-source tools have been proposed to improve the accuracy of RAG, relatively little attention has been given to the trustworthiness of the generated results. To address this gap, we introduce TrustRAG, a novel framework that enhances RAG from three perspectives: indexing, retrieval, and generation. Specifically, in the indexing stage, we propose a semantic-enhanced chunking strategy that incorporates hierarchical indexing to supplement each chunk with contextual information, ensuring semantic completeness. In the retrieval stage, we introduce a utility-based filtering mechanism to identify high-quality information, supporting answer generation while reducing input length. In the generation stage, we propose fine-grained citation enhancement, which detects opinion-bearing sentences in responses and infers citation relationships at the sentence level, thereby improving citation accuracy. We open-source the TrustRAG framework and provide a demonstration studio designed for excerpt-based question answering tasks (https://huggingface.co/spaces/golaxy/TrustRAG). With these resources, we aim to help researchers (1) systematically enhance the trustworthiness of RAG systems and (2) develop their own RAG systems with more reliable outputs.
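The indexing and retrieval stages can be sketched together in Python under loudly stated assumptions. Here, "hierarchical context" is modeled as prefixing each chunk with its document title and heading path, and the utility score is a hypothetical stand-in (fraction of query tokens found in the contextualized chunk); the abstract does not describe how TrustRAG actually scores utility. The names `Chunk`, `contextualized`, `utility`, `filter_by_utility`, `threshold`, and `budget_words` are all illustrative, not the framework's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    section_path: list[str]  # headings from the document root down to this chunk
    text: str

    def contextualized(self) -> str:
        # Prefix the heading path so a chunk read in isolation keeps its context.
        header = " > ".join([self.doc_title, *self.section_path])
        return f"[{header}] {self.text}"


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def utility(query: str, chunk: Chunk) -> float:
    # Hypothetical stand-in for a learned usefulness judgment:
    # fraction of query tokens that appear in the contextualized chunk.
    q = tokens(query)
    return len(q & tokens(chunk.contextualized())) / len(q) if q else 0.0


def filter_by_utility(query: str, chunks: list[Chunk],
                      threshold: float = 0.5, budget_words: int = 200) -> list[Chunk]:
    """Keep chunks scoring >= threshold, best first, within a word budget."""
    scored = sorted(((utility(query, c), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    kept, used = [], 0
    for score, chunk in scored:
        if score < threshold:
            break  # remaining chunks score lower still
        n = len(chunk.contextualized().split())
        if used + n > budget_words:
            continue  # skip chunks that would overflow the input budget
        kept.append(chunk)
        used += n
    return kept


chunks = [
    Chunk("TrustRAG", ["Indexing"], "Chunks are enriched with hierarchical context."),
    Chunk("TrustRAG", ["Retrieval"], "Low-utility passages are filtered out."),
    Chunk("Cooking", ["Pasta"], "Boil water before adding salt."),
]
query = "How does TrustRAG retrieval filter passages?"
kept = filter_by_utility(query, chunks)
for c in kept:
    print(c.contextualized())
# prints: [TrustRAG > Retrieval] Low-utility passages are filtered out.
```

The retrieval chunk's bare text shares only the token `passages` with the query (1 of 6 query tokens), so on its own it would fall below the threshold; the heading prefix added at indexing time is what lets it pass the filter, while the budget check caps how much text reaches the generator.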

Paper Structure

This paper contains 9 sections and 3 figures.

Figures (3)

  • Figure 1: An Overview of the System Architecture.
  • Figure 2: An overview of the TrustRAG framework.
  • Figure 3: Example usage of TrustRAG on excerpt-based questions.