RAGViz: Diagnose and Visualize Retrieval-Augmented Generation

Tevin Wang, Jingyuan He, Chenyan Xiong

TL;DR

RAGViz is a RAG diagnosis tool that visualizes the attentiveness of generated tokens in retrieved documents. It operates efficiently, with a median query time of about 5 seconds on a moderate GPU node.

Abstract

Retrieval-augmented generation (RAG) integrates knowledge from domain-specific sources into large language models to ground answer generation. Current RAG systems lack customizable visibility into the context documents and the model's attentiveness towards such documents. We propose RAGViz, a RAG diagnosis tool that visualizes the attentiveness of the generated tokens in retrieved documents. With a built-in user interface, retrieval index, and Large Language Model (LLM) backbone, RAGViz provides two main functionalities: (1) token- and document-level attention visualization, and (2) generation comparison upon context document addition and removal. As an open-source toolkit, RAGViz can be easily hosted with a custom embedding model and a HuggingFace-supported LLM backbone. Using a hybrid ANN (Approximate Nearest Neighbor) index, a memory-efficient LLM inference tool, and a custom context snippet method, RAGViz operates efficiently with a median query time of about 5 seconds on a moderate GPU node. Our code is available at https://github.com/cxcscmu/RAGViz. A demo video of RAGViz can be found at https://youtu.be/cTAbuTu6ur4.

Paper Structure

This paper contains 16 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Attention visualization on the selected token sequence when using the document toggling feature.
  • Figure 2: A demo of RAGViz showcasing its ability to identify and debug external hallucinations.
  • Figure 3: Visualization for the query "Why do pigs fly?". The highlighted generation is not grounded by any context documents, demonstrating internal hallucination.
  • Figure 4: High-level view of RAGViz's system architecture. The arrows within nodes represent the model use or filesystem reads. The arrows between nodes represent REST API calls. Queries are routed to each of the approximate nearest neighbor search REST servers and then reranked by the context building backend server.
  • Figure 5: A demonstration of sliding window snippeting with a window size of $5$ and a stride of $2$. The sliding window method chooses the snippet with the highest inner product similarity. Conversely, the naive first method always selects the first window shown in green.
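
The sliding window snippeting described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration, not RAGViz's actual implementation: it assumes windows are taken over a list of tokens, and it substitutes a hypothetical toy embedding (`toy_embed`, a word count over a tiny hypothetical vocabulary `VOCAB`) for the real embedding model. All function names here are made up for the example. Only the window size of 5, the stride of 2, the inner product scoring, and the naive first baseline come from the caption.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical vocabulary for the toy embedding; a real system would use
# a learned embedding model instead.
VOCAB = ["pigs", "fly", "wings", "farm"]


def toy_embed(tokens: Sequence[str]) -> Vector:
    """Toy stand-in for an embedding model: counts of each VOCAB word."""
    return [float(sum(1 for t in tokens if t == word)) for word in VOCAB]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sliding_windows(tokens: Sequence, window_size: int = 5, stride: int = 2) -> List[list]:
    """Return full-size windows starting every `stride` tokens.

    Trailing tokens that do not fit a full window are not covered
    (a simplification made for this sketch).
    """
    if len(tokens) <= window_size:
        return [list(tokens)]
    starts = range(0, len(tokens) - window_size + 1, stride)
    return [list(tokens[s:s + window_size]) for s in starts]


def naive_first_snippet(tokens: Sequence, window_size: int = 5) -> list:
    """Baseline: always take the first window of the document."""
    return list(tokens[:window_size])


def best_snippet(
    tokens: Sequence[str],
    query_vec: Vector,
    embed: Callable[[Sequence[str]], Vector],
    window_size: int = 5,
    stride: int = 2,
) -> list:
    """Pick the window whose embedding has the highest inner product with the query."""
    windows = sliding_windows(tokens, window_size, stride)
    return max(windows, key=lambda w: dot(embed(w), query_vec))


doc = "the farm has many animals and some pigs cannot fly at all".split()
query_vec = toy_embed(["pigs", "fly"])

print(naive_first_snippet(doc))                # first five tokens of the document
print(best_snippet(doc, query_vec, toy_embed))  # window most similar to the query
```

In this toy run the naive first method returns the opening of the document, which never mentions the query terms, while the sliding window method selects the later window that contains both "pigs" and "fly". The trade-off is that every window must be embedded and scored, so the sliding window method costs more per document than the naive first method.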