PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant
Congrui Yin, Evan Wei, Zhongxing Zhang, Zaifu Zhan
TL;DR
PaperHelper addresses information overload and LLM hallucinations during literature review by deploying a knowledge-based QA assistant built on a Retrieval-Augmented Generation (RAG) framework. It integrates RAFT and RAG Fusion within an end-to-end, Streamlit-based pipeline that batch-imports documents and uses a Reference Knowledge Graph, rendered with Mermaid, to visualize relationships between them. Key contributions include the end-to-end pipeline, the RAG Fusion and RAFT implementations, parallel generation for references, and a domain-specific fine-tuning set (~5,000 ML papers). Evaluation shows a peak F1 of 60.04 and a latency of 5.8 seconds on GPT-4-32k, outperforming basic RAG by about 7%. The work demonstrates substantial improvements in retrieval accuracy and reliability for literature review tasks, with implications for scalable, transparent, and interactive paper reading, though it notes limitations such as figure recognition and ongoing hallucination challenges. Future directions point to multimodal capabilities for ingesting figures and richer interaction for expert users.
Abstract
In this paper, we introduce PaperHelper, a paper reading assistant designed to help researchers browse and understand scientific literature efficiently. Built on the Retrieval-Augmented Generation (RAG) framework, PaperHelper reduces the hallucinations commonly encountered in large language models (LLMs) and improves the extraction of accurate, high-quality knowledge. The implementation of advanced techniques such as RAFT and RAG Fusion significantly boosts the performance, accuracy, and reliability of the LLM-based literature review process. Additionally, PaperHelper features a user-friendly interface that facilitates the batch downloading of documents and uses the Mermaid format to illustrate structural relationships between documents. Experimental results demonstrate that PaperHelper, based on a fine-tuned GPT-4 API, achieves an F1 score of 60.04 with a latency of only 5.8 seconds, outperforming the basic RAG model by 7% in F1 score.
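To make the RAG Fusion step more concrete, here is a minimal sketch of reciprocal rank fusion (RRF), the merging rule commonly associated with RAG Fusion: several query variants each return a ranked list of documents, and the lists are combined by summing 1 / (k + rank) per document. This is an illustration under stated assumptions, not PaperHelper's implementation. The function name `reciprocal_rank_fusion`, the constant `k = 60`, and the document IDs are all hypothetical, and the abstract does not say which fusion rule PaperHelper uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1); the scores are summed across lists.
    k = 60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retrieval results for three rewritten versions of one question.
rankings = [
    ["paper_a", "paper_b", "paper_c"],
    ["paper_b", "paper_a", "paper_d"],
    ["paper_b", "paper_c", "paper_a"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Documents that rank well across many query variants rise to the top, while a document retrieved by only one variant is pushed down. That is the intuition behind fusing results before they are passed to the generator.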
