vitaLITy 2: Reviewing Academic Literature Using Large Language Models
Hongye An, Arpit Narechania, Emily Wall, Kai Xu
TL;DR
vitaLITy 2 is an LLM-powered visual analytics system for literature review that uses a Retrieval Augmented Generation (RAG) architecture to semantically search and analyze a corpus of 66,692 papers. It combines multiple text embeddings (ADA, GloVe, SPECTER) with vector databases (Faiss, ChromaDB) and a LangChain-driven prompt-chaining framework to support natural-language queries, summarization, and literature-review drafting via a chat interface. The system extends vitaLITy 1 with ADA embeddings, an enhanced UI, and new capabilities to summarize collections of papers and generate literature reviews, all available as open-source software. The work acknowledges limitations, such as the lack of full-text access and potential LLM hallucinations, and outlines future enhancements, including full-text chunking and external knowledge integration, to improve accuracy and utility.
Abstract
Academic literature reviews have traditionally relied on techniques such as keyword searches and accumulation of relevant back-references, using databases like Google Scholar or IEEEXplore. However, both the precision and accuracy of these search techniques are limited by the presence or absence of specific keywords, making literature review akin to searching for needles in a haystack. We present vitaLITy 2, a solution that uses a Large Language Model (LLM)-based approach to identify semantically relevant literature in a textual embedding space. We include a corpus of 66,692 papers from 1970 to 2023 that are searchable through text embeddings created by three language models. vitaLITy 2 contributes a novel Retrieval Augmented Generation (RAG) architecture and lets users interact with the corpus through an LLM with augmented prompts, for example to summarize a collection of papers. vitaLITy 2 also provides a chat interface that allows users to perform complex queries without learning any new programming language. The chat interface also enables users to take advantage of the knowledge captured in the LLM from its enormous training corpus. Finally, we demonstrate the applicability of vitaLITy 2 through two usage scenarios. vitaLITy 2 is available as open-source software at https://vitality-vis.github.io.
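
To make the idea of retrieval in an embedding space followed by an augmented prompt more concrete, the following is a minimal sketch in plain Python. It is not the vitaLITy 2 implementation: the three-dimensional vectors stand in for real embeddings, the toy corpus is invented for illustration, and the names `cosine_similarity`, `retrieve`, and `build_augmented_prompt` are hypothetical. A real system would obtain embeddings from a language model and query a vector database rather than scanning a list.

```python
import math

# Toy corpus (invented for illustration). Real embeddings would have
# hundreds of dimensions and come from a language model.
TOY_CORPUS = [
    {"title": "Toy paper A", "abstract": "Visual analytics for text.",
     "embedding": [0.9, 0.1, 0.0]},
    {"title": "Toy paper B", "abstract": "Protein folding methods.",
     "embedding": [0.0, 1.0, 0.0]},
    {"title": "Toy paper C", "abstract": "Interactive document exploration.",
     "embedding": [0.7, 0.3, 0.1]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query_vec, corpus, k=2):
    """Return the k papers whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, p["embedding"]), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in scored[:k]]


def build_augmented_prompt(question, papers):
    """Prepend the retrieved papers to the question (the 'augmented' prompt)."""
    context = "\n".join(f"- {p['title']}: {p['abstract']}" for p in papers)
    return (
        "Answer using only the papers listed below.\n\n"
        f"Papers:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical query embedding; in practice it is produced by the same
    # model that embedded the corpus.
    top = retrieve([1.0, 0.0, 0.0], TOY_CORPUS, k=2)
    print(build_augmented_prompt("What do these papers have in common?", top))
```

The resulting prompt string would then be sent to an LLM. Because retrieval ranks by vector similarity rather than shared keywords, a paper can be returned even when it has no terms in common with the query.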
