Evaluating the Efficacy of Open-Source LLMs in Enterprise-Specific RAG Systems: A Comparative Study of Performance and Scalability
Gautam B, Anupam Purwar
TL;DR
This paper addresses the challenge of deploying effective RAG systems in enterprises using open-source LLMs. It builds a complete RAG pipeline, combining a BM25-FAISS hybrid retriever, HuggingFace embeddings, and LLMs served through the Perplexity API, to process enterprise-specific content scraped from a public website, and evaluates it with ROUGE and DeepEval metrics across diverse question types. Key findings: open-source models such as Llama3-8B can match or exceed proprietary-model performance on the enterprise dataset, Mixtral-8x7B generally lags, and GPT-3.5 serves as a strong benchmark; increasing the retrieval context yields diminishing returns beyond a certain point. The work demonstrates practical, cost-effective ways to scale enterprise QA and content generation using open ecosystems, and offers actionable guidance on model choice, retrieval configuration, and evaluation strategy.
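To make the idea of a hybrid lexical-plus-dense retriever concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation. The summary above does not say how the BM25 and FAISS results are combined, so reciprocal rank fusion is assumed here. The dense side is a toy bag-of-words cosine similarity standing in for HuggingFace embeddings searched with FAISS. All function names and parameter defaults (`k1`, `b`, `rrf_k`) are hypothetical choices for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+1 inside the log" form).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Toy dense stand-in: cosine similarity of term-count vectors.

    Assumption: a real pipeline would embed text with a sentence-embedding
    model and search the vectors with a FAISS index instead.
    """
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for doc in docs:
        d = Counter(tokenize(doc))
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        dot = sum(q[t] * d[t] for t in q)
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


def rank(scores):
    """Document indices ordered from best to worst score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_retrieve(query, docs, top_k=2, rrf_k=60):
    """Fuse the lexical and dense rankings with reciprocal rank fusion (assumed)."""
    fused = [0.0] * len(docs)
    rankings = (rank(bm25_scores(query, docs)), rank(cosine_scores(query, docs)))
    for ranking in rankings:
        for position, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (rrf_k + position + 1)
    return [docs[i] for i in rank(fused)[:top_k]]


EXAMPLE_DOCS = [
    "The company offers cloud hosting and managed databases",
    "Our pricing plans start with a free tier",
    "Contact support by email for billing questions",
]

if __name__ == "__main__":
    print(hybrid_retrieve("pricing plans", EXAMPLE_DOCS, top_k=1))
```

Rank-based fusion is used in this sketch because BM25 scores and cosine similarities live on different scales, so adding raw scores would need a normalization step. The `top_k` parameter plays the role of the retrieval context size, which the findings above report as having diminishing returns past a certain point.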
Abstract
This paper presents an analysis of open-source large language models (LLMs) and their application to Retrieval-Augmented Generation (RAG) tasks, specifically on enterprise data sets scraped from company websites. With the increasing reliance on LLMs in natural language processing, it is crucial to evaluate their performance, accessibility, and integration within specific organizational contexts. This study examines various open-source LLMs, explores their integration into RAG frameworks using enterprise-specific data, and assesses how well different open-source embeddings support the retrieval and generation process. Our findings indicate that open-source LLMs, combined with effective embedding techniques, can significantly improve the accuracy and efficiency of RAG systems, offering enterprises a viable alternative to proprietary solutions.
