Evaluating the Efficacy of Open-Source LLMs in Enterprise-Specific RAG Systems: A Comparative Study of Performance and Scalability

Gautam B; Anupam Purwar

TL;DR

This paper addresses the challenge of deploying effective RAG systems in enterprises using open-source LLMs. It builds a complete RAG pipeline with a BM25-FAISS hybrid retriever, HuggingFace embeddings, and Perplexity API-backed LLMs to process enterprise-specific web content from a public site, evaluated with ROUGE and DeepEval metrics across diverse question types. Key findings show open-source models like Llama3-8B can match or exceed proprietary performance on the enterprise dataset, with Mistral-8x7B generally lagging and GPT-3.5 serving as a strong benchmark; diminishing returns are observed when increasing the retrieval context beyond a certain point. The work demonstrates practical, cost-effective avenues for scaling enterprise QA and content generation using open ecosystems, offering actionable guidance on model choice, retrieval configuration, and evaluation strategies.

Abstract

This paper presents an analysis of open-source large language models (LLMs) and their application in Retrieval-Augmented Generation (RAG) tasks, specifically on enterprise-specific data sets scraped from the enterprises' own websites. With the increasing reliance on LLMs in natural language processing, it is crucial to evaluate their performance, accessibility, and integration within specific organizational contexts. This study examines various open-source LLMs, explores their integration into RAG frameworks using enterprise-specific data, and assesses how different open-source embeddings perform in enhancing the retrieval and generation process. Our findings indicate that open-source LLMs, combined with effective embedding techniques, can significantly improve the accuracy and efficiency of RAG systems, offering a viable alternative to proprietary solutions for enterprises.

Paper Structure (26 sections, 4 figures, 2 tables)

Introduction
Methodology
Data Collection
Sitemap Extraction
Web Crawling
Text Splitting
Embedding Generation
Vector Database Creation
LLM Integration
Perplexity API
Benefits of Using Perplexity API
Retrieval-Augmented Generation (RAG)
Hybrid Retriever Setup
RetrievalQA
Evaluation
...and 11 more sections

Figures (4)

Figure 1: Analysis of Cosine Similarity and Unigram Precision vs TopK for the Mistral8x7B LLM (left) and the Llama3-8B LLM (right)
Figure 2: Analysis of Cosine Similarity and Unigram Recall vs TopK for the Mistral8x7B LLM (left) and the Llama3-8B LLM (right)
Figure 3: Analysis of Cosine Similarity with context vs Cosine Similarity with ground truth vs TopK for the Mistral8x7B LLM (left) and the Llama3-8B LLM (right)
Figure 4: Histogram of inference time using GPT-3.5 (average response time: 4.3 seconds)
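
The quantities named in the figure captions (unigram precision, unigram recall, and cosine similarity) can be illustrated with a minimal Python sketch. This is not the paper's evaluation code, which relies on ROUGE and DeepEval; the function names `unigram_overlap` and `cosine_similarity` are hypothetical, and lowercase whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def unigram_overlap(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (precision, recall) of clipped unigram overlap, in the style of ROUGE-1.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    return precision, recall


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    answer = "the cat sat"
    ground_truth = "the cat sat on the mat"
    print(unigram_overlap(answer, ground_truth))
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

In a setup like the one the captions describe, the generated answer would be scored against the ground truth (and, for cosine similarity, its embedding compared with that of the retrieved context or the ground truth) at each TopK setting.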
