Developing a Llama-Based Chatbot for CI/CD Question Answering: A Case Study at Ericsson

Daksh Chaudhary, Sri Lakshmi Vadlamani, Dimple Thomas, Shiva Nejati, Mehrdad Sabetzadeh

TL;DR

The paper tackles the challenge of answering CI/CD questions in an industrial setting by building a Llama 2-based chatbot that uses retrieval-augmented generation. It leverages an ensemble retriever combining BM25 and embeddings over a domain-specific Ericsson corpus to deliver accurate responses while mitigating hallucinations. Evaluated on 72 ground-truth questions, the system achieves 61.11% fully correct, 26.39% partially correct, and 12.50% incorrect answers, with error analysis highlighting retrieval and context handling as key improvement areas. The work provides practical, empirical insights into deploying domain-specific chatbots in industry and shows that ensemble retrieval can improve accuracy with competitive latency, while outlining directions for usability enhancements and a move toward a smart CI/CD task agent.

Abstract

This paper presents our experience developing a Llama-based chatbot for question answering about continuous integration and continuous delivery (CI/CD) at Ericsson, a multinational telecommunications company. Our chatbot is designed to handle the specificities of CI/CD documents at Ericsson, employing a retrieval-augmented generation (RAG) model to enhance accuracy and relevance. Our empirical evaluation of the chatbot on industrial CI/CD-related questions indicates that an ensemble retriever, combining BM25 and embedding retrievers, yields the best performance. When evaluated against a ground truth of 72 CI/CD questions and answers at Ericsson, our most accurate chatbot configuration provides fully correct answers for 61.11% of the questions, partially correct answers for 26.39%, and incorrect answers for 12.50%. Through an error analysis of the partially correct and incorrect answers, we discuss the underlying causes of inaccuracies and provide insights for further refinement. We also reflect on lessons learned and suggest future directions for further improving our chatbot's accuracy.
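To make the idea of an ensemble retriever concrete, the following minimal sketch fuses two ranked result lists, one from a BM25 retriever and one from an embedding retriever, into a single ranking. It assumes reciprocal rank fusion (RRF) as the combination rule; the abstract does not say how the two retrievers are combined, so this illustrates one common approach rather than the authors' implementation. The function name, the document IDs, and the constant `k=60` are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    weights:  optional per-retriever weights (default: equal weights).
    k:        smoothing constant; 60 is a commonly used value (an assumption here).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document ranked higher by a retriever contributes a larger share.
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the two retrievers for one query (IDs are made up).
bm25_ranking = ["doc_pipeline", "doc_jenkins", "doc_gerrit"]
embedding_ranking = ["doc_jenkins", "doc_artifacts", "doc_pipeline"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

In this toy example, documents that both retrievers rank highly rise to the top, while documents found by only one retriever are still kept as lower-ranked candidates. Adjusting `weights` shifts the balance between lexical (BM25) and semantic (embedding) matching.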

Paper Structure

This paper contains 23 sections, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Steps for Creating a (Domain-specific) CI/CD Corpus
  • Figure 2: Overview of Our Chatbot Design
  • Figure 3: Prompt Template for Query Rewriting
  • Figure 4: Examples of Query Rewriting
  • Figure 5: Example of Contextual Compression
  • ...and 4 more figures