Are LLMs Correctly Integrated into Software Systems?

Yuchen Shao, Yuheng Huang, Jiawei Shen, Lei Ma, Ting Su, Chengcheng Wan

TL;DR

This paper presents the first large-scale empirical study of integration defects in LLM-enabled software with RAG. It analyzes 100 open-source projects and over 3,000 defect reports and identifies 18 defect patterns across four core components (LLM agent, vector database, software components, and system). The study finds that 77% of the applications contain more than three types of integration defects, which degrade functionality, efficiency, and security. The authors introduce Hydrangea, an open defect library containing 546 identified defects along with triggering tests, and provide lifecycle-guided guidelines for identifying, preventing, and repairing these defects; the benchmark is open-sourced for future research. The work gives developers and researchers practical guidance for improving the reliability and performance of AI-enabled software and lays a foundation for future tooling and automated repair approaches.

Abstract

Large language models (LLMs) provide effective solutions in various application scenarios, with the support of retrieval-augmented generation (RAG). However, developers face challenges in integrating LLMs and RAG into software systems, owing to a lack of interface specifications, the varied requirements of the software context, and complicated system management. In this paper, we conduct a comprehensive study of 100 open-source applications that incorporate LLMs with RAG support and identify 18 defect patterns. Our study reveals that 77% of these applications contain more than three types of integration defects that degrade software functionality, efficiency, and security. Guided by our study, we propose systematic guidelines for resolving these defects throughout the software life cycle. We also construct an open-source defect library, Hydrangea.

Paper Structure

This paper contains 41 sections, 13 figures, and 2 tables.

Figures (13)

  • Figure 1: Components and workflow of LLM-enabled software.
  • Figure 2: A use case of RealChar, a character simulator.
  • Figure 3: A fix of missing LLM input format validation in PDF-ChatBot.
  • Figure 4: Expected output format of LLM agent in h2oGPT.
  • Figure 5: Unnecessary LLM output in Code-Review-GPT.
  • ...and 8 more figures