
ReadMe.LLM: A Framework to Help LLMs Understand Your Library

Sandya Wijaya, Jacob Bolano, Alejandro Gomez Soteres, Shriyanshu Kode, Yue Huang, Anant Sahai

TL;DR

The paper addresses the challenge that LLMs struggle to correctly utilize niche software libraries due to underrepresented, human-oriented documentation. It introduces ReadMe.LLM, an LLM-oriented, XML-structured documentation format that libraries attach to their codebases to guide code generation. Across five LLMs and two libraries, ReadMe.LLM contexts yielded substantial improvements in code-generation accuracy, achieving near-perfect performance and up to $100\%$ in several cases, with strong generalization in held-out tests. The findings suggest ReadMe.LLM can democratize access to smaller libraries by standardizing machine-friendly context and enable smoother integration with AI agents and tools, while outlining future work on API/tool-use extensions and editor integrations.

Abstract

Large Language Models (LLMs) often struggle with code generation tasks involving niche software libraries. Existing code generation techniques with only human-oriented documentation can fail -- even when the LLM has access to web search and the library is documented online. To address this challenge, we propose ReadMe.LLM, LLM-oriented documentation for software libraries. By attaching the contents of ReadMe.LLM to a query, performance consistently improves to near-perfect accuracy, with one case study demonstrating up to 100% success across all tested models. We propose a software development lifecycle where LLM-specific documentation is maintained alongside traditional software updates. In this study, we present two practical applications of the ReadMe.LLM idea with diverse software libraries, highlighting that our proposed approach could generalize across programming domains.
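
The idea of attaching LLM-oriented documentation to a query can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tag names (`library_context`, `rules`, `examples`), the helper functions, and the library name `examplelib` are all hypothetical, chosen only to show an XML-structured context being placed in front of a code-generation request.

```python
from xml.sax.saxutils import escape


def build_context(library: str, description: str,
                  rules: list[str], examples: list[str]) -> str:
    """Assemble an XML-structured context string.

    All tag names are hypothetical; the paper's actual structure may differ.
    """
    parts = ["<library_context>"]
    parts.append(f"  <name>{escape(library)}</name>")
    parts.append(f"  <description>{escape(description)}</description>")
    parts.append("  <rules>")
    parts.extend(f"    <rule>{escape(rule)}</rule>" for rule in rules)
    parts.append("  </rules>")
    parts.append("  <examples>")
    parts.extend(f"    <example>{escape(ex)}</example>" for ex in examples)
    parts.append("  </examples>")
    parts.append("</library_context>")
    return "\n".join(parts)


def attach_to_query(context: str, query: str) -> str:
    """Prepend the library context to the user's code-generation request."""
    return f"{context}\n\nTask: {query}"


# Hypothetical usage: the resulting string would be sent to an LLM as the prompt.
context = build_context(
    library="examplelib",
    description="A toy library used only for illustration.",
    rules=["Call examplelib.init() before any other function."],
    examples=["examplelib.init(); examplelib.run()"],
)
prompt = attach_to_query(context, "Write a script that runs examplelib once.")
print(prompt)
```

Escaping the inserted text keeps the context well-formed even when rules or examples contain characters such as `<` or `&`, which is common in code snippets.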

Paper Structure

This paper contains 23 sections, 18 figures, 1 table.

Figures (18)

  • Figure 1: Survey of existing prompting strategies for code generation
  • Figure 2: Depicting how ReadMe.LLM works
  • Figure 3: Example ReadMe.LLM Structure
  • Figure 4: ReadMe.LLM integrated into library contributor workflow
  • Figure 5: ReadMe.LLM integrated into engineer workflow
  • ...and 13 more figures