CodeNav: Beyond tool-use to using real-world codebases with LLM agents

Tanmay Gupta; Luca Weihs; Aniruddha Kembhavi

TL;DR

CodeNav moves beyond traditional LLM tool-use by letting an agent interact directly with real-world codebases through a single-agent, multi-environment framework. It retrieves, imports, and executes relevant code blocks from the target repository, iterating on execution feedback to construct a solution. In quantitative benchmarks and qualitative case studies, CodeNav performs competitively with tool-use baselines without explicit tool registration, and it handles both multimodal tasks and research-assistant workflows. The work highlights the potential of code-based tool discovery and execution for solving complex queries in real-world software ecosystems, and it outlines practical considerations and societal impacts.

Abstract

We present CodeNav, an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. In contrast to tool-use LLM agents that require "registration" of all relevant tools via manual descriptions within the LLM context, CodeNav automatically indexes and searches over code blocks in the target codebase, finds relevant code snippets, imports them, and uses them to iteratively generate a solution with execution feedback. To highlight the core capabilities of CodeNav, we first showcase three case studies where we use CodeNav to solve complex user queries using three diverse codebases. Next, on three benchmarks, we quantitatively compare the effectiveness of code-use (which only has access to the target codebase) to tool-use (which has privileged access to all tool names and descriptions). Finally, we study the effect of varying kinds of tool and library descriptions on code-use performance, and investigate the advantage of the agent seeing source code as opposed to natural-language descriptions of code. All code will be made open source under a permissive license.
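
The loop described in the abstract (index code blocks, search them, import what is found, and iterate on execution feedback) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Action`, `install`, `index_code_blocks`, `search`, `execute`, `run_episode`, and the one-file `toy_repo` are hypothetical, keyword matching stands in for whatever retrieval CodeNav actually uses, and a scripted function stands in for the LLM agent.

```python
import ast
import contextlib
import io
import sys
import traceback
import types
from dataclasses import dataclass

# Hypothetical one-file "repository"; a real target would be files on disk.
SOURCES = {
    "toy_repo.py": 'def triple(x):\n    """Multiply a number by three."""\n    return 3 * x\n'
}


@dataclass
class Action:
    thought: str  # the agent's reasoning for this step
    type: str     # "search", "code", or "done"
    content: str  # search query or Python source to run


def install(sources):
    """Make each source string importable by registering it as a module."""
    for path, text in sources.items():
        module = types.ModuleType(path[: -len(".py")])
        exec(text, module.__dict__)
        sys.modules[module.__name__] = module


def index_code_blocks(sources):
    """Map each top-level function or class name to (path, source text)."""
    index = {}
    for path, text in sources.items():
        for node in ast.parse(text).body:
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                index[node.name] = (path, ast.get_source_segment(text, node))
    return index


def search(index, query):
    """Toy retrieval: return blocks whose source mentions any query word."""
    words = query.lower().split()
    hits = [
        f"# {path}\n{src}"
        for path, src in index.values()
        if any(word in src.lower() for word in words)
    ]
    return "\n\n".join(hits) or "no results"


def execute(code, namespace):
    """Run code in a persistent namespace; return stdout plus any traceback."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        return buffer.getvalue() + traceback.format_exc()
    return buffer.getvalue()


def run_episode(agent, query, sources, max_steps=10):
    """Alternate agent actions and environment responses until "done"."""
    install(sources)
    index = index_code_blocks(sources)
    history, namespace = [], {}
    for _ in range(max_steps):
        action = agent(query, history)
        if action.type == "done":
            break
        if action.type == "search":
            response = search(index, action.content)
        else:
            response = execute(action.content, namespace)
        history.append((action, response))  # context for the next action
    return history


def scripted_agent(query, history):
    """Stand-in for the LLM: a fixed plan that includes one mistake."""
    plan = [
        Action("Look for a helper.", "search", "multiply"),
        Action("Import and call it.", "code",
               "from toy_repo import triple\nprint(tripel(4))"),  # typo
        Action("Fix the misspelled name.", "code", "print(triple(4))"),
        Action("Finished.", "done", ""),
    ]
    return plan[len(history)]
```

Calling `run_episode(scripted_agent, "What is three times four?", SOURCES)` returns three action-response pairs: the retrieved source of `triple`, a `NameError` traceback from the misspelled call, and the printed result of the corrected call. The second response is the execution feedback that lets an agent repair its own code on the next step.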

Paper Structure (34 sections, 12 figures, 6 tables)

This paper contains 34 sections, 12 figures, 6 tables.

Introduction
Related Work
CodeNav
Overview
Environments
Agent Actions
Environment Responses
Case Studies
CodeNav on CodeNav
Multimodal Processing and Reasoning
Research assistant
Experiments
How does code-use compare to tool-use on tool-use benchmarks?
Is a library description sufficient for tool-use?
Does seeing the source code help code-use?
...and 19 more sections

Figures (12)

Figure 1: CodeNav's single-agent, multi-environment interaction protocol. Given a user query, a brief description of the codebase (library description), and the interaction history, the LLM agent produces an action comprising a thought, an action type, and action content. The action is executed in the target environment (identified by the action type) to produce a response. The action-response pair from the current step is appended to the interaction history as context for the LLM to produce the next action.
Figure 2: Running the CodeNav agent with the CodeNav codebase.
Figure 3: CodeNav unifies "agentic" applications via code-use. Two case studies: (top) a visual reasoning and image editing agent; and (bottom) an information gathering agent. These applications are enabled simply by changing the target codebase search index and the high-level library description.
Figure 4: Full retrieval response example. This corresponds to an expanded version of R5 in Fig. 2.
Figure 5: Library description for the CodeNav case study in the "CodeNav on CodeNav" section.
...and 7 more figures
