CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

Lintang Sutawika; Aditya Bharat Soni; Bharath Sriraam R R; Apurva Gandhi; Taha Yassine; Sanidhya Vijayvargiya; Yuchen Li; Xuhui Zhou; Yilin Zhang; Leander Melroy Maben; Graham Neubig

Abstract

A prerequisite for coding agents to perform tasks on large repositories is code localization: the identification of the relevant files, classes, and functions to work on. While repository-level code localization has traditionally been performed with embedding-based retrieval approaches such as vector search, recent work has focused on developing agents that localize relevant code either as a standalone precursor to the actual work or interleaved with it. Most prior methods for agentic code search equip the agent with complex, specialized tools, such as repository graphs derived from static analysis. In this paper, we demonstrate that, with an effective reinforcement learning recipe, a coding agent equipped with nothing more than a standard Unix terminal can be trained to achieve strong results. Our experiments on three benchmarks (SWE-Bench Verified, Pro, and Lite) show that our models consistently achieve superior or competitive performance relative to 2-18x larger base and post-trained LLMs, and sometimes approach the performance of closed models like Claude Sonnet, even when using specialized scaffolds. Our work focuses in particular on techniques for re-purposing existing coding agent environments for code search, reward design, and RL optimization. We release the resulting model family, CodeScout, along with all our code and data for the community to build upon.

Paper Structure (32 sections, 4 equations, 24 figures, 10 tables)

Introduction
Related Work
CodeScout: An Effective RL Recipe for Code Localization
Data and Environment Curation
OpenHands-Bash: Our Agent Scaffold
Reward Design
RL Training Algorithm
Experimental Setup
Training Setup
Evaluation Setup
Evaluation Metrics
Baselines
Results
CodeScout substantially outperforms base LLMs of similar and larger sizes with OpenHands-Bash
CodeScout outperforms larger base and post-trained LLMs using complex scaffolds
...and 17 more sections

Figures (24)

Figure 1: An overview of code localization performance of various approaches on SWE-Bench Verified. CodeScout achieves superior or competitive results over larger SoTA open-source LLMs and closes the gap with frontier closed-source LLMs.
Figure 2: An overview of CodeScout: given a GitHub issue, the LLM agent navigates the pre-PR codebase using a terminal and predicts the relevant set of files, modules, and functions. The reward function computes F1 scores for these three granularities using ground truth locations extracted from the gold issue resolution patch.
Figure 3: Distribution of the top-8 most frequent Unix commands used by CodeScout at different training stages. While both LLMs initially use a broad range of Unix utilities, they eventually use a very limited set of commands as training proceeds. CodeScout-14B only uses ripgrep (rg) and sed, whereas CodeScout-4B mainly uses rg, cat, sed, and xargs.
Figure 4: System prompt for the OpenHands-Bash agent scaffold used by CodeScout (continued on next page).
Figure 5: System prompt for the OpenHands-Bash scaffold used by CodeScout.
...and 19 more figures
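
The Figure 2 caption describes a reward built from F1 scores at three granularities (files, modules, functions), computed against locations extracted from the gold patch. A minimal sketch of that per-granularity F1 is given below. It is an illustration under stated assumptions, not the authors' implementation: the names `set_f1` and `localization_f1`, the dictionary layout, the location string format, and the convention of scoring 1.0 when both sets are empty are all hypothetical. How the three scores are combined into a single reward is not stated in this section, so the sketch leaves them separate.

```python
from typing import Dict, Iterable

# Granularity names follow the Figure 2 caption; the exact keys are an assumption.
LEVELS = ("file", "module", "function")


def set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between a predicted set of locations and a gold set."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        # Assumption: nothing to find and nothing predicted counts as perfect.
        return 1.0
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def localization_f1(
    predicted: Dict[str, Iterable[str]],
    gold: Dict[str, Iterable[str]],
) -> Dict[str, float]:
    """Compute one F1 score per granularity; missing levels count as empty."""
    return {
        level: set_f1(predicted.get(level, ()), gold.get(level, ()))
        for level in LEVELS
    }


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    predicted = {
        "file": ["a.py", "b.py"],
        "module": ["a.py:Foo"],
        "function": [],
    }
    gold = {
        "file": ["a.py"],
        "module": ["a.py:Foo"],
        "function": ["a.py:Foo.bar"],
    }
    print(localization_f1(predicted, gold))
```

In this example the file-level score is 2/3 (precision 1/2, recall 1), the module-level score is 1.0, and the function-level score is 0.0 because nothing was predicted for a non-empty gold set.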
