Scattered Forest Search: Smarter Code Space Exploration with LLMs
Jonathan Light, Yue Wu, Yiyou Sun, Wenchao Yu, Yanchi Liu, Xujiang Zhao, Ziniu Hu, Haifeng Chen, Wei Cheng
TL;DR
The paper frames code generation as black-box optimization over the code space and introduces Scattered Forest Search (SFS), a set of optimization-inspired techniques that improve both exploration and exploitation during LLM-guided search. SFS combines Scattering (diverse textual directions), Foresting (multi-seed initialization), and Scouting (shared insights) within a Monte Carlo Tree Search framework to improve inference scaling. A theoretical analysis using Markov chain concepts illustrates how these techniques help avoid local optima, and extensive experiments on HumanEval, MBPP, APPS, CodeContests, and Leetcode show substantial gains in pass@1, faster discovery of correct solutions, and greater solution diversity. The approach scales efficiently with the search budget, and weaker models benefit most from inference-time optimization, which has practical implications for deployable, resource-efficient code-generation systems. Overall, SFS is a simple, training-free enhancement to search-based code generation that improves accuracy, scalability, and diversity across these benchmarks.
Abstract
We frame code generation as a black-box optimization problem within the code space and demonstrate how optimization-inspired techniques can enhance inference scaling. Based on this perspective, we propose Scattered Forest Search (SFS), a novel approach that improves solution diversity and better exploits feedback during evolutionary search. Our theoretical analysis illustrates how these methods help avoid local optima during optimization, leading to more efficient exploration. Extensive experiments on HumanEval, MBPP, APPS, CodeContests, and Leetcode reveal significant performance gains. For instance, our method achieves a pass@1 rate of 67.1% on HumanEval+ and 87.2% on HumanEval with GPT-3.5, marking improvements of 8.6% and 4.3% over the state of the art, while also halving the iterations needed to find the correct solution. Furthermore, our approach scales more efficiently than existing search techniques, including tree search, line search, and repeated sampling.
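To make the black-box optimization framing more concrete, the following is a minimal toy sketch of the three ideas named in the TL;DR (Foresting, Scattering, Scouting). It is not the paper's algorithm: it swaps the Monte Carlo Tree Search and the LLM for a simple round-robin greedy search over integer tuples. Every name here (`toy_sfs`, `passed_fraction`, `apply_direction`, the `Program` and `Direction` types) is hypothetical and introduced only for illustration. A "program" is a tuple of digits, a "direction" is a single-position rewrite, and the objective is the fraction of positions matching a hidden target, standing in for the fraction of validation tests passed.

```python
import random
from typing import List, Tuple

Program = Tuple[int, ...]    # toy stand-in for a candidate program
Direction = Tuple[int, int]  # toy stand-in for an edit direction: (position, new value)


def passed_fraction(program: Program, target: Program) -> float:
    """Black-box objective: fraction of toy 'tests' (positions) that pass."""
    return sum(p == t for p, t in zip(program, target)) / len(target)


def apply_direction(program: Program, direction: Direction) -> Program:
    """Return a copy of the program with one position rewritten."""
    pos, value = direction
    return program[:pos] + (value,) + program[pos + 1:]


def toy_sfs(target: Program, n_seeds: int = 3, n_directions: int = 4,
            budget: int = 500, seed: int = 0) -> Tuple[Program, int]:
    """Toy search; returns the best program found and the steps used."""
    rng = random.Random(seed)
    all_directions = [(p, v) for p in range(len(target)) for v in range(10)]
    # Foresting: start from several independent random seed programs.
    forest = [tuple(rng.randrange(10) for _ in target) for _ in range(n_seeds)]
    # Scouting: directions that improved any tree are shared with all trees.
    insights: List[Direction] = []
    best = max(forest, key=lambda p: passed_fraction(p, target))
    for step in range(budget):
        if passed_fraction(best, target) == 1.0:
            return best, step
        i = step % n_seeds  # visit the trees round-robin
        current = forest[i]
        # Scattering: several distinct directions per step, plus shared insights.
        proposals = rng.sample(all_directions, n_directions) + insights
        child_dir = max(
            proposals,
            key=lambda d: passed_fraction(apply_direction(current, d), target),
        )
        child = apply_direction(current, child_dir)
        if passed_fraction(child, target) > passed_fraction(current, target):
            forest[i] = child
            if child_dir not in insights:
                insights.append(child_dir)
            if passed_fraction(child, target) > passed_fraction(best, target):
                best = child
    return best, budget
```

Calling `toy_sfs((3, 1, 4, 1, 5))` returns the best tuple found and the number of steps taken. Because improving directions are shared, a position fixed in one tree is offered to every other tree on its next turn, which is the toy analogue of sharing insights across search branches.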
