Evolutionary Context Search for Automated Skill Acquisition

Qi Sun, Stefan Nielsen, Rio Yokota, Yujin Tang

TL;DR

This work tackles the limitation that LLMs struggle to acquire new capabilities post-deployment without costly retraining. It introduces Evolutionary Context Search (ECS), an optimization framework that evolves context units derived from external text corpora to maximize task performance $f(C; M, \mathcal{T})$ and finds an optimal context $C^*$ through GA-style operations and LLM-guided refinement. ECS demonstrates substantial improvements over retrieval-based baselines on BackendBench and $\tau^2$-Bench, with strong transferability to Claude Sonnet and DeepSeek, and shows promise as automated data curation to support SFT and deployment. The approach offers a practical, weight-free alternative to fine-tuning, enabling efficient, model-agnostic knowledge injection via curated contexts, while also highlighting areas for improvement in large-scale search efficiency and prompt-structuring strategies.
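
The objective named above can be written out explicitly. The following is one plausible reading rather than the paper's own equation: the pool of candidate context units $\mathcal{U}$, the development set $\mathcal{D}_{\mathrm{dev}}$, and the $\mathrm{correct}(\cdot,\cdot)$ predicate are notation assumed here for illustration, with fitness taken to be development-set accuracy as the abstract describes.

```latex
C^{*} \;=\; \operatorname*{arg\,max}_{C \subseteq \mathcal{U}} \; f(C; M, \mathcal{T}),
\qquad
f(C; M, \mathcal{T}) \;\approx\;
\frac{1}{|\mathcal{D}_{\mathrm{dev}}|}
\sum_{(x, y) \in \mathcal{D}_{\mathrm{dev}}}
\mathbb{1}\!\left[ \mathrm{correct}\big( M(x \mid C),\, y \big) \right]
```

Here $M(x \mid C)$ denotes the output of model $M$ on input $x$ with context $C$ placed in the prompt; evaluating $f$ needs only inference calls, with no weight updates.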

Abstract

Large Language Models cannot reliably acquire new knowledge post-deployment -- even when relevant text resources exist, models fail to transform them into actionable knowledge without retraining. Retrieval-Augmented Generation attempts to bridge this gap by surfacing relevant documents at inference time, yet similarity-based retrieval often fails to identify context that actually improves task performance. We introduce Evolutionary Context Search (ECS), an evolutionary method that searches context combinations using accuracy on a small development set, requiring only inference calls without weight updates. ECS moves beyond semantic similarity to discover non-obvious context pairings that significantly boost performance. Our empirical results show that ECS improves BackendBench by 27% and $\tau$-bench airline by 7%. The evolved contexts are model-agnostic, as those evolved with Gemini-3-Flash transfer effectively to Claude Sonnet and DeepSeek. This suggests that ECS opens a path toward automated context discovery for skill acquisition -- an efficient alternative to manual prompt engineering or costly fine-tuning.
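
To make the search loop described in the abstract concrete, here is a minimal sketch of an evolutionary search over combinations of context units. It is not the authors' implementation: `evolve_context`, `toy_fitness`, the crossover and mutation operators, and every parameter name are hypothetical, and the LLM-guided refinement step mentioned in the TL;DR is omitted. The `fitness` callable stands in for development-set accuracy, which in practice would be obtained by running the model with the candidate context in its prompt.

```python
import random
from typing import Callable, Dict, List, Tuple

Context = Tuple[int, ...]  # sorted indices of the selected context units


def evolve_context(
    num_units: int,
    fitness: Callable[[Context], float],
    max_units: int = 3,
    pop_size: int = 12,
    generations: int = 20,
    seed: int = 0,
) -> Tuple[Context, float, List[float]]:
    """Generic elitist GA over subsets of context units (illustrative only)."""
    rng = random.Random(seed)
    max_units = min(max_units, num_units)
    cache: Dict[Context, float] = {}

    def score(c: Context) -> float:
        # Cache scores: each real evaluation would cost many inference calls.
        if c not in cache:
            cache[c] = fitness(c)
        return cache[c]

    def random_context() -> Context:
        k = rng.randint(1, max_units)
        return tuple(sorted(rng.sample(range(num_units), k)))

    def crossover(a: Context, b: Context) -> Context:
        # Child draws its units from the union of both parents.
        pool = sorted(set(a) | set(b))
        k = rng.randint(1, min(max_units, len(pool)))
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(c: Context) -> Context:
        # Toggle one random unit, then repair size constraints.
        units = set(c) ^ {rng.randrange(num_units)}
        if not units:
            units = {rng.randrange(num_units)}
        while len(units) > max_units:
            units.remove(rng.choice(sorted(units)))
        return tuple(sorted(units))

    population = [random_context() for _ in range(pop_size)]
    history: List[float] = []  # best fitness seen at each generation
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        elites = ranked[: max(2, pop_size // 4)]  # elites survive unchanged
        children = [
            mutate(crossover(rng.choice(elites), rng.choice(elites)))
            for _ in range(pop_size - len(elites))
        ]
        population = elites + children

    best = max(population, key=score)
    return best, score(best), history


def toy_fitness(context: Context) -> float:
    # Stand-in for dev-set accuracy: units 2 and 7 help, extra units cost tokens.
    useful = (0.5 if 2 in context else 0.0) + (0.5 if 7 in context else 0.0)
    return useful - 0.01 * len(context)


if __name__ == "__main__":
    best, best_score, history = evolve_context(num_units=10, fitness=toy_fitness)
    print(best, best_score)
```

Because the elites are carried over unchanged, the best fitness per generation never decreases, which is the qualitative behaviour a fitness-versus-generation curve would show. Swapping `toy_fitness` for a function that builds a prompt from the selected units and scores the model on held-out development tasks would turn this into an inference-only search of the kind the abstract describes.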

Paper Structure (39 sections, 1 equation, 7 figures, 7 tables, 1 algorithm)

Figures (7)

  • Figure 1: Evolutionary Context Search. Our method takes a population of text resources and evolves optimized contexts that confer the knowledge required to perform tasks in unseen domains. Each successive generation accumulates task-dependent, knowledge-rich information, effectively searching the corpora to obtain token-efficient contexts that enable novel skill acquisition in LLMs.
  • Figure 2: Performance comparison on BackendBench. Our method (ECS) achieves a 27% relative improvement over AST + Hybrid. Detailed results are in Appendix \ref{app:full_beb_result}.
  • Figure 3: Fitness score during the evolutionary search process.
  • Figure 4: Transferability to Unseen Models (BackendBench). Contexts evolved by ECS transfer effectively to models not used during evolution. On DeepSeek-V3.2, where standard retrieval (AST+Dense) fails to provide meaningful gains, ECS yields a substantial improvement.
  • Figure 5: Core components of the BackendBench evolved context. The figure illustrates the three most significant code snippets: (a) a fused MHA kernel targeting the NVIDIA Hopper architecture (58% of context); (b) a dense GEMM kernel configuration for Ampere Tensor Cores (23%); and (c) a low-level FFI interface managing LLVM pointer extraction (7%).
  • ...and 2 more figures