Evolutionary Context Search for Automated Skill Acquisition
Qi Sun, Stefan Nielsen, Rio Yokota, Yujin Tang
TL;DR
This work addresses a limitation of LLMs: they struggle to acquire new capabilities after deployment without costly retraining. It introduces Evolutionary Context Search (ECS), an optimization framework that evolves context units derived from external text corpora to maximize the task performance $f(C; M, \mathcal{T})$ of a model $M$ on a task $\mathcal{T}$, searching for a high-scoring context $C^*$ through GA-style operations and LLM-guided refinement. ECS reports substantial improvements over retrieval-based baselines on BackendBench and $\tau^2$-Bench, transfers well to Claude Sonnet and DeepSeek, and shows promise as an automated data-curation step for SFT and deployment. The approach offers a practical, weight-free alternative to fine-tuning that enables efficient, model-agnostic knowledge injection via curated contexts. The work also identifies large-scale search efficiency and prompt-structuring strategies as areas for improvement.
Abstract
Large Language Models cannot reliably acquire new knowledge post-deployment: even when relevant text resources exist, models fail to transform them into actionable knowledge without retraining. Retrieval-Augmented Generation attempts to bridge this gap by surfacing relevant documents at inference time, yet similarity-based retrieval often fails to identify the context that actually improves task performance. We introduce Evolutionary Context Search (ECS), an evolutionary method that searches over context combinations, scoring each candidate by its accuracy on a small development set. It requires only inference calls and no weight updates. ECS moves beyond semantic similarity to discover non-obvious context pairings that significantly boost performance. Our empirical results show that ECS improves performance on BackendBench by 27\% and on $\tau$-bench airline by 7\%. The evolved contexts are model-agnostic: those evolved with Gemini-3-Flash transfer effectively to Claude Sonnet and DeepSeek. This suggests that ECS opens a path toward automated context discovery for skill acquisition, an efficient alternative to manual prompt engineering or costly fine-tuning.
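
The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `evolve_context` and `toy_dev_accuracy`, the encoding of a context as a set of unit indices, the crossover and mutation operators, and all hyperparameters are hypothetical. The toy fitness function stands in for what would, in practice, be an inference-only evaluation of a model on a small development set, and the LLM-guided refinement step is omitted.

```python
import random
from typing import Callable, Dict, FrozenSet, List

# Assumption: a candidate context is a set of indices into a pool of context units.
Context = FrozenSet[int]


def evolve_context(
    num_units: int,
    fitness: Callable[[Context], float],
    pop_size: int = 20,
    generations: int = 30,
    max_units: int = 4,
    mutation_rate: float = 0.3,
    seed: int = 0,
) -> Context:
    """Elitist genetic search over subsets of context units (illustrative only)."""
    rng = random.Random(seed)
    all_units = list(range(num_units))
    max_units = min(max_units, num_units)
    cache: Dict[Context, float] = {}

    def score(c: Context) -> float:
        # Cache scores: each real evaluation would cost many inference calls.
        if c not in cache:
            cache[c] = fitness(c)
        return cache[c]

    def random_context() -> Context:
        return frozenset(rng.sample(all_units, rng.randint(1, max_units)))

    def crossover(a: Context, b: Context) -> Context:
        # Child draws its units from the union of both parents.
        pool = sorted(a | b)
        k = rng.randint(1, min(max_units, len(pool)))
        return frozenset(rng.sample(pool, k))

    def mutate(c: Context) -> Context:
        units = set(c)
        if rng.random() < mutation_rate:
            if len(units) > 1 and rng.random() < 0.5:
                units.remove(rng.choice(sorted(units)))
            units.add(rng.choice(all_units))
        while len(units) > max_units:  # respect the context budget
            units.remove(rng.choice(sorted(units)))
        return frozenset(units)

    population: List[Context] = [random_context() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        elites = ranked[: max(2, pop_size // 4)]  # survivors carried over unchanged
        children = [
            mutate(crossover(*rng.sample(elites, 2)))
            for _ in range(pop_size - len(elites))
        ]
        population = elites + children
    return max(population, key=score)


def toy_dev_accuracy(context: Context) -> float:
    """Stand-in for dev-set accuracy: units 3 and 7 only help much when paired."""
    acc = 0.2
    acc += 0.1 if 3 in context else 0.0
    acc += 0.1 if 7 in context else 0.0
    acc += 0.5 if {3, 7} <= context else 0.0
    return acc - 0.02 * len(context)  # mild penalty for longer contexts


if __name__ == "__main__":
    best = evolve_context(num_units=12, fitness=toy_dev_accuracy)
    print(sorted(best), toy_dev_accuracy(best))
```

The toy fitness rewards a pairing that neither unit would justify strongly on its own, which mirrors the abstract's point that task-performance feedback can surface combinations that similarity-based retrieval would not rank highly. In a real setting, `fitness` would build a prompt from the selected units and measure accuracy on held-out development tasks.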
