Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve

Yuanzhe Liu, Ryan Deng, Tim Kaler, Xuhao Chen, Charles E. Leiserson, Yao Ma, Jie Chen

TL;DR

LessonL introduces a lesson-based multi-agent framework for code LLMs in which agents learn from each other by generating, banking, and selecting lessons to iteratively improve code solutions. The approach enables small, diverse LLM teams to outperform a single large model and other collaboration methods on code optimization and generation, with demonstrated state-of-the-art results across multiple benchmarks. Key contributions include the solicitation-banking-selection mechanism, empirical evidence of complementary strengths among LLMs, and case studies illustrating nontrivial optimizations like divide-and-conquer and precomputation. The work highlights the practical potential of interpretable, reusable coding knowledge while acknowledging overhead and scope limitations and outlining directions for broader applicability and autonomous agent behavior.

Abstract

Recent studies show that LLMs possess different skills and specialize in different tasks. In fact, we observe that their performance varies at several levels of granularity. For example, in the code optimization task, code LLMs excel at different optimization categories and no single model dominates the others. This observation prompts the question of how to leverage multiple LLM agents to solve a coding problem without knowing their complementary strengths a priori. We argue that a team of agents can learn from each other's successes and failures so as to improve their own performance. In this setting, a lesson is the knowledge produced by an agent and passed on to other agents in the collective solution process. We propose a lesson-based collaboration framework, design the lesson solicitation-banking-selection mechanism, and demonstrate that a team of small LLMs with lessons learned can outperform a much larger LLM and other multi-LLM collaboration methods.

Paper Structure

This paper contains 47 sections, 2 equations, 5 figures, 11 tables, 5 algorithms.

Figures (5)

  • Figure 1: Successively improving matrix-matrix multiplication in C with lessons.
  • Figure 2: The LessonL framework (which may repeat for multiple rounds).
  • Figure 3: Performance over rounds (also called "layers"). Benchmark: ParEval (serial mode).
  • Figure 4: Performance versus costs and latency. Benchmark: ParEval (serial mode). The dashed line is the Pareto front.
  • Figure 5: Comparison of models on their coding abilities for each category of the ParEval benchmark (serial mode). Coding ability is measured by the speedup achieved when a model is asked to optimize a piece of code. The geometric mean of the speedups is taken over all problems in the same category. The number in parentheses is the maximum speedup for the corresponding category over all models. Performance for each category is normalized by this maximum speedup.
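
The normalization described in the Figure 5 caption can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the caption states it (per-category geometric mean of speedups, divided by the best geometric mean over all models); the function names, model names, category names, and numbers below are hypothetical and are not taken from the paper.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_category_scores(speedups):
    """Normalize per-category geometric-mean speedups by the best model.

    speedups: {model: {category: [per-problem speedups]}} (hypothetical layout).
    Returns (scores, best), where best[category] is the maximum geometric-mean
    speedup over all models and scores[model][category] lies in (0, 1].
    """
    per_model = {
        model: {cat: geometric_mean(vals) for cat, vals in cats.items()}
        for model, cats in speedups.items()
    }
    categories = {cat for cats in per_model.values() for cat in cats}
    best = {
        cat: max(m[cat] for m in per_model.values() if cat in m)
        for cat in categories
    }
    scores = {
        model: {cat: gm / best[cat] for cat, gm in cats.items()}
        for model, cats in per_model.items()
    }
    return scores, best


# Made-up speedups for two hypothetical models and two hypothetical categories.
example = {
    "model_a": {"category_x": [2.0, 8.0], "category_y": [1.0, 1.0]},
    "model_b": {"category_x": [1.0, 4.0], "category_y": [3.0, 3.0]},
}
scores, best = normalized_category_scores(example)
```

In this toy input each model is best in one category, so each reaches a normalized score of 1 in a different place. That is the kind of pattern the caption describes when it compares models category by category.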