Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces
Jiapeng Yu, Yuqian Wu, Yajing Zhan, Wenhao Guo, Zhou Xu, Raymond Lee
TL;DR
This work tackles beginner-friendly code correction by introducing Co-Learning, a multi-agent framework that uses environmental reinforcement learning (E-RL) to dynamically select among LLM-based agents for code repair. Built on the PADE platform with five specialized agents, Co-Learning uses two reward mechanisms to optimize both accuracy and speed, and it improves on single-LLM baselines. On a dataset of 702 erroneous code samples derived from MBPP, it reaches a correction success rate of $67.80\%$ with an average runtime of $99.8$ s and 196 E-RL-enabled corrections, which points to the practical potential of adaptive LLM orchestration in code learning. The work suggests future directions in dynamic weight updating, larger code bases, and cross-language code understanding to broaden applicability.
Abstract
Online question-and-answer (Q\&A) systems based on Large Language Models (LLMs) have progressively shifted from recreational to professional use. This paper proposes a multi-agent framework with environmental reinforcement learning (E-RL) for code correction, called the Code Learning (Co-Learning) community, which helps beginners correct code errors independently. The framework evaluates the performance of multiple LLMs on an original dataset of 702 erroneous code samples and uses that performance as the reward or punishment criterion for E-RL. The current agent analyzes the input erroneous code, and the framework then selects the appropriate LLM-based agent to maximize correction accuracy and reduce correction time. Experimental results show a 3\% improvement in Precision score and a 15\% improvement in time cost compared with the method without E-RL. Our source code is available at: https://github.com/yuqian2003/Co_Learning
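
To make the selection idea concrete, the following is a minimal sketch of how a reward that combines correction success and elapsed time could steer the choice among agents. It is not the authors' E-RL algorithm: it substitutes a simple epsilon-greedy bandit, and the class name `AgentSelector`, the parameters `epsilon` and `time_weight`, the reward formula, and the agent names are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AgentSelector:
    """Epsilon-greedy selector over named agents (illustrative, not the paper's E-RL)."""

    agents: list[str]
    epsilon: float = 0.1        # assumed exploration rate
    time_weight: float = 0.005  # assumed penalty per second of correction time
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    values: dict[str, float] = field(init=False)
    counts: dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        # Running mean reward and number of trials for each agent.
        self.values = {name: 0.0 for name in self.agents}
        self.counts = {name: 0 for name in self.agents}

    def reward(self, fixed: bool, seconds: float) -> float:
        # Assumed reward shape: +1 for a successful fix, -1 otherwise,
        # minus a penalty that grows with the time spent.
        base = 1.0 if fixed else -1.0
        return base - self.time_weight * seconds

    def select(self) -> str:
        # Explore with probability epsilon, otherwise pick the best agent so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.agents)
        return max(self.agents, key=lambda name: self.values[name])

    def update(self, agent: str, fixed: bool, seconds: float) -> None:
        # Incremental mean update of the agent's estimated reward.
        r = self.reward(fixed, seconds)
        self.counts[agent] += 1
        self.values[agent] += (r - self.values[agent]) / self.counts[agent]


# Hypothetical usage: agent_a fails quickly, agent_b succeeds more slowly.
selector = AgentSelector(agents=["agent_a", "agent_b"], epsilon=0.0)
selector.update("agent_a", fixed=False, seconds=20.0)
selector.update("agent_b", fixed=True, seconds=40.0)
print(selector.select())  # prints "agent_b"
```

With `epsilon=0.0` the selector is purely greedy, so the successful but slower agent wins here. Raising `time_weight` would shift the balance toward faster agents, which is one way to express the accuracy and time trade-off described in the abstract.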
