Teamwork makes the dream work: LLMs-Based Agents for GitHub README.MD Summarization
Duc S. H. Nguyen, Bach G. Truong, Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio
TL;DR
Metagente introduces a teacher-student, multi-agent framework where specialized LLMs collaboratively self-optimize prompts to summarize GitHub README.MD files. Using four agents (Extractor, Summarizer, Teacher, Prompt Creator) and ROUGE-guided feedback, the system achieves substantial gains over single-agent baselines such as GPT-4o, GitSum, and LLaMA-2, while remaining data-efficient. Experimental results on a README.MD dataset show statistically significant improvements in ROUGE metrics, including large relative gains when training data are limited, and demonstrate the benefits of parallel fine-tuning. The work highlights the practical potential of cooperative LLMs for SE tasks and lays groundwork for broader applications and future improvements in prompt-based agent orchestration.
Abstract
The proliferation of Large Language Models (LLMs) in recent years has enabled many applications across various domains. Trained on huge amounts of data from diverse sources, LLMs can be deployed to solve different tasks, including those in Software Engineering (SE). Though they have been widely adopted, the potential of using LLMs cooperatively has not been thoroughly investigated. In this paper, we propose Metagente as a novel approach to amplify the synergy of various LLMs. Metagente is a Multi-Agent framework based on a series of LLMs that self-optimizes the system through evaluation, feedback, and cooperation among specialized agents. Such a framework creates an environment where multiple agents iteratively refine and optimize prompts from various perspectives. The results of these explorations are then reviewed and aggregated by a teacher agent. To study its performance, we evaluated Metagente on an SE task, i.e., summarization of README.MD files, and compared it with three well-established baselines, i.e., GitSum, LLaMA-2, and GPT-4o. The results show that our proposed approach works efficiently and effectively, consuming a small amount of data for fine-tuning while still achieving high accuracy, thus substantially outperforming the baselines. The performance gain compared to GitSum, the most relevant benchmark, ranges from 27.63% to 60.43%. More importantly, compared to using only one LLM, Metagente boosts accuracy severalfold.
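To make the idea of score-guided prompt refinement concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified unigram-overlap F1 in place of a full ROUGE implementation, and it replaces the LLM agents with plain Python callables. The names `rouge1_f1`, `optimize_prompt`, `toy_summarize`, and `toy_revise` are hypothetical, as are the `max_rounds` and `target` parameters.

```python
from collections import Counter
from typing import Callable, Tuple


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: whitespace tokens, lowercased, no stemming."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def optimize_prompt(
    readme: str,
    reference: str,
    initial_prompt: str,
    summarize: Callable[[str, str], str],
    revise_prompt: Callable[[str, str, float], str],
    max_rounds: int = 5,   # assumed iteration budget
    target: float = 0.7,   # assumed stopping threshold
) -> Tuple[str, float]:
    """Iteratively revise a prompt, keeping the best-scoring version."""
    best_prompt, best_score = initial_prompt, -1.0
    prompt = initial_prompt
    for _ in range(max_rounds):
        summary = summarize(prompt, readme)       # stand-in for a summarizer agent
        score = rouge1_f1(summary, reference)     # score against the reference
        if score > best_score:
            best_prompt, best_score = prompt, score
        if score >= target:
            break
        prompt = revise_prompt(prompt, summary, score)  # stand-in for a teacher agent
    return best_prompt, best_score


# Toy stand-ins for LLM calls, for illustration only.
def toy_summarize(prompt: str, readme: str) -> str:
    if "one sentence" in prompt:
        return readme.split(".")[0] + "."
    return readme


def toy_revise(prompt: str, summary: str, score: float) -> str:
    return prompt + " Use one sentence."
```

In a real system the two callables would wrap LLM requests, and the revision step would receive the score and the generated summary as feedback. The loop structure here only illustrates the evaluate, feed back, and revise cycle that the abstract describes at a high level.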
