Collaborative Agents for Automated Program Repair in Ruby
Nikta Akbarpour, Mahdieh Sadat Benis, Fatemeh Hendijani Fard, Ali Ouni, Mohamed Aymen Saied
TL;DR
The paper addresses the need for efficient automated program repair (APR) in underrepresented languages by introducing RAMP, a Ruby-focused, multi-agent APR framework that iteratively refines fixes through feedback-driven collaboration among agents. RAMP formalizes APR as a loop over problem context, generated tests, candidate repairs, and execution feedback, enabling targeted improvements without large multilingual repair databases or fine-tuning. Key contributions include a formal objective with an iterative five-step workflow, ablation studies highlighting the importance of test generation and reflection, and open-source replication materials. Results show state-of-the-art performance on the Ruby subset of XCodeEval with a pass@1 of 67.0%, converging within five iterations. The framework is particularly effective at repairing common error types (WRONG_ANSWER, COMPILATION_ERROR, RUNTIME_ERROR) and offers insights into multi-agent collaboration for debugging in under-studied languages. Overall, RAMP extends LLM-based debugging toward Ruby, offering a scalable, efficient path for automated repair in real-world web development contexts.
Abstract
Automated Program Repair (APR) has advanced rapidly with Large Language Models (LLMs), but most existing methods remain computationally expensive and focus on a small set of languages. Ruby, despite its widespread use in web development and the persistent challenges faced by its developers, has received little attention in APR research. In this paper, we introduce RAMP, a novel lightweight framework that formulates program repair as a feedback-driven, iterative process for Ruby. RAMP employs a team of collaborative agents that generate targeted tests, reflect on errors, and refine candidate fixes until a correct solution is found. Unlike prior approaches, RAMP is designed to avoid reliance on large multilingual repair databases or costly fine-tuning, instead operating directly on Ruby through lightweight prompting and test-driven feedback. Evaluation on the XCodeEval benchmark shows that RAMP achieves a pass@1 of 67% on Ruby, outperforming prior approaches. RAMP converges quickly, within five iterations, and ablation studies confirm that test generation and self-reflection are key drivers of its performance. Further analysis shows that RAMP is particularly effective at repairing wrong answers, compilation errors, and runtime errors. Our approach provides new insights into multi-agent repair strategies and establishes a foundation for extending LLM-based debugging tools to under-studied languages.
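
The feedback-driven loop described above can be illustrated with a minimal sketch. This is not RAMP's implementation: the names `RepairState`, `repair_loop`, and the four agent callables are hypothetical, the agents are plain Python stubs rather than LLM calls, and the toy "test runner" only inspects the candidate string instead of executing Ruby. The default budget of five refinement rounds is an assumption borrowed from the convergence figure reported in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepairState:
    """State carried through the loop (hypothetical structure)."""
    code: str                                   # current candidate program
    tests: List[str] = field(default_factory=list)
    feedback: str = ""                          # "" means all tests passed
    iterations: int = 0                         # refinement rounds used

    @property
    def solved(self) -> bool:
        return self.feedback == ""


def repair_loop(
    problem: str,
    buggy_code: str,
    generate_tests: Callable[[str, str], List[str]],
    run_tests: Callable[[str, List[str]], str],
    reflect: Callable[[str, str, str], str],
    refine: Callable[[str, str, str], str],
    max_iterations: int = 5,  # assumed budget, see lead-in
) -> RepairState:
    """Generate tests once, then reflect and refine until tests pass."""
    state = RepairState(code=buggy_code)
    state.tests = generate_tests(problem, buggy_code)
    state.feedback = run_tests(state.code, state.tests)
    while state.feedback and state.iterations < max_iterations:
        # Turn raw execution feedback into a short diagnosis.
        notes = reflect(problem, state.code, state.feedback)
        # Produce a new candidate from the diagnosis, then re-run the tests.
        state.code = refine(problem, state.code, notes)
        state.iterations += 1
        state.feedback = run_tests(state.code, state.tests)
    return state


# Toy stand-ins for the agents (assumptions for illustration only).
def toy_generate_tests(problem: str, code: str) -> List[str]:
    return ["add(1, 2) == 3"]


def toy_run_tests(code: str, tests: List[str]) -> str:
    # Stand-in for executing Ruby: only checks the operator in the source.
    if "a + b" in code:
        return ""
    return "WRONG_ANSWER: add(1, 2) returned -1, expected 3"


def toy_reflect(problem: str, code: str, feedback: str) -> str:
    return "The result is too small; subtraction is used where addition is needed."


def toy_refine(problem: str, code: str, notes: str) -> str:
    return code.replace("a - b", "a + b")


if __name__ == "__main__":
    result = repair_loop(
        "Add two integers.",
        "def add(a, b) = a - b",
        toy_generate_tests, toy_run_tests, toy_reflect, toy_refine,
    )
    print(result.solved, result.iterations, result.code)
```

In a real system each callable would presumably wrap an LLM prompt or a sandboxed Ruby execution; passing them in as parameters keeps the control flow separate from those details and makes the loop easy to test with stubs.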
