
Collaborative Agents for Automated Program Repair in Ruby

Nikta Akbarpour, Mahdieh Sadat Benis, Fatemeh Hendijani Fard, Ali Ouni, Mohamed Aymen Saied

TL;DR

The paper addresses the need for efficient automated program repair in underrepresented languages by introducing RAMP, a Ruby-focused, multi-agent APR framework that iteratively refines fixes through feedback-driven collaboration among agents. RAMP formalizes APR as a loop with problem context, generated tests, candidate repairs, and execution feedback, enabling targeted improvements without large multilingual repair databases or fine-tuning. Key contributions include a formal objective with an iterative five-step workflow, ablation studies highlighting the importance of test generation and reflection, and open-source replication materials; results show state-of-the-art performance on the Ruby subset of XCodeEval with a pass@1 of 67.0%, converging within five iterations. The framework demonstrates strong practicality, achieving robust repair across common error types (WRONG_ANSWER, COMPILATION_ERROR, RUNTIME_ERROR) and offering insights into multi-agent collaboration for debugging in under-studied languages. Overall, RAMP extends LLM-based debugging toward Ruby, offering a scalable, efficient path for automated repair in real-world web development contexts.
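
The TL;DR groups repairs by execution outcome (WRONG_ANSWER, COMPILATION_ERROR, RUNTIME_ERROR). As an illustration only, the sketch below shows one way a single test run could be mapped to those labels. It assumes that, because Ruby is interpreted, a "compilation error" corresponds to a failed syntax check (for example `ruby -c`) and that a non-zero exit status signals a runtime error. The `RunResult` record, the `PASSED` label, and the whitespace-insensitive output comparison are hypothetical choices, not RAMP's or XCodeEval's actual executor.

```python
from dataclasses import dataclass
from typing import List

# Labels taken from the text; PASSED is an assumed name for a successful run.
PASSED = "PASSED"
WRONG_ANSWER = "WRONG_ANSWER"
COMPILATION_ERROR = "COMPILATION_ERROR"
RUNTIME_ERROR = "RUNTIME_ERROR"


@dataclass
class RunResult:
    """Hypothetical record of one candidate run on one test case."""
    syntax_ok: bool   # assumed: whether a syntax check such as `ruby -c` succeeded
    returncode: int   # exit status of the actual run (assumed: non-zero = runtime error)
    stdout: str       # captured standard output


def normalize(text: str) -> str:
    """Collapse all whitespace so trailing spaces and newlines do not matter."""
    return " ".join(text.split())


def classify(result: RunResult, expected: str) -> str:
    """Map one run to an execution-outcome label, checking the earliest failure first."""
    if not result.syntax_ok:
        return COMPILATION_ERROR
    if result.returncode != 0:
        return RUNTIME_ERROR
    if normalize(result.stdout) != normalize(expected):
        return WRONG_ANSWER
    return PASSED


def summarize(results: List[RunResult], expected: List[str]) -> str:
    """A candidate passes only if every test passes; otherwise report the first failure."""
    if len(results) != len(expected):
        raise ValueError("one expected output is needed per run")
    for result, exp in zip(results, expected):
        outcome = classify(result, exp)
        if outcome != PASSED:
            return outcome
    return PASSED
```

Checking syntax before execution means a candidate with a parse error is reported as COMPILATION_ERROR rather than as a generic runtime failure, which keeps the three outcome categories distinct.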

Abstract

Automated Program Repair (APR) has advanced rapidly with Large Language Models (LLMs), but most existing methods remain computationally expensive and focused on a small set of languages. Ruby, despite its widespread use in web development and the persistent challenges faced by its developers, has received little attention in APR research. In this paper, we introduce RAMP, a novel lightweight framework that formulates program repair as a feedback-driven, iterative process for Ruby. RAMP employs a team of collaborative agents that generate targeted tests, reflect on errors, and refine candidate fixes until a correct solution is found. Unlike prior approaches, RAMP is designed to avoid reliance on large multilingual repair databases or costly fine-tuning, instead operating directly on Ruby through lightweight prompting and test-driven feedback. Evaluation on the XCodeEval benchmark shows that RAMP achieves a pass@1 of 67% on Ruby, outperforming prior approaches. RAMP converges quickly within five iterations, and ablation studies confirm that test generation and self-reflection are key drivers of its performance. Further analysis shows that RAMP is particularly effective at repairing wrong answers, compilation errors, and runtime errors. Our approach provides new insights into multi-agent repair strategies and establishes a foundation for extending LLM-based debugging tools to under-studied languages.
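
The abstract reports pass@1 and convergence within five iterations, and Figure 3 plots cumulative pass@1 across iterations. A minimal sketch of both quantities follows, assuming the widely used unbiased pass@k estimator and exactly one final candidate per problem; under that assumption pass@1 reduces to the fraction of problems whose candidate passes all hidden tests. The function names and the `solved_at` encoding are hypothetical.

```python
from math import comb
from typing import List, Optional


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n samples, of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct: List[bool]) -> float:
    """Average pass@1 when each problem has exactly one final candidate (n = 1)."""
    return sum(pass_at_k(1, int(ok), 1) for ok in correct) / len(correct)


def cumulative_pass_at_1(solved_at: List[Optional[int]], max_iter: int) -> List[float]:
    """Fraction of problems solved by the end of each iteration 1..max_iter.

    solved_at[i] is the 1-based iteration at which problem i first passed,
    or None if it never passed within the budget (hypothetical encoding).
    """
    total = len(solved_at)
    return [
        sum(1 for s in solved_at if s is not None and s <= it) / total
        for it in range(1, max_iter + 1)
    ]
```

For instance, `cumulative_pass_at_1([1, 3, None, 2], 3)` yields `[0.25, 0.5, 0.75]`: a curve that flattens after a few iterations is what "converges within five iterations" describes.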

Paper Structure

This paper contains 26 sections, 2 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Overview of the RAMP framework. One icon denotes benchmark-provided inputs (problem description, I/O format, sample I/O, limits, and buggy code), and another denotes agents. Numbered stages: (1) the Feedback Integrator produces a natural-language self-reflection; (2) the Test Designer generates the public test cases $T_g$; (3) the Programmer generates full repaired code candidates; (4) the Test Executor runs candidates on $T_g$ and returns the results; (5) candidates that either pass $T_g$ or exhaust the iteration budget are validated on the hidden tests $T_h$, and success requires passing all of $T_h$ (we report pass@1). If a candidate fails $T_g$, the process continues iteratively, repeating (1), (3), and (4) with updated reflection until $T_g$ passes or the iteration budget $K$ is reached. Arrow colors show data flow and match the corresponding agent colors; highlighted text blocks use the same color to indicate which agent produced that output.
  • Figure 2: Left: Pass@1 of RAMP, LANTERN, and ChatRepair over iterations. Right: Distribution of solved and unsolved problems after applying RAMP and LANTERN in different difficulty ranges.
  • Figure 3: Left: Distribution of solved and unsolved tasks across eleven repair iterations. The green region indicates the number of tasks that passed all unit tests, while the red region shows tasks that remained unsolved. Right: Cumulative pass@1 across iterations.
  • Figure 4: Left: Solved and unsolved problems in RAMP across difficulty ranges. The blue line shows the number of questions in each difficulty range. Right: Bug execution outcome before and after RAMP.
  • Figure 5: Left: Percentage of solved questions for each tag. The blue line shows the number of problems in each tag. Right: Pass@1 for each bug execution outcome.
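
The five numbered stages in the Figure 1 caption can be summarized as a control loop. The sketch below is a hypothetical outline, not the authors' implementation: each agent is modeled as a plain callable (in RAMP these would be LLM-backed agents), and `RepairState`, `ramp_style_loop`, `IOPair`, and the callable signatures are assumptions. It follows the caption's ordering: public tests are designed once; reflection, programming, and execution repeat until the candidate passes $T_g$ or the budget $K$ is exhausted; and the final candidate is then checked against the hidden tests $T_h$.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

IOPair = Tuple[str, str]  # (input, expected output); hypothetical test format


@dataclass
class RepairState:
    """Hypothetical shared state passed between the agents."""
    problem: str                                      # description, I/O format, samples, limits
    buggy_code: str                                   # benchmark-provided buggy program
    tests: List[IOPair] = field(default_factory=list) # generated public tests T_g
    reflection: str = ""                              # latest natural-language self-reflection
    candidate: Optional[str] = None                   # latest repaired program
    feedback: str = ""                                # latest Test Executor report


def ramp_style_loop(
    state: RepairState,
    reflect: Callable[[RepairState], str],                     # (1) Feedback Integrator
    design_tests: Callable[[RepairState], List[IOPair]],       # (2) Test Designer
    program: Callable[[RepairState], str],                     # (3) Programmer
    execute: Callable[[str, List[IOPair]], Tuple[bool, str]],  # (4) Test Executor
    validate_hidden: Callable[[str], bool],                    # (5) check on hidden tests T_h
    budget_k: int,                                             # iteration budget K
) -> Tuple[bool, int]:
    """Return (passed all hidden tests, iterations used)."""
    if budget_k < 1:
        raise ValueError("budget_k must be at least 1")
    used = 0
    for iteration in range(budget_k):
        used = iteration + 1
        state.reflection = reflect(state)           # (1) repeated every iteration
        if iteration == 0:
            state.tests = design_tests(state)       # (2) public tests generated once
        state.candidate = program(state)            # (3) full repaired program
        passed, state.feedback = execute(state.candidate, state.tests)  # (4)
        if passed:
            break                                   # stop early once T_g passes
    # (5) the last candidate is validated on T_h whether it passed T_g
    # or the budget ran out
    return validate_hidden(state.candidate), used
```

For example, with stub agents whose third candidate is the correct one and a budget of five, the loop stops after three iterations and returns `(True, 3)`; with a budget of two, it validates the second (still incorrect) candidate and returns `(False, 2)`.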