NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation
Hossain Shaikh Saadi, Faria Alam, Mario Sanz-Guerrero, Minh Duc Bui, Manuel Mager, Katharina von der Wense
TL;DR
The paper tackles Bangla instruction-to-Python code generation with a two-agent pipeline: a code-generation agent proposes a solution, and a debugger agent repairs failing solutions using error traces and unit tests. The system draws on both external and generated unit tests to broaden coverage, shows substantial gains from test-driven feedback, and achieves a top Pass@1 of 95.4% on Codabench. The study also analyzes overfitting risks, the impact of external data, the effect of generated test cases, and the effect of translation. Overall, the results indicate that structured, test-driven refinement improves functional correctness for code synthesis in an underserved language.
Abstract
This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent pipeline. First, a code-generation agent produces an initial solution from the input instruction. The candidate program is then executed against the provided unit tests (pytest-style, assert-based). Only the failing cases are forwarded to a debugger agent, which reruns the tests, extracts error traces, and generates a revised solution conditioned on the error messages, the current program, and the relevant test cases. Using this approach, our submission achieved first place in the shared task with a Pass@1 score of 95.4. We also make our code public.
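The generate, test, and debug loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_tests`, `solve`, `stub_generate`, and `stub_debug` are hypothetical, the two agents are stand-in functions rather than language-model calls, the tests are run by the orchestrating loop rather than by the debugger agent itself, and the `max_rounds` limit is an assumption, since the abstract does not say how many repair attempts are made.

```python
import traceback
from typing import Callable, List, Tuple

# A failure pairs the failing test with the error trace it produced.
Failure = Tuple[str, str]


def run_tests(code: str, tests: List[str]) -> List[Failure]:
    """Run assert-based tests against a candidate program; return only the failures."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # define the candidate's functions
    except Exception:
        # The program itself does not load, so every test counts as failing.
        trace = traceback.format_exc()
        return [(test, trace) for test in tests]

    failures: List[Failure] = []
    for test in tests:
        try:
            exec(test, dict(namespace))  # fresh copy so tests cannot affect each other
        except Exception:
            failures.append((test, traceback.format_exc()))
    return failures


def solve(
    instruction: str,
    tests: List[str],
    generate: Callable[[str], str],
    debug: Callable[[str, str, List[Failure]], str],
    max_rounds: int = 3,  # assumed limit, not taken from the paper
) -> Tuple[str, bool]:
    """Generate a solution, then repair it using only the failing tests and their traces."""
    code = generate(instruction)
    failures = run_tests(code, tests)
    rounds = 0
    while failures and rounds < max_rounds:
        code = debug(instruction, code, failures)
        failures = run_tests(code, tests)
        rounds += 1
    return code, not failures


# Stand-in agents for illustration; a real system would call a language model here.
def stub_generate(instruction: str) -> str:
    return "def add(a, b):\n    return a - b\n"  # deliberately buggy first attempt


def stub_debug(instruction: str, code: str, failures: List[Failure]) -> str:
    return code.replace("a - b", "a + b")  # pretend the trace led to this fix


demo_tests = ["assert add(2, 3) == 5", "assert add(0, 0) == 0"]
# Instruction text is roughly "add two numbers" in Bangla.
final_code, passed = solve("দুটি সংখ্যা যোগ করো", demo_tests, stub_generate, stub_debug)
print(passed)
```

In this toy run the first attempt fails one of the two tests, so only that test and its trace reach the debugger stub, mirroring the abstract's point that only failing cases are forwarded.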
