Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation

K M Nafi Asib, Sourav Saha, Mohammed Moshiul Hoque

TL;DR

The paper tackles Python code generation from instructions written in Bangla, a low-resource language, by proposing a test-driven, feedback-guided framework that combines a translation step with QLoRA-based fine-tuning of open-weight LLMs. A translation-aware pipeline preserves semantics by embedding the full test suite in each prompt, while parameter-efficient fine-tuning and an iterative, error-driven refinement loop improve correctness. On the BLP Shared Task-2 dataset, the system achieves a Pass@1 of 0.934 on the blind test set and ranks second, demonstrating the viability of open-weight LLMs for code generation from Bangla instructions. The work also identifies translation fidelity and data scarcity as key challenges and provides experimental scripts to support replication and extension by the community.

Abstract

Large Language Models (LLMs) have advanced the automated generation of code from natural language prompts. However, low-resource languages (LRLs) like Bangla remain underrepresented due to the limited availability of instruction-to-code datasets and evaluation benchmarks. To address this, the BLP Workshop at IJCNLP-AACL 2025 introduced a shared task on "Code Generation in Bangla". In this work, we propose a method that combines instruction prompting with a test-driven, feedback-guided iterative refinement process using a fine-tuned Qwen2.5-14B model. The model generates code from Bangla instructions, runs it against unit tests, and iteratively refines any failing outputs over three evaluation passes, using test feedback to guide each step. This approach helped our team, "Retriv", secure 2nd place in the shared task with a Pass@1 score of 0.934. Our analysis highlights challenges in Bangla instruction understanding and Python code generation, emphasizing the need for targeted methods for LRLs. We have made our experimental scripts publicly available to the community.
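The generate, test, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`run_tests`, `refine`, `pass_at_1`), the `generate` callable that stands in for the fine-tuned model, the format of the feedback string, and the reading of "three evaluation passes" as at most three generate-and-test rounds are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature for the model call: (instruction, tests, feedback) -> code.
Generator = Callable[[str, List[str], Optional[str]], str]


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Run assert-style tests against candidate code.

    Returns None when every test passes, otherwise a short failure message.
    Note: exec() runs untrusted code in-process; a real system would sandbox it.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # define the candidate function(s)
    except Exception as exc:
        return f"code failed to load: {type(exc).__name__}: {exc}"
    for test in tests:
        try:
            exec(test, namespace)  # each test is a single assert statement
        except Exception as exc:
            return f"{test} -> {type(exc).__name__}: {exc}"
    return None


def refine(instruction: str, tests: List[str], generate: Generator,
           max_passes: int = 3) -> Tuple[str, bool]:
    """Generate code, test it, and retry with the failure message as feedback."""
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_passes):
        code = generate(instruction, tests, feedback)
        feedback = run_tests(code, tests)
        if feedback is None:
            return code, True  # all tests passed on this pass
    return code, False  # still failing after the last pass


def pass_at_1(results: List[bool]) -> float:
    """Fraction of problems whose single final answer passes all tests."""
    return sum(results) / len(results) if results else 0.0


def fake_generate(instruction: str, tests: List[str],
                  feedback: Optional[str]) -> str:
    """Toy stand-in for the model: wrong on the first pass, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


final_code, solved = refine(
    "দুটি সংখ্যা যোগ করার একটি ফাংশন লিখুন।",  # "Write a function that adds two numbers."
    ["assert add(2, 3) == 5"],
    fake_generate,
)
```

In this toy run the first candidate fails the assertion, the failure message is passed back to the generator, and the second candidate passes, so `solved` is `True`. Passing the unit tests to `generate` mirrors the idea of embedding the test suite in the prompt; how the actual prompts are built is not specified here.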

Paper Structure

This paper contains 24 sections, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Overview of the proposed framework