Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer

Mingda Li; Abhijit Mishra; Utkarsh Mujumdar

TL;DR

This work examines biases in multilingual code generation by LLMs and introduces a projection-based zero-shot cross-lingual transfer using the LASER multilingual encoder to map non-English prompts into the LLM token space. A lightweight, English-only training regime on the MBPP dataset trains a projector that aligns LASER embeddings with the LLM via a mapping $H_{llm} = W_{llm} \cdot H_{laser} + b_{llm}$ and a mean-squared-error loss $MSE = \frac{1}{N} \sum_{i=1}^{N} \| \hat{H}_{llm}^i - H_{llm}^i \|^2$, enabling effective zero-shot inference for multilingual prompts while avoiding costly multilingual data collection. The approach yields substantial improvements in TotalER, LER, SER, and ATPR across languages compared to direct prompting and traditional baselines, with high Code Completion Rates, and is demonstrated on a quality-checked translated MBPP benchmark. The method is lightweight, scalable, and compatible with multiple open-source LLMs, and the authors publicly release code and multilingual evaluation data. Future work will broaden language coverage, explore other programming languages, and incorporate denoising or additional objectives to further reduce hallucinations.
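The affine projection and the mean-squared-error objective described above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`project`, `mse`, `train_projector`), the tiny vector sizes, the synthetic data, and the plain batch gradient descent are all assumptions chosen for illustration. The random vectors merely stand in for LASER embeddings and for the LLM embeddings they should be aligned with.

```python
import random

def project(W, b, h_laser):
    """Affine map W . h_laser + b, producing a vector in the target space."""
    return [sum(w * x for w, x in zip(row, h_laser)) + b_j
            for row, b_j in zip(W, b)]

def mse(W, b, inputs, targets):
    """Mean over examples of the squared L2 distance between projection and target."""
    total = 0.0
    for h_laser, h_llm in zip(inputs, targets):
        pred = project(W, b, h_laser)
        total += sum((p - t) ** 2 for p, t in zip(pred, h_llm))
    return total / len(inputs)

def train_projector(inputs, targets, lr=0.1, epochs=2000):
    """Fit W and b by full-batch gradient descent on the MSE objective."""
    d_in, d_out, n = len(inputs[0]), len(targets[0]), len(inputs)
    W = [[0.0] * d_in for _ in range(d_out)]  # zero init is fine: the problem is convex
    b = [0.0] * d_out
    for _ in range(epochs):
        grad_W = [[0.0] * d_in for _ in range(d_out)]
        grad_b = [0.0] * d_out
        for h_laser, h_llm in zip(inputs, targets):
            pred = project(W, b, h_laser)
            for j in range(d_out):
                err = 2.0 * (pred[j] - h_llm[j]) / n  # d(MSE)/d(pred_j)
                grad_b[j] += err
                for k in range(d_in):
                    grad_W[j][k] += err * h_laser[k]
        for j in range(d_out):
            b[j] -= lr * grad_b[j]
            for k in range(d_in):
                W[j][k] -= lr * grad_W[j][k]
    return W, b

# Synthetic stand-ins (assumption): 4-dim "encoder" vectors, 3-dim "LLM" vectors,
# related by a hidden affine map that the projector should recover.
rng = random.Random(42)
true_W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
true_b = [rng.uniform(-1, 1) for _ in range(3)]
laser_vecs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
llm_vecs = [project(true_W, true_b, h) for h in laser_vecs]

initial_loss = mse([[0.0] * 4 for _ in range(3)], [0.0] * 3, laser_vecs, llm_vecs)
W, b = train_projector(laser_vecs, llm_vecs)
final_loss = mse(W, b, laser_vecs, llm_vecs)
print("MSE before training:", initial_loss)
print("MSE after training:", final_loss)
```

In a realistic setting the projector would presumably be a trainable layer in a deep-learning framework, fitted on English prompt embeddings only, and then applied unchanged to encoder embeddings of non-English prompts at inference time; the sketch only shows the shape of the objective.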

Abstract

The use of Large Language Models (LLMs) for program code generation has gained substantial attention, but their biases and limitations with non-English prompts challenge global inclusivity. This paper investigates the complexities of multilingual prompt-based code generation. Our evaluations of LLMs, including CODELLAMA and CODEGEMMA, reveal significant disparities in code quality for non-English prompts; we also demonstrate the inadequacy of simple approaches such as prompt translation, bootstrapped data augmentation, and fine-tuning. To address this, we propose a zero-shot cross-lingual approach based on a neural projection technique: a cross-lingual encoder such as LASER produces multilingual embeddings, which are then mapped into the LLM's token space. This method requires training only on English data and scales effectively to other languages. Results on a translated and quality-checked MBPP dataset show substantial improvements in code quality. This research promotes a more inclusive code generation landscape by empowering LLMs with multilingual capabilities to support the diverse linguistic spectrum in programming.

Paper Structure (18 sections, 2 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Experimental Setup
Evaluation Dataset
Models Used for Evaluation
Inference Pipeline and Evaluation Metrics
Issues with Trivial Baselines
Baseline 1. Original Prompt
Chain-of-Thought with Back-Translation
Bootstrapping Multilingual Data and Fine Tuning
Our Approach: Projection-Based Zero-Shot Transfer
Results and Discussions
Total Error Rate (TotalER)
Logical Error Rate (LER)
Syntax Error Rate (SER)
...and 3 more sections

Figures (4)

Figure 1: Disparity in output code generated by the CodeLLaMa-Instruct model (Rozière et al., 2023) with 7B parameters for the same problem statement given in multiple languages
Figure 2: Baselines with direct prompting, Chain of Thought (CoT) and fine-tuning with bootstrapped data
Figure 3: Our proposed approach based on cross lingual encoder and projector training and zero shot inference
Figure 5: Code Completion Rate (CCR) for Models and Languages, with LP represented by perfect polygons, always above 90%
