I Can't Share Code, but I need Translation -- An Empirical Study on Code Translation through Federated LLM
Jahnavi Kumar, Venkata Lakshmana Sasaank Janapati, Mokshith Reddy Tanguturi, Sridhar Chimalakonda
TL;DR
This paper addresses privacy-sensitive code translation by proposing a Federated LLM (FedLLM) framework that collaboratively fine-tunes a code-focused model without sharing source code. Using CodeLLaMA-7B with LoRA adapters, the study compares FedAvg and FLoRA aggregation for Java–C# translation on CodeXGLUE data, showing that FedLLM outperforms individual client models by at least 40–50% across CodeBLEU, BLEU, METEOR, and ROUGE, and can approach the performance of centralized training. The results highlight the potential of privacy-preserving collaboration in software engineering tasks and suggest that FLoRA can match centralized models more closely than FedAvg, although the two methods converge differently. The work lays a foundation for scalable, privacy-aware code translation and motivates future exploration of additional languages, larger FL scenarios, and defenses against corrupted updates.
Abstract
Owing to the rapid evolution of technologies and project requirements, organizations need to upgrade the code base of their software projects to a new version of the programming language, or even translate it to an entirely new language. However, code translation is resource-intensive and requires expertise in both the source and target languages. While researchers have made progress in automating translation between legacy and modern languages, recent work has increasingly turned to pre-trained Large Language Models (LLMs) to translate code efficiently. Given the proprietary nature of code, organizations prefer fine-tuning LLMs locally rather than relying on external APIs. This is one of the first empirical studies to propose a Federated LLM-based approach for code translation. The proposed approach enables clients to jointly train a code translator without sharing sensitive data. This study demonstrates that participants can collaboratively develop a FedLLM for efficient code translation (particularly C# to Java and vice versa) with superior results (more than 40% improvement in CodeLLaMA's CodeBLEU score) compared to individual client models. Our findings indicate that FedLLM offers a collaborative approach to code translation and could serve as a promising direction for future research in this field.
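To make the idea of jointly training without sharing data concrete, the following is a minimal sketch of one FedAvg aggregation round over client adapter weights. It is an illustration under stated assumptions, not the study's implementation: the `Adapter` type, the `fedavg` function, the parameter names `lora_A` and `lora_B`, and the toy values are all hypothetical, and real adapters would be tensors rather than short lists. FLoRA aggregation, which the study also evaluates, is not shown here.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened adapter weights.
Adapter = Dict[str, List[float]]


def fedavg(adapters: List[Adapter], num_examples: List[int]) -> Adapter:
    """Average client adapters, weighting each client by its local dataset size."""
    if not adapters or len(adapters) != len(num_examples):
        raise ValueError("need one example count per client adapter")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    merged: Adapter = {}
    for name in adapters[0]:
        length = len(adapters[0][name])
        merged[name] = [
            sum(n * client[name][i] for client, n in zip(adapters, num_examples)) / total
            for i in range(length)
        ]
    return merged


# Toy values (assumed): only adapter weights leave each client, never source code.
client_a = {"lora_A": [1.0, 2.0], "lora_B": [0.0, 4.0]}
client_b = {"lora_A": [3.0, 6.0], "lora_B": [2.0, 0.0]}
global_adapter = fedavg([client_a, client_b], num_examples=[100, 300])
print(global_adapter)
```

In this sketch the server sees only the adapter weights and the example counts, which is the property that lets organizations collaborate without exposing proprietary code.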
