QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Changxin Ke, Rui Zhang, Shuo Wang, Li Ding, Guangli Li, Yuanbo Wen, Shuoming Zhang, Ruiyuan Xu, Jin Qin, Jiaming Guo, Chenxi Wang, Ling Li, Qi Guo, Yunji Chen

TL;DR

QiMeng-MuPa tackles the problem of ensuring functional equivalence in sequential-to-parallel code translation under data scarcity. It introduces a mutual-supervised framework in which a Translator and a Tester iteratively generate and verify data for each other through Co-verify and Co-evolve steps, producing functionally equivalent CUDA-C translations and high-quality unit tests. Applied to Qwen2.5-Coder, the approach improves Pass@1 by up to 28.91% over the base model and outperforms the previous state-of-the-art method CodeRosetta by 1.56 BLEU and 6.92 CodeBLEU, and it yields a verified CUDA-C dataset of over 10k parallel functions with tests. By pairing translation with execution-based verification through unit tests, the framework reduces data requirements and improves reliability for real-world GPU programming tasks.

Abstract

The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet the inherent complexity of parallel programming creates a demand for automated sequential-to-parallel translation. Data scarcity, however, poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-translation methods show promise, they still fail to ensure functional equivalence in the translated code. In this paper, we propose QiMeng-MuPa, a novel Mutual-Supervised Learning framework for Sequential-to-Parallel code translation (MuPa abbreviates Mutual-Supervised and Parallel), to address the functional equivalence issue. QiMeng-MuPa consists of two models, a Translator and a Tester. Through an iterative loop of Co-verify and Co-evolve steps, the Translator and the Tester generate data for each other and improve together. The Tester generates unit tests to verify and filter functionally equivalent translated code, thereby evolving the Translator, while the Translator generates translated code as augmented input to evolve the Tester. Experimental results demonstrate that QiMeng-MuPa significantly enhances the performance of the base models: when applied to Qwen2.5-Coder, it improves Pass@1 by up to 28.91% and boosts Tester performance by 68.90%, and it outperforms the previous state-of-the-art method CodeRosetta by 1.56 and 6.92 in BLEU and CodeBLEU scores, respectively, while achieving performance comparable to DeepSeek-R1 and GPT-4.1. Our code is available at https://github.com/kcxain/mupa.
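The data flow of the iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `mutual_supervised_loop`, its callable parameters, and the toy stand-ins in `toy_run` are all hypothetical, and the sketch assumes the Tester generates tests from the source program alone. The two models and the verification step are passed in as plain callables so that only the control flow is shown.

```python
from typing import Callable, List, Tuple

# (source code, translated code, unit tests); a hypothetical representation.
Triplet = Tuple[str, str, List[str]]


def mutual_supervised_loop(
    sources: List[str],
    translate: Callable[[str], str],                    # stands in for the Translator
    generate_tests: Callable[[str], List[str]],         # stands in for the Tester
    co_verify: Callable[[str, str, List[str]], bool],   # execution-based check
    co_evolve: Callable[[List[Triplet]], None],         # fine-tunes both models
    rounds: int,
) -> List[List[Triplet]]:
    """Run `rounds` iterations and return the verified triplets of each round."""
    history: List[List[Triplet]] = []
    for _ in range(rounds):
        verified: List[Triplet] = []
        for src in sources:
            target = translate(src)
            tests = generate_tests(src)
            # Co-verify: keep only triplets that pass the check.
            if co_verify(src, target, tests):
                verified.append((src, target, tests))
        # Co-evolve: both models learn from the same verified data.
        co_evolve(verified)
        history.append(verified)
    return history


def toy_run() -> List[int]:
    """Toy demonstration: the 'Translator' handles one more input after round 1."""
    known = {"a"}  # inputs the toy Translator can currently translate

    def translate(src: str) -> str:
        return src.upper() if src in known else "BROKEN"

    def generate_tests(src: str) -> List[str]:
        return [f"test_{src}"]

    def co_verify(src: str, target: str, tests: List[str]) -> bool:
        return target == src.upper()

    def co_evolve(verified: List[Triplet]) -> None:
        known.add("b")  # pretend that fine-tuning taught the Translator "b"

    history = mutual_supervised_loop(
        ["a", "b"], translate, generate_tests, co_verify, co_evolve, rounds=2
    )
    return [len(v) for v in history]
```

In this toy run the number of verified triplets grows from one round to the next, which mirrors, in a very simplified way, the idea that the pool of verified training data expands as the models improve.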

Paper Structure

This paper contains 45 sections, 10 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Overview of QiMeng-MuPa. The framework consists of two models, a Translator and a Tester, and two steps. (1) Co-verify: the translated code from the Translator and the corresponding unit tests from the Tester are jointly verified by running them on CPU/GPU. If the source and target programs produce inconsistent results, or if a compilation or runtime error occurs, the data is discarded. (2) Co-evolve: the verified (source, target, unit tests) triplets from the Co-verify step are used to fine-tune both the Translator and the Tester via back-translation, improving their performance iteratively until convergence.
  • Figure 2: Evaluation of Code Translation across iterations of QiMeng-MuPa based on Qwen2.5-Coder.
  • Figure 3: Evaluation of Unit Test Generation Using the VT metric across iterations of QiMeng-MuPa based on Qwen2.5-Coder compared with GPT-4.1, GPT-4o and GPT-4.
  • Figure 4: Ablation Study. Left: Comparison of the performance of the Translator using Qwen2.5-Coder with non-filter and compile-filter back-translation. Middle: Translator performance on Pass@1 when freezing the Tester. Right: Tester performance on the VT metric when freezing the Translator.
  • Figure 5: Coverage statistics of training sets across rounds.
  • ...and 13 more figures
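
The discard rule in the Figure 1 caption (drop a sample on inconsistent results or on a compilation/runtime error) can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the paper's pipeline: programs are modeled as Python callables rather than compiled C/CUDA binaries, any raised exception stands in for a compilation or runtime error, and rejecting an empty test set is an added assumption. The names `co_verify`, `seq_sum`, `tree_sum`, `buggy_sum`, and `crashing_sum` are hypothetical.

```python
from typing import Any, Callable, Iterable, Tuple


def co_verify(
    source_fn: Callable[..., Any],
    target_fn: Callable[..., Any],
    test_inputs: Iterable[Tuple[Any, ...]],
) -> bool:
    """Return True only if target_fn matches source_fn on every test input."""
    tests = list(test_inputs)
    if not tests:
        return False  # assumption: an empty test set verifies nothing
    for args in tests:
        try:
            expected = source_fn(*args)
            actual = target_fn(*args)
        except Exception:
            return False  # stands in for a compilation or runtime error
        if expected != actual:
            return False  # inconsistent results: discard
    return True


def seq_sum(xs):
    """Sequential reference: accumulate left to right."""
    total = 0
    for x in xs:
        total += x
    return total


def tree_sum(xs):
    """Pairwise reduction, the shape a parallel reduction typically takes."""
    vals = list(xs)
    if not vals:
        return 0
    while len(vals) > 1:
        vals = [sum(vals[i:i + 2]) for i in range(0, len(vals), 2)]
    return vals[0]


def buggy_sum(xs):
    """Faulty translation: silently drops the last element."""
    return sum(xs[:-1])


def crashing_sum(xs):
    """Faulty translation: raises IndexError on empty input."""
    return xs[0] + sum(xs[1:])


TESTS = [([1, 2, 3],), ([],), ([5],)]
```

With these toy inputs, `co_verify(seq_sum, tree_sum, TESTS)` accepts the pairwise reduction, while `buggy_sum` is rejected for a mismatched result and `crashing_sum` for raising on the empty list. Integer inputs are used deliberately: with floating-point data, a reordered reduction can differ in the last bits, so a real check would likely need a tolerance.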

Theorems & Definitions (2)

  • Definition 1: Code Translation
  • Definition 2: Unit Test Generation