CodeChemist: Functional Knowledge Transfer for Low-Resource Code Generation via Test-Time Scaling
Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Aishan Liu, Xianglong Liu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, and Bin Shi
TL;DR
CodeChemist narrows the cross-language performance gap in CodeLLMs by transferring functional knowledge from high-resource programming languages (PLs) to low-resource PLs at inference time. It constructs ground-truth functional test cases by executing high-resource PL code, then uses multi-temperature hedged sampling to generate diverse candidate solutions in the low-resource PL and selects the candidate with the highest execution-based pass rate on those test cases. Without any retraining, the method yields substantial improvements (e.g., up to $69.5\%$ relative gain on Lua) on benchmarks such as MultiPL-E and Ag-LiveCodeBench-X. The results indicate that functional knowledge transfers robustly across languages, that the added inference time is manageable, and that the approach could be combined with existing test-time scaling techniques.
Abstract
Code Large Language Models (CodeLLMs) are increasingly used in code generation tasks across a wide range of applications. However, their performance is often inconsistent across different programming languages (PLs), with low-resource PLs suffering the most due to limited training data. In this paper, we present CodeChemist, a novel and efficient framework for test-time scaling that enables functional knowledge transfer from high-resource to low-resource PLs using generated test cases. CodeChemist first generates and executes code in high-resource PLs to create test cases that encapsulate functional knowledge. It then uses multi-temperature hedged sampling to generate code snippets in the low-resource PL and selects the best one based on the pass rate of the test cases. Our extensive experiments show that CodeChemist outperforms existing test-time scaling approaches, boosting the performance of code generation for low-resource PLs without requiring any model retraining.
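
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names (`build_test_cases`, `sample_candidates`, `pass_rate`, `select_best`) are hypothetical, programs are modeled as plain Python callables rather than code executed in a real high-resource or low-resource PL, and `generate` stands in for a CodeLLM call that accepts a sampling temperature.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A test case pairs input arguments with the output observed on the reference.
TestCase = Tuple[tuple, Any]


def build_test_cases(reference: Callable[..., Any],
                     inputs: Sequence[tuple]) -> List[TestCase]:
    """Execute the reference (assumed: high-resource-PL code) to record outputs."""
    cases: List[TestCase] = []
    for args in inputs:
        try:
            cases.append((args, reference(*args)))
        except Exception:
            # Assumption: inputs that crash the reference are simply skipped.
            continue
    return cases


def sample_candidates(generate: Callable[[float], Any],
                      temperatures: Sequence[float],
                      n_per_temperature: int) -> List[Any]:
    """Draw candidates at several temperatures (hypothetical generator call)."""
    return [generate(t) for t in temperatures for _ in range(n_per_temperature)]


def pass_rate(candidate: Callable[..., Any], cases: Sequence[TestCase]) -> float:
    """Fraction of test cases on which the candidate reproduces the expected output."""
    if not cases:
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A runtime error counts as a failed test case.
            pass
    return passed / len(cases)


def select_best(candidates: Sequence[Callable[..., Any]],
                cases: Sequence[TestCase]) -> Callable[..., Any]:
    """Return the candidate with the highest pass rate (first one wins ties)."""
    return max(candidates, key=lambda c: pass_rate(c, cases))
```

In this sketch, a caller would build test cases once from the reference, sample candidates across a small set of temperatures, and pass both to `select_best`. In practice each candidate would be run in a sandbox for the target language, which is omitted here.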
