GiFT: Gibbs Fine-Tuning for Code Generation
Haochen Li, Wanjin Feng, Xin Zhou, Zhiqi Shen
TL;DR
GiFT introduces Gibbs Fine-Tuning, a self-training framework for code generation that samples training data from the marginal distribution of the joint description-code space by iteratively translating between descriptions and code implementations. A theoretical analysis indicates that this reduces the bias inherent in conditional sampling, and a perplexity-guided data selection step addresses the long-tail distribution of self-generated code; together these yield more diverse and representative self-generated data. Empirical results across four datasets and two LLMs show that GiFT consistently outperforms standard self-training (RFT) and RFT with rewriting, with especially strong gains on challenging datasets such as APPS+. Additional analyses cover data selection dynamics, distributional diversity, and a distillation scenario that demonstrates GiFT’s broader applicability. The work suggests that marginal-distribution sampling, coupled with controlled code selection, can meaningfully improve program synthesis without requiring stronger external models.
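To make the conditional-versus-marginal distinction concrete, here is a toy Python sketch of the sampling idea. It is not the authors' implementation: the two-description, three-code joint table `JOINT` and the helpers `sample_code`, `sample_desc`, `conditional_sampling`, and `gibbs_sampling` are hypothetical, and exact table lookups stand in for the LLM's description-to-code and code-to-description translations. Drawing every code from the seed description `d0` reproduces p(c | d0), whereas alternating the two conditionals (Gibbs sampling) produces codes distributed according to the marginal p(c).

```python
import random
from collections import Counter

# HYPOTHETICAL toy joint distribution p(description, code); it stands in for the
# unknown joint space of valid descriptions and code implementations.
JOINT = {
    ("d0", "c0"): 0.30, ("d0", "c1"): 0.15, ("d0", "c2"): 0.05,
    ("d1", "c0"): 0.05, ("d1", "c1"): 0.15, ("d1", "c2"): 0.30,
}
DESCS = ["d0", "d1"]
CODES = ["c0", "c1", "c2"]


def sample_code(desc, rng):
    """Draw c ~ p(c | d); a stand-in for the LLM writing code for description d."""
    weights = [JOINT[(desc, c)] for c in CODES]  # choices() normalizes the weights
    return rng.choices(CODES, weights=weights)[0]


def sample_desc(code, rng):
    """Draw d ~ p(d | c); a stand-in for the LLM describing code c."""
    weights = [JOINT[(d, code)] for d in DESCS]
    return rng.choices(DESCS, weights=weights)[0]


def conditional_sampling(seed_desc, n, rng):
    """Plain self-training: every code is conditioned on the same seed description."""
    return [sample_code(seed_desc, rng) for _ in range(n)]


def gibbs_sampling(seed_desc, n, burn_in, rng):
    """Alternate c ~ p(c | d) and d ~ p(d | c); after burn-in, codes follow p(c)."""
    desc, samples = seed_desc, []
    for step in range(burn_in + n):
        code = sample_code(desc, rng)
        desc = sample_desc(code, rng)
        if step >= burn_in:
            samples.append(code)
    return samples


def frequencies(samples):
    """Empirical frequency of each code variant."""
    counts = Counter(samples)
    return {c: counts[c] / len(samples) for c in CODES}


rng = random.Random(0)
print("conditional:", frequencies(conditional_sampling("d0", 20000, rng)))
print("gibbs:      ", frequencies(gibbs_sampling("d0", 20000, 100, rng)))
```

With this table, p(c2 | d0) = 0.10 while the marginal p(c2) = 0.35, so conditional sampling should put roughly 10% of its samples on `c2` and the Gibbs chain roughly 35%. The printed values are Monte Carlo estimates and only approximate these numbers.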
Abstract
Training Large Language Models (LLMs) with synthetic data is a prevalent practice in code generation. A key approach is self-training, in which LLMs are iteratively trained on self-generated correct code snippets. In this setting, the self-generated code is drawn from a conditional distribution, conditioned on a specific seed description. However, the seed description is not the only valid representation of its intended meaning. If all valid descriptions and code snippets are viewed as forming a joint space, code drawn from the conditional distribution would underrepresent the full description-code space. We therefore propose Gibbs Fine-Tuning (GiFT), a novel self-training method inspired by Gibbs sampling. GiFT draws self-generated data from the marginal distribution of the joint space, thereby mitigating the biases inherent in conditional sampling. We provide a theoretical analysis demonstrating the potential benefits of fine-tuning LLMs with code derived from the marginal distribution. Furthermore, we propose a perplexity-based code selection method to mitigate the imbalanced, long-tail distribution of the self-generated code. Empirical evaluation of two LLMs across four datasets demonstrates that GiFT achieves superior performance, particularly on more challenging benchmarks. Source code is available at https://github.com/Alex-HaochenLi/GiFT.
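The abstract does not spell out the selection rule, so the following Python sketch only illustrates one way perplexity could drive code selection. `perplexity` implements the standard definition, the exponential of the negative mean token log-probability. `stratified_select`, its parameters `num_bins` and `per_bin`, and the banding rule itself are hypothetical: correct candidates are ranked by perplexity and a capped number is drawn from each perplexity band, so that a large head of low-perplexity code cannot dominate the training set. In practice the per-token log-probabilities would come from the LLM; here they are made-up numbers.

```python
import math
import random
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def stratified_select(candidates: Dict[str, Sequence[float]],
                      num_bins: int, per_bin: int, seed: int = 0) -> List[str]:
    """HYPOTHETICAL rebalancing rule, not taken from the paper.

    Ranks correct candidates by perplexity, splits the ranking into up to
    `num_bins` consecutive bands of equal size, and keeps at most `per_bin`
    randomly chosen candidates from each band, so the low-perplexity head
    cannot crowd out the long tail.
    """
    if num_bins < 1 or per_bin < 0:
        raise ValueError("num_bins must be >= 1 and per_bin >= 0")
    if not candidates:
        return []
    rng = random.Random(seed)
    ranked = sorted(candidates, key=lambda name: perplexity(candidates[name]))
    band_size = math.ceil(len(ranked) / num_bins)
    selected: List[str] = []
    for start in range(0, len(ranked), band_size):
        band = ranked[start:start + band_size]
        selected.extend(rng.sample(band, min(per_bin, len(band))))
    return selected


# Made-up per-token log-probabilities for five correct code samples.
cands = {
    "a": [-0.1, -0.2, -0.1],
    "b": [-0.1, -0.1, -0.2],
    "c": [-0.2, -0.1, -0.1],
    "d": [-1.5, -2.0, -1.0],
    "e": [-2.5, -3.0, -2.0],
}
print(stratified_select(cands, num_bins=2, per_bin=1))
```

With `num_bins=2`, the five candidates split into a low-perplexity band {a, b, c} and a high-perplexity band {d, e}, and the call keeps one candidate from each. The criterion GiFT actually uses may differ; the released source code is the authoritative reference.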
