GiFT: Gibbs Fine-Tuning for Code Generation

Haochen Li, Wanjin Feng, Xin Zhou, Zhiqi Shen

TL;DR

GiFT (Gibbs Fine-Tuning) is a self-training framework for code generation that draws training data from the marginal distribution of the joint description-code space by iteratively translating between descriptions and code implementations. A theoretical analysis argues that this reduces the bias inherent in conditional sampling, and a perplexity-guided selection step addresses the long-tail distribution of self-generated codes, yielding more diverse and representative training data. Empirical results across four datasets and two LLMs show that GiFT consistently outperforms traditional self-training (RFT) and RFT with rewriting, with especially strong gains on challenging datasets like APPS+. Additional analyses cover data selection dynamics, distributional diversity, and a distillation scenario demonstrating GiFT’s broader applicability. The work suggests that marginal-distribution sampling, coupled with controlled code selection, can meaningfully improve program synthesis without requiring stronger external models.

Abstract

Training Large Language Models (LLMs) with synthetic data is a prevalent practice in code generation. A key approach is self-training, where LLMs are iteratively trained on self-generated correct code snippets. In this case, the self-generated codes are drawn from a conditional distribution, conditioned on a specific seed description. However, the seed description is not the only valid representation that aligns with its intended meaning. With all valid descriptions and codes forming a joint space, codes drawn from the conditional distribution would lead to an underrepresentation of the full description-code space. As such, we propose Gibbs Fine-Tuning (GiFT), a novel self-training method inspired by Gibbs sampling. GiFT allows self-generated data to be drawn from the marginal distribution of the joint space, thereby mitigating the biases inherent in conditional sampling. We provide a theoretical analysis demonstrating the potential benefits of fine-tuning LLMs with code derived from the marginal distribution. Furthermore, we propose a perplexity-based code selection method to mitigate the imbalanced long-tail distribution of the self-generated codes. Empirical evaluation of two LLMs across four datasets demonstrates that GiFT achieves superior performance, particularly on more challenging benchmarks. Source code is available at https://github.com/Alex-HaochenLi/GiFT.
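To make the contrast between conditional and marginal sampling concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the two lookup tables, the names `d0`/`d1`/`c0`/`c1`/`c2`, and the helper functions are all hypothetical stand-ins for an LLM that translates descriptions into codes and codes back into descriptions. In the actual method only correct codes would be kept, a step this sketch omits.

```python
import random
from collections import Counter

# Hypothetical toy joint space (assumption, not from the paper):
# p(code | description) and p(description | code) as lookup tables.
P_CODE_GIVEN_DESC = {
    "d0": {"c0": 0.9, "c1": 0.1},
    "d1": {"c1": 0.5, "c2": 0.5},
}
P_DESC_GIVEN_CODE = {
    "c0": {"d0": 1.0},
    "c1": {"d0": 0.5, "d1": 0.5},
    "c2": {"d1": 1.0},
}


def draw(dist, rng):
    """Draw one key from a {key: probability} table."""
    items = list(dist)
    weights = [dist[k] for k in items]
    return rng.choices(items, weights=weights, k=1)[0]


def conditional_samples(seed_desc, n, rng):
    """Conventional self-training: every code is conditioned on the seed."""
    return [draw(P_CODE_GIVEN_DESC[seed_desc], rng) for _ in range(n)]


def gibbs_samples(seed_desc, n, rng):
    """Gibbs-style alternation: description -> code -> description -> ..."""
    desc = seed_desc
    codes = []
    for _ in range(n):
        code = draw(P_CODE_GIVEN_DESC[desc], rng)
        codes.append(code)
        desc = draw(P_DESC_GIVEN_CODE[code], rng)
    return codes


if __name__ == "__main__":
    rng = random.Random(0)
    print("conditional on d0:", Counter(conditional_samples("d0", 5000, rng)))
    print("Gibbs chain      :", Counter(gibbs_samples("d0", 5000, rng)))
```

In this toy setup, `c2` has zero probability under the seed description `d0`, so conditional sampling can never produce it, while the alternating chain can reach it through the paraphrased description `d1`. That is the kind of underrepresentation the abstract attributes to conditional sampling.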

Paper Structure

This paper contains 33 sections, 8 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: For the intention of $d_0$, the set of all valid descriptions and codes forms a space. The gap between the conditional distribution (red) and the marginal distribution (blue) indicates the bias introduced when LLMs are fine-tuned with codes conditioned on $d_0$, as some codes are rarely sampled.
  • Figure 2: Overview of GiFT. For each description $d_i$ in the seed dataset, we first translate iteratively between descriptions and codes to draw codes from the marginal distribution based on the intention of $d_i$. Then, we calculate the perplexity of each generated code and use weighted random sampling to select codes for fine-tuning, with codes from the tail more likely to be selected. Finally, all selected codes are paired with $d_i$ for fine-tuning. The example shown in this figure is taken from MBPP-sanitized/6.
  • Figure 3: Pass@1 (%) of applying RFT, RFT+RD, and GiFT to Deepseek-Coder-6.7B and CodeLlama-7B on 4 code generation datasets. The x-axis represents the iteration number and the shaded area represents the standard deviation.
  • Figure 4: Pass@1 (%) of applying RFT+RD with descriptions from a single step of Gibbs sampling to Deepseek-Coder-6.7B on MBPP+ and CodeInsight.
  • Figure 5: Top: Pass@1 (%) of applying GiFT to Deepseek-Coder-6.7B on APPS+ (Introductory) and MBPP+ with $T=\pm 2$. Bottom: Perplexity distribution of self-generated codes at the 3rd iteration for APPS+ (Introductory) and MBPP+.
  • ...and 7 more figures
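
The perplexity-based selection step described in the Figure 2 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's formula: it assumes that high-perplexity codes correspond to the tail of the distribution, and it uses a softmax over perplexity with a hypothetical `temperature` parameter as the weighting scheme. The function names are invented for this example.

```python
import math
import random


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def selection_weights(perplexities, temperature=1.0):
    """Softmax over perplexity / temperature (assumed weighting scheme).

    Higher perplexity gets a larger weight when temperature > 0.
    """
    scaled = [p / temperature for p in perplexities]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_codes(codes, perplexities, k, temperature=1.0, rng=None):
    """Weighted random sampling without replacement of up to k codes."""
    rng = rng or random.Random()
    pool = list(zip(codes, perplexities))
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = selection_weights([p for _, p in pool], temperature)
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx)[0])
    return chosen


if __name__ == "__main__":
    codes = ["common_solution", "less_common", "rare_solution"]
    ppls = [1.2, 2.0, 4.5]  # made-up perplexities for illustration only
    print(select_codes(codes, ppls, k=2, rng=random.Random(0)))
```

Sampling rather than taking the top-k by perplexity keeps some head codes in the training set while still shifting mass toward the tail. The temperature controls how strong that shift is.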