$\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding

Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, Dacheng Tao

TL;DR

This work proposes a simple and effective plug-and-play mechanism that improves the quality of one-pass code generation in LLMs and reduces the impact of output noise. It does so by selectively eliminating the output noise induced by lame prompts, guided by the uncertainty of the prediction distribution under the standard prompt.

Abstract

Large language models (LLMs) have shown remarkable capabilities in code generation. However, the effects of hallucinations (e.g., output noise) make it particularly challenging for LLMs to generate high-quality code in one pass. In this work, we propose a simple and effective \textbf{u}ncertainty-aware \textbf{s}elective \textbf{c}ontrastive \textbf{d}ecoding ($\mathbb{USCD}$) mechanism that improves the quality of one-pass code generation in LLMs and reduces the impact of output noise. Specifically, we first design a negative prompt (the lame prompt) that induces output noise by removing the input-output examples from the standard few-shot prompt. Our preliminary study shows that the Jensen-Shannon divergence (JS divergence) between token distribution uncertainty and the output noise is relatively low (approximately $0.25$), indicating that the two are closely related. We then selectively eliminate the output noise induced by the lame prompt based on the uncertainty of the prediction distribution under the standard prompt. Notably, our plug-and-play mechanism is inference-only, which makes it highly flexible. Extensive experiments on widely used benchmarks, e.g., HumanEval, MBPP, and MultiPL-E, with several LLMs (i.e., Incoder-6b, CodeLlama-7b, WizardCoder-15b, StarCoder, and Llama2-7b) demonstrate that USCD significantly improves one-pass code generation, with an average \textit{pass@$1$} improvement of 16.59\%. We will release code and data on GitHub.
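The abstract describes two ingredients: a lame prompt obtained by dropping the input-output examples from the standard few-shot prompt, and a decoding step that applies contrast against the lame prompt only when the standard prediction is uncertain. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The prompt layout (`build_prompt`), the use of Shannon entropy as the uncertainty measure, the threshold `theta`, the contrast weight `alpha`, and the contrast score $(1+\alpha)\log p_{\text{std}} - \alpha \log p_{\text{lame}}$ are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def build_prompt(instruction: str, examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Standard few-shot prompt (hypothetical layout): instruction, examples, then the query."""
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def build_lame_prompt(instruction: str, query: str) -> str:
    """Lame prompt: the same prompt with every input-output example removed."""
    return build_prompt(instruction, [], query)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(log_probs: Sequence[float]) -> float:
    """Shannon entropy in nats; used here as an assumed uncertainty measure."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def uscd_step(std_logits: Sequence[float], lame_logits: Sequence[float],
              theta: float = 1.0, alpha: float = 0.5) -> int:
    """Choose the next token id for one greedy decoding step.

    theta (uncertainty threshold) and alpha (contrast weight) are
    hypothetical hyperparameters chosen for illustration.
    """
    lp_std = log_softmax(std_logits)
    if entropy(lp_std) <= theta:
        # Confident standard prediction: keep its greedy choice, no contrast.
        return max(range(len(lp_std)), key=lp_std.__getitem__)
    lp_lame = log_softmax(lame_logits)
    # Uncertain step: penalize tokens that the lame prompt (noise source) also favors.
    scores = [(1 + alpha) * s - alpha * l for s, l in zip(lp_std, lp_lame)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `uscd_step([2.0, 1.9, 0.0], [2.5, 0.0, 0.0], theta=0.5)` returns token 1, whereas greedy decoding on the standard logits alone picks token 0, because the lame prompt strongly prefers token 0. With `theta=1.0` the same step is judged confident (entropy is about 0.89 nats) and token 0 is kept. Generic contrastive decoding methods often also restrict candidates to tokens that are plausible under the standard distribution; that constraint is omitted here for brevity.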

Paper Structure

This paper contains 17 sections, 5 equations, 8 figures, and 8 tables.

Introduction
Methodology
Overview
Construction of the Lame Prompt
Uncertainty-Aware Selective Contrastive Decoding
Experiments
Experimental Setup
Ablation Studies
Main Results
Related Work
Code Generation of LLMs
Contrastive Decoding
Discussion
Conclusion
The Process of Constructing the Lame Prompt
...and 2 more sections

Figures (8)

Figure 1: The Jensen-Shannon divergence (JS divergence) between token distribution uncertainty and output noise for Incoder-6b fried2023incoder. We randomly selected a standard prompt for which Incoder-6b generated incorrect code (HumanEval/163) and examined output tokens $501 \sim 600$. We calculated the JS divergence between the token distribution of the lame prompt output and the token distribution with (blue) and without (red) the USCD mechanism. For Incoder-6b, the JS divergence between token distribution uncertainty and output noise is approximately $0.25$ without the USCD mechanism (red) and approximately $0.65$ with it (blue).
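Figure 1 relies on the JS divergence between two token distributions. Below is a minimal Python sketch of the standard definition, $\mathrm{JS}(P,Q)=\tfrac{1}{2}\mathrm{KL}(P\,\|\,M)+\tfrac{1}{2}\mathrm{KL}(Q\,\|\,M)$ with $M=\tfrac{1}{2}(P+Q)$. The caption does not state the logarithm base, so `base` is left as a parameter (base 2 bounds the value by 1; the natural log bounds it by $\ln 2 \approx 0.693$). The helper `mean_js`, which averages per-position values over a window of tokens such as positions 501 to 600, is a hypothetical summary and not the authors' procedure.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Jensen-Shannon divergence; with base 2 the result lies in [0, 1]."""
    # The mixture M is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    js_nats = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return js_nats / math.log(base)


def mean_js(dists_a: Sequence[Sequence[float]], dists_b: Sequence[Sequence[float]],
            base: float = 2.0) -> float:
    """Hypothetical helper: average JS divergence over aligned token positions."""
    values = [js_divergence(a, b, base) for a, b in zip(dists_a, dists_b)]
    return sum(values) / len(values)
```

Identical distributions give 0, and distributions with disjoint support give the maximum (1.0 in base 2), which is why a low value such as $0.25$ is read as the two distributions being close.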
Figure 2: Illustration of our uncertainty-aware selective contrastive decoding (USCD) mechanism for improving code generation of LLMs.
Figure 3: Comparison of the performance of models using the USCD mechanism and models directly using standard prompts on the HumanEval benchmark chen2021evaluating. In this experiment, we use a temperature of $0.1$ and top-$p$=$0.95$. The USCD mechanism significantly improves the performance of both code-specialized models (e.g., CodeLlama-7b rozière2023code, StarCoder li2023starcoder, WizardCoder luo2023wizardcoder, and Incoder-6b fried2023incoder) and general models (e.g., Llama2-7b touvron2023llama).
Figure 4: Performance comparison of LLMs, e.g., Llama2-7b touvron2023llama, CodeLlama-7b rozière2023code, and StarCoder li2023starcoder, using the standard prompt and the lame prompt on the HumanEval benchmark chen2021evaluating. The performance of LLMs using the lame prompt is significantly lower than with the standard prompt.
Figure 5: Pass@$1$ scores of CodeLlama-7b rozière2023code under different values of $\vartheta$.
...and 3 more figures
