Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models

Nirmal Joshua Kapu, Mihit Sreejith

TL;DR

DemoCraft tackles code generation from natural language under semantic ambiguity by introducing latent concept learning and a probabilistic demonstration selector. It uses three components—latent concept learning, task-concept probability calculation, and demonstration selection—to tailor demonstrations to a target task. On MBPP and HumanEval with SantaCoder, it yields about a $2\times$ gain in pass@k and around $3\times$ gains in correctness@k and similarity@k, outperforming semantic and random baselines. The approach demonstrates that task-specific concept tokens can substantially improve executable-code generation and may scale to larger models and broader domains.

Abstract

Generating executable code from natural language instructions using Large Language Models (LLMs) poses challenges such as semantic ambiguity and understanding task-specific contexts. To address these issues, we propose a system called DemoCraft, which enhances code generation by leveraging in-context learning and demonstration selection, combined with latent concept learning. Latent concept learning introduces additional concept tokens, which are trainable embeddings that capture task-specific knowledge. We test our system on two major datasets: MBPP and HumanEval. Our experimental results demonstrate that the proposed system achieves an approximately 2x increase in the pass@k metric compared to baseline models. Furthermore, we introduce two novel evaluation metrics: correctness@k and similarity@k. Our empirical studies indicate that our system attains nearly a 3x improvement in these metrics as well.
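
For readers unfamiliar with pass@k, the following is a minimal sketch of how the metric is commonly computed. It assumes the unbiased estimator popularized alongside the HumanEval benchmark, where n samples are generated per problem and c of them pass all unit tests; the abstract does not state which formulation the authors use, so this is an illustration and not their evaluation code. The function names and the sample counts in the usage note are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: sample budget being evaluated (1 <= k <= n)

    Assumption: the unbiased estimator 1 - C(n - c, k) / C(n, k),
    i.e. one minus the probability that k samples drawn without
    replacement from the n generated ones are all incorrect.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

As a usage example, `mean_pass_at_k([(10, 3), (10, 0)], k=1)` averages the per-problem estimates for two hypothetical problems with 10 samples each. Under this formulation, a 2x increase in pass@k means the averaged estimate doubles relative to the baseline.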

Paper Structure

This paper contains 13 sections, 3 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Large Language Models struggling at Code Generation
  • Figure 2: Few Shot Learning Pipeline
  • Figure 3: Demonstration Selection with Latent Concept Learning
  • Figure 4: Latent Concept Learning Module
  • Figure 5: Task Concept Probability Calculation Module
  • ...and 6 more figures