Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Nam Huynh, Beiyu Lin

TL;DR

This survey addresses how Large Language Models enable automatic code generation from natural language, focusing on limitations, fine-tuning strategies, evaluation metrics, and applications. It presents a structured view from foundational LLM architectures and code-generation workflows to domain-specific tuning, feedback-driven improvements, and prompting techniques, supported by benchmarks such as HumanEval, CodeBLEU, and ICE-Score. The paper also reviews practical applications and tools (e.g., CodeLlama, GitHub Copilot, ToolGen) and discusses serious concerns around resource demands, errors, biases, and security, offering a roadmap for advancing reliable, efficient code-generation systems. Collectively, the work highlights that combining domain-focused fine-tuning, execution-based feedback, and advanced prompting can substantially boost code-generation performance while underscoring the need for rigorous evaluation and secure deployment in real-world development tasks.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to generate executable code automatically from natural language. We begin by examining the limitations and challenges of LLMs in automated code generation. We then review fine-tuning techniques designed to enhance both the performance and adaptability of LLMs in code generation tasks, followed by the metrics and benchmarks used to evaluate model performance under these techniques. Finally, we explore applications of LLMs (e.g., CodeLlama, GitHub Copilot, ToolGen) in code generation tasks to illustrate their roles and functionalities. This survey provides a comprehensive overview of LLMs for code generation, helps researchers in diverse fields better understand the current state of the art, and shows how LLMs can be leveraged effectively for code generation tasks.

Paper Structure

This paper contains 21 sections, 3 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Example of using ChatGPT 4o to generate code for removing missing values.
  • Figure 2: Example of using GitHub Copilot to generate code (datacampcopilot)
  • Figure 3: How transformer models work (nvidia1)
  • Figure 4: Notable released LLMs timeline
  • Figure 5: LLM-based fine-tuning process
  • ...and 1 more figures