Curriculum Learning for Small Code Language Models

Marwa Naïr, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi

TL;DR

This work investigates curriculum learning (CL) for small decoder-only code language models, aiming to improve performance on complex code tasks. The authors generate a Python-focused dataset and propose the Overall Metric, $OM = \frac{CC + HD}{2}$, the average of cyclomatic complexity (CC) and Halstead difficulty (HD), to sort snippets into easy, medium, and hard levels. They then compare three CL schedules against a baseline on token-level code completion, line-level code completion, and code execution. The hybrid schedule yields the most pronounced gains in code execution, while improvements in code completion are modest. The benefits also carry over to a fine-tuned Code Llama 7B, which suggests the approach scales to larger models. Overall, the study shows that a well-designed curriculum can improve the code execution abilities of small decoder-only models, and it releases open resources to support further research on curriculum learning for code.

Abstract

Code language models have emerged as useful tools for various programming tasks, yet they often struggle with complex ones. In this paper, we explore the potential of curriculum learning to enhance the performance of these models. While prior research has suggested that curriculum learning does not necessarily improve the performance of language models, our results surprisingly show that this may not be the case for code language models. We demonstrate that a well-designed curriculum learning approach significantly improves the accuracy of small decoder-only code language models on the task of code execution, while its effect on code completion is less significant. To explore the potential of curriculum learning, we train multiple GPT models with 1 million parameters each to predict the next token and evaluate them on code completion and execution tasks. Our contributions include proposing a novel code difficulty assessment metric that combines software code measures, investigating the effectiveness of curriculum learning for code language models, and introducing a novel curriculum learning schedule that enhances the performance of small decoder-only language models on code execution tasks. The results of this paper open the door to more research on the use of curriculum learning for code language models.
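To make the difficulty metric concrete, the following is a minimal sketch of how a score of the form OM = (CC + HD) / 2 could be computed for a Python snippet. It takes CC to be cyclomatic complexity and HD to be Halstead difficulty, and it approximates both with the standard library only. The function names, the set of constructs counted as decision points, and the token-based operator/operand split are all assumptions made for illustration; they are not the authors' implementation, and a dedicated metrics tool may report different values.

```python
import ast
import io
import keyword
import tokenize


def cyclomatic_complexity(source: str) -> int:
    """Approximate CC as 1 + number of decision points (illustrative assumption)."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes two extra decision points
            decisions += len(node.values) - 1
    return 1 + decisions


def halstead_difficulty(source: str) -> float:
    """HD = (n1 / 2) * (N2 / n2) with a simplified operator/operand split.

    n1 = distinct operators, n2 = distinct operands, N2 = total operands.
    Operators here are punctuation tokens and keywords; operands are
    identifiers and literals. This split is an assumption of the sketch.
    """
    operators = set()
    operands = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_keyword = tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        if tok.type == tokenize.OP or is_keyword:
            operators.add(tok.string)
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)
    if not operands:
        return 0.0
    return (len(operators) / 2) * (len(operands) / len(set(operands)))


def overall_metric(source: str) -> float:
    """Average of the two approximate measures, mirroring OM = (CC + HD) / 2."""
    return (cyclomatic_complexity(source) + halstead_difficulty(source)) / 2


easy_snippet = "a = 1\nb = 2\nprint(a + b)\n"
branchy_snippet = "x = 5\nif x > 3 and x < 10:\n    print(x)\nelse:\n    print(0)\n"
print(overall_metric(easy_snippet), overall_metric(branchy_snippet))
```

Under this sketch, a snippet with branching and more distinct operators scores higher than a straight-line one, which is the property needed to sort snippets into difficulty levels. The thresholds that separate easy, medium, and hard are not stated in this summary, so none are assumed here.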

Paper Structure (35 sections, 2 equations, 6 figures, 8 tables)

This paper contains 35 sections, 2 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Overview of our approach. We begin by generating code snippets using TinyPy Generator. Next, we assess the difficulty of the generated snippets using the Overall Metric we propose and categorize the data into three levels of difficulty: easy, medium, and hard. Our 1M-parameter decoder-only language models are trained following various curriculum learning schedules. We then compare their performance to a 1M-parameter baseline model trained on all the data at once, with all three levels shuffled together.
  • Figure 2: Distribution of Overall Metric (OM) Scores for the Initial Set of Generated Snippets.
  • Figure 3: One code snippet example from each difficulty level (the examples are chosen arbitrarily). More examples are presented in the paper's examples section (sec:examples).
  • Figure 4: Our three curriculum learning schedules. The Sequential schedule moves from easy to medium to hard snippets, one level at a time. The Incremental schedule starts with easy snippets and gradually adds harder ones. The Hybrid schedule starts with easy snippets, then adds a mix of the hardest easy snippets and medium snippets, and finally combines the hardest snippets from the easy and medium levels with hard snippets.
  • Figure 5: Experimental Evaluation on Three Code Tasks: Token-Level Completion, Line-Level Completion, and Code Execution
  • ...and 1 more figure
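
The three schedules described in the Figure 4 caption can be sketched as functions that turn the easy, medium, and hard pools into an ordered list of training stages. This is only an illustration of the caption's wording: the function names, the `(source, score)` tuple layout, and in particular the `fraction` parameter that decides how many snippets count as "the hardest" are hypothetical, since this summary does not say how that subset is chosen.

```python
from typing import List, Sequence, Tuple

Snippet = Tuple[str, float]  # (source code, difficulty score), hypothetical layout


def hardest(snippets: Sequence[Snippet], fraction: float) -> List[Snippet]:
    """Return the top `fraction` of snippets by difficulty score (at least one)."""
    if not snippets:
        return []
    k = max(1, int(len(snippets) * fraction))
    return sorted(snippets, key=lambda s: s[1], reverse=True)[:k]


def sequential(easy, medium, hard) -> List[List[Snippet]]:
    """One level per stage: easy, then medium, then hard."""
    return [list(easy), list(medium), list(hard)]


def incremental(easy, medium, hard) -> List[List[Snippet]]:
    """Each stage keeps all earlier data and adds the next level."""
    return [
        list(easy),
        list(easy) + list(medium),
        list(easy) + list(medium) + list(hard),
    ]


def hybrid(easy, medium, hard, fraction: float = 0.5) -> List[List[Snippet]]:
    """Later stages carry forward only the hardest snippets of earlier levels.

    `fraction` is an assumed knob, not a value taken from the paper.
    """
    return [
        list(easy),
        hardest(easy, fraction) + list(medium),
        hardest(easy, fraction) + hardest(medium, fraction) + list(hard),
    ]


easy = [("e1", 1.0), ("e2", 2.0)]
medium = [("m1", 3.0), ("m2", 4.0)]
hard = [("h1", 5.0)]
for stage in hybrid(easy, medium, hard):
    print([name for name, _ in stage])
```

A training loop would then iterate over the stages in order, shuffling within each stage. The baseline in Figure 1 corresponds to a single stage containing all three levels shuffled together.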