Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs

Marina Sakharova, Abhinav Anand, Mira Mezini

TL;DR

The paper integrates symbolic execution into the fine-tuning of code-generating LLMs, using it to augment the reward-model training data for Reinforcement Learning (RL) and Direct Preference Optimization (DPO). It extends the APPS dataset with test cases generated automatically via symbolic execution (CrossHair plus MonkeyType) to achieve richer path coverage, then trains SE-enhanced critics and RL/DPO actors. The critic models improve substantially, indicating better reward signals, while actor gains are modest and DPO underperforms the RL baselines. The work shows that symbolic execution can sharpen evaluation feedback and may make code-generation systems more robust in practical software tasks. The approach has implications for safer, more reliable code synthesis and motivates further study of critic-actor dynamics and broader language coverage.

Abstract

Code-generating Large Language Models (LLMs) have become essential tools in modern software development, enhancing productivity and accelerating development. This paper investigates the fine-tuning of code-generating LLMs using Reinforcement Learning and Direct Preference Optimization to further improve their performance. To achieve this, we enhance the training data for the reward model with the help of symbolic execution techniques, ensuring more comprehensive and objective data. With symbolic execution, we create a custom dataset that better captures the nuances in code evaluation. Our reward models, fine-tuned on this dataset, demonstrate significant improvements over the baseline, CodeRL, in estimating the quality of generated code. Our code-generating LLMs, trained with the help of reward model feedback, achieve results similar to the CodeRL benchmark.

Paper Structure

This paper contains 18 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Test case generation pipeline.
  • Figure 2: CodeRL training pipeline. Our pipeline extension is marked green.
  • Figure 3: The distribution of the number of test cases in the original train set (left) and the modified train set (right).