SEAlign: Alignment Training for Software Engineering Agent

Kechi Zhang, Huangzhao Zhang, Ge Li, Jinliang You, Jia Li, Yunfei Zhao, Zhi Jin

TL;DR

SEAlign addresses the gap between post-training alignment of code models and real-world software engineering tasks by leveraging agentic trajectories and a two-stage alignment pipeline. It collects high-quality trajectories, constructs trajectory trees, scores decision nodes to identify critical actions, and applies SFT followed by fine-grained DPO on critical action pairs to align models with real-world workflows. Across SWE-Bench-Lite, SWE-Bench-Verified, and HumanEvalFix, SEAlign delivers state-of-the-art results with modest training data and demonstrates gains in both task performance and user experience, including automated app generation with improved usability. The paper also discusses generalization, practical overheads, and threats to validity, and argues that SEAlign is a meaningful step toward scalable, fully automated software engineering powered by LLMs.

Abstract

Recent advances in code generation models have demonstrated impressive capabilities in automating software development tasks, yet these models still struggle in real-world software engineering scenarios. Although current training methods, particularly post-training, excel at solving competitive programming problems, they fail to adequately prepare models for the complexities of practical software development. This misalignment raises the critical question: Are existing alignment training methods well suited for real-world software engineering tasks? In this study, we identify this issue and propose SEAlign, a novel alignment framework designed to bridge the gap between code generation models and real-world software development tasks. SEAlign leverages the unique characteristics of software engineering processes, including high-quality workflow steps, to enhance model capabilities. Our framework further employs Monte Carlo Tree Search for fine-grained alignment in multi-step decision processes, followed by preference optimization on critical actions to ensure models meet real-world requirements. We evaluate SEAlign on three standard agentic benchmarks for real-world software engineering, including HumanEvalFix, SWE-Bench-Lite, and SWE-Bench-Verified. Experimental results demonstrate state-of-the-art performance with minimal training overhead. In addition, we develop an agent-based software development platform using SEAlign, which successfully automates the creation of several small applications. Human evaluations of these applications highlight significant improvements in both task performance and user experience. Our findings underscore the potential of SEAlign to accelerate the adoption of large code models in real-world software development. We believe that this research makes a meaningful step towards fully automated software engineering.
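For reference, the preference-optimization step mentioned in the abstract can be read against the standard DPO objective. The formula below is the generic DPO loss, not necessarily the exact step-level variant used in the paper. Reading $x$ as the task together with the shared partial trajectory, and $y_w$, $y_l$ as the preferred and rejected critical actions, is an assumption made here for illustration.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here $\pi_{\mathrm{ref}}$ is the frozen reference policy (typically the SFT model), $\sigma$ is the logistic function, and $\beta$ controls how far the trained policy may drift from the reference.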

Paper Structure

This paper contains 37 sections, 5 equations, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Observed cases of misalignment in existing code models and agentic frameworks.
  • Figure 2: Overall pipeline of SEAlign (best viewed in color). Our framework involves four steps: ❶ collecting an agentic trajectory dataset from a real-world software engineering environment, namely SWE-Gym; ❷ aggregating all sampled trajectories and constructing trajectory trees; ❸ scoring nodes within the trajectory trees and extracting partial trajectory pairs with significant impact; and ❹ optimizing model preference with critical nodes. These steps yield a well-aligned agentic LLM. Note that in step ❹, the LLM is usually fine-tuned with SFT before DPO as a warm-up for instruction following. For readability, the SFT part is omitted from the pipeline figure.
  • Figure 3: Case study on creating a to-do list web application with the OpenHands agentic framework, comparing Qwen2.5-Coder-Instruct-14B and SEAlign-14B.
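
Steps ❷ and ❸ of the pipeline in Figure 2 can be illustrated with a small sketch. This is not the authors' implementation: the paper describes Monte Carlo Tree Search style scoring, while the code below substitutes a plain empirical success rate per node as a stand-in. The names `Node`, `build_tree`, `critical_pairs`, the `min_gap` threshold, and the toy action strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One action in the trajectory tree, with Monte Carlo outcome counts."""
    action: str
    visits: int = 0
    successes: int = 0
    children: dict = field(default_factory=dict)

    @property
    def score(self) -> float:
        # Empirical success rate of all trajectories passing through this node
        # (an assumed stand-in for the paper's node scoring).
        return self.successes / self.visits if self.visits else 0.0


def build_tree(trajectories):
    """Merge (actions, solved) trajectories that share a prefix into one tree."""
    root = Node(action="<root>")
    for actions, solved in trajectories:
        node = root
        node.visits += 1
        node.successes += int(solved)
        for action in actions:
            node = node.children.setdefault(action, Node(action))
            node.visits += 1
            node.successes += int(solved)
    return root


def critical_pairs(root, min_gap=0.5):
    """Return (prefix, chosen, rejected, gap) at branch points with a large score gap."""
    pairs = []
    stack = [(root, [])]
    while stack:
        node, prefix = stack.pop()
        kids = list(node.children.values())
        if len(kids) >= 2:
            best = max(kids, key=lambda c: c.score)
            worst = min(kids, key=lambda c: c.score)
            gap = best.score - worst.score
            if gap >= min_gap:  # hypothetical threshold for "significant impact"
                pairs.append((tuple(prefix), best.action, worst.action, gap))
        for child in kids:
            stack.append((child, prefix + [child.action]))
    return pairs


# Toy trajectories: (sequence of agent actions, whether the task was solved).
trajectories = [
    (["read_issue", "locate_file", "edit_file", "run_tests"], True),
    (["read_issue", "locate_file", "edit_file", "submit"], True),
    (["read_issue", "edit_file", "submit"], False),
    (["read_issue", "edit_file", "run_tests"], False),
]
tree = build_tree(trajectories)
pairs = critical_pairs(tree)
for prefix, chosen, rejected, gap in pairs:
    print(prefix, "prefer", chosen, "over", rejected, "gap", gap)
```

In this toy data the only branch point with a large score gap is the action taken right after `read_issue`, so a single preference pair is extracted there. Pairs of this shape (shared prefix, chosen action, rejected action) are the kind of input a DPO-style trainer in step ❹ would consume.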