DPO-f+: Aligning Code Repair Feedback with Developers' Preferences
Zihan Fang, Yifan Zhang, Yueke Zhang, Kevin Leach, Yu Huang
TL;DR
DPO-f+ addresses a gap in LLM-aided code repair: model outputs are often hard for developers to interpret. It combines developer-profiled rubric metrics, automated construction of pairwise preference data, and a reward-augmented Direct Preference Optimization (DPO) objective with a margin signal to generate feedback that is both accurate and aligned with user needs. On novice programming tasks and SWE-bench Lite, it improves preference accuracy, feedback accuracy (including Pass@k metrics), and overall feedback alignment over both the baseline and standard DPO, demonstrating practical gains in code comprehension and human-AI teaming. The framework supports scalable evaluation and personalization across contexts, enabling more effective, collaborative code repair workflows in education and professional software development.
Abstract
Large Language Models (LLMs) are increasingly applied to software engineering tasks, especially code repair. However, developers often struggle to interpret model outputs, limiting effective human-AI teaming. Prior work largely optimizes repaired code while under-addressing the natural-language feedback that enables comprehension and iterative improvement. We present DPO-f+, a novel framework that aligns code-repair feedback with developer needs and profiles. It (1) formalizes developer-profiled, domain-specific metrics for feedback alignment; (2) automatically constructs pairwise preference datasets from code-repair tasks; (3) fine-tunes using Direct Preference Optimization (DPO) augmented with a lightweight margin signal; and (4) provides an automated feedback evaluation protocol. Empirically, DPO-f+ outperforms both the baseline and standard DPO on generated-code accuracy and overall feedback alignment. On novice programming tasks, DPO-f+ raises the top-1 pass rate by 5.71 percentage points (pp) over the baseline and by 3.30 pp over DPO. On the more challenging SWE-bench Lite benchmark, it increases the issue-resolution rate by 1.67 pp over DPO and by 4.67 pp over the baseline. It also achieves the largest improvement in feedback alignment, outperforming DPO and the baseline. By aligning feedback more closely with developer needs, DPO-f+ turns LLM-assisted repair from one-shot outputs into a collaborative sensemaking workflow, providing a practical approach to enhancing code comprehension and fostering more effective human-AI teaming in software engineering.
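To make step (3) concrete, the following is a minimal sketch of a DPO loss for a single preference pair with an additive margin term. The abstract does not give the exact objective, so the form used here (subtracting a non-negative `margin` inside the sigmoid) is an assumption, as are the function name `dpo_margin_loss`, its parameter names, and the default `beta`. It should be read as an illustration of the general idea and not as the authors' formulation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,    # hypothetical default; not taken from the paper
    margin: float = 0.0,  # assumed additive margin; 0.0 recovers standard DPO
) -> float:
    """Per-pair DPO loss with an assumed additive margin.

    Inputs are summed token log-probabilities of the preferred (chosen)
    and dispreferred (rejected) feedback under the policy being trained
    and under a frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Standard DPO logit, shifted down by the margin so the chosen response
    # must beat the rejected one by at least `margin` to reach the same loss.
    logit = beta * (chosen_logratio - rejected_logratio) - margin
    return -log_sigmoid(logit)
```

With `margin=0.0` this reduces to the standard DPO loss. One plausible use, again an assumption, is to set the margin from the gap between rubric scores of the two feedback candidates, so that pairs with a larger quality gap push the policy harder.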
