Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges
Noor Nashid, Daniel Ding, Keheliya Gallaba, Ahmed E. Hassan, Ali Mesbah
TL;DR
This work addresses multi-hunk patches, where fixes span non-contiguous regions and semantic interdependencies complicate automated repair. It introduces two metrics, hunk divergence and spatial proximity, and a real-world benchmark, Hunk4J, derived from Defects4J, to quantify patch heterogeneity and dispersion. The Birch framework enables reproducible, prompt-driven evaluation of six LLMs across retrieval strategies and contextual scopes, revealing that repair success declines with higher divergence and dispersion; when the LLM is used alone, no model succeeds in the most dispersed Fragment class. The findings motivate divergence-aware repair strategies and point to the need for retrieval-augmented, context-sensitive approaches to complex, distributed code edits in practice.
Abstract
Multi-hunk bugs, where fixes span disjoint regions of code, are common in practice, yet remain underrepresented in automated repair. Existing techniques and benchmarks predominantly target single-hunk scenarios, overlooking the added complexity of coordinating semantically related changes across the codebase. In this work, we characterize Hunk4J, a dataset of multi-hunk patches derived from 372 real-world defects. We propose hunk divergence, a metric that quantifies the variation among edits in a patch by capturing lexical, structural, and file-level differences, while incorporating the number of hunks involved. We further define spatial proximity, a classification that models how hunks are spatially distributed across the program hierarchy. Our empirical study spanning six LLMs reveals that model success rates decline with increased divergence and spatial dispersion. Notably, when using the LLM alone, no model succeeds in the most dispersed Fragment class. These findings highlight a critical gap in LLM capabilities and motivate divergence-aware repair strategies.
