What a diff makes: automating code migration with large language models

Katherine A. Rosenfeld; Cliff C. Kerr; Jessica Lundin

TL;DR

This work addresses the problem of keeping software compatible with a dependency as it moves through semantic-version changes, using large language models (LLMs) prompted with diff-based representations of the dependency's code changes. It introduces AIMigrate, a Python toolkit that constructs diffs between the legacy and target versions of a library and supplies them to an LLM together with the pre-update code, so that the model generates the post-update code in context with minimal library-specific wiring. The approach is evaluated on three case studies (TYPHOIDSIM, Parcels, LangChain BriefGPT). In the TYPHOIDSIM migration between STARSIM versions, AIMigrate correctly identified 65% of the required changes in a single run and 80% with multiple runs, with 47% of changes generated perfectly. Contexts containing diffs compress the relevant changes and improve migration performance over out-of-the-box LLMs, and in some cases they outperform contexts containing plain code. Performance nevertheless depends on both the context and the model, and under certain conditions prompts without added context remain competitive. Limitations include manual file selection, the large context windows that diffs require, and the need for human verification; future work points to improved diff construction, scalability, and broader language support.

Abstract

Modern software programs are built on stacks that are often undergoing changes that introduce updates and improvements, but may also break any project that depends upon them. In this paper we explore the use of Large Language Models (LLMs) for code migration, specifically the problem of maintaining compatibility with a dependency as it undergoes major and minor semantic version changes. We demonstrate, using metrics such as test coverage and change comparisons, that contexts containing diffs can significantly improve performance against out-of-the-box LLMs and, in some cases, perform better than contexts containing code. We provide a dataset to assist in further development of this problem area, as well as an open-source Python package, AIMigrate, that can be used to assist with migrating code bases. In a real-world migration of TYPHOIDSIM between STARSIM versions, AIMigrate correctly identified 65% of required changes in a single run, increasing to 80% with multiple runs, with 47% of changes generated perfectly.
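
To make the idea of a diff-containing context concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not AIMigrate's API: the function names `build_library_diff` and `build_migration_prompt`, the toy library sources, and the prompt wording are all hypothetical, and the call to an LLM is deliberately left out. Only Python's standard `difflib` module is used.

```python
import difflib


def build_library_diff(old_source: str, new_source: str, path: str = "library.py") -> str:
    """Return a unified diff between two versions of one library file.

    Hypothetical helper for illustration; not part of AIMigrate.
    """
    diff_lines = difflib.unified_diff(
        old_source.splitlines(keepends=True),
        new_source.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


def build_migration_prompt(library_diff: str, user_code: str) -> str:
    """Combine the dependency diff and the pre-update code into one prompt.

    The wording is an assumption; a real prompt would likely need tuning.
    """
    return (
        "A dependency changed as shown in this diff:\n\n"
        f"{library_diff}\n"
        "Update the following code so that it works with the new version. "
        "Return only the updated code.\n\n"
        f"{user_code}"
    )


# Toy example: the library renames a keyword argument between versions.
OLD_LIB = "def run(sim, n_steps):\n    return sim.step(n_steps)\n"
NEW_LIB = "def run(sim, dur):\n    return sim.step(dur)\n"
USER_CODE = "result = run(sim, n_steps=10)\n"

library_diff = build_library_diff(OLD_LIB, NEW_LIB)
prompt = build_migration_prompt(library_diff, USER_CODE)
print(prompt)
```

The resulting string would be sent to an LLM of choice, and the returned code would still need human review, as the limitations above note. The diff carries only the changed lines plus a little surrounding context, which is the sense in which it compresses the change compared with supplying both library versions in full.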
