
How is Google using AI for internal code migrations?

Stoyan Nikolov, Daniele Codecasa, Anna Sjovall, Maxim Tabachnyk, Satish Chandra, Siddharth Taneja, Celal Ziftci

TL;DR

The paper documents Google's experience with bespoke LLM-powered code migrations in a large-scale monorepo environment. It presents a hybrid workflow that pairs LLM edits with deterministic AST-based techniques and a reusable migration toolkit to make repository-wide changes, demonstrated on int32-to-int64 ID migrations, JUnit3-to-JUnit4 migrations, and Joda Time migrations. Key findings include substantial time savings (from 50% to 89%, depending on the case) and high rates of AI-authored changes that can be landed after human review, underscoring the practical viability of AI-assisted migrations at enterprise scale. The work emphasizes a human-in-the-loop rollout, careful validation, and the need to balance custom models against generic capabilities to maximize return on investment and reliability in production codebases.
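The deterministic discovery half of this hybrid workflow can be illustrated with a small, self-contained sketch. It assumes a hypothetical C++-style codebase in which 32-bit ID fields are declared as int32 foo_id or int32_t foo_id. The pattern INT32_ID_DECL, the Candidate record, and the discover function are illustrative names, not part of Google's toolkit, and the regex is only a stand-in for the AST-based or semantic-index lookup a production migration would use.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical discovery rule: a 32-bit integer declaration whose name ends
# in "_id". A regex is a self-contained stand-in for an AST or semantic-index
# query; it ignores comments, macros, typedefs, and multi-line declarations.
INT32_ID_DECL = re.compile(r"\bint32(?:_t)?\s+(\w*_id)\b")


@dataclass(frozen=True)
class Candidate:
    path: str
    line: int        # 1-based line number
    identifier: str


def discover(files: Dict[str, str]) -> List[Candidate]:
    """Return every matching declaration, in a stable path/line order."""
    found: List[Candidate] = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in INT32_ID_DECL.finditer(line):
                found.append(Candidate(path, lineno, match.group(1)))
    return found


if __name__ == "__main__":
    repo = {
        "user.h": "struct User {\n  int32 user_id;\n  int32 age;\n};\n",
        "order.cc": "int32_t order_id = LookupOrder();\n",
    }
    for candidate in discover(repo):
        print(candidate)
```

In a sketch like this, keeping discovery rule-based makes the set of locations handed to an LLM reproducible and auditable: the model would only be asked to rewrite code at sites a deterministic rule has already selected.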

Abstract

In recent years, there has been tremendous interest in using generative AI, and particularly large language models (LLMs), in software engineering; indeed, there are now several commercially available tools, and many large companies have also created proprietary ML-based tools for their own software engineers. While ML for common tasks such as code completion is available in commodity tools, there is growing interest in applying LLMs to more bespoke purposes. One such purpose is code migration. This article is an experience report on using LLMs for code migrations at Google. It is not a research study, in the sense that we do not carry out comparisons against other approaches or evaluate research questions/hypotheses. Rather, we share our experiences in applying LLM-based code migration in an enterprise context across a range of migration cases, in the hope that other industry practitioners will find our insights useful. Many of these learnings apply to any application of ML in software engineering. We see evidence that the use of LLMs can significantly reduce the time needed for migrations and can lower the barriers to starting and completing migration programs.

Paper Structure

This paper contains 22 sections and 15 figures.

Figures (15)

  • Figure 1: Landed changelists of AI-powered migrations for the first three quarters of 2024.
  • Figure 2: Improving AI-based features in coding tools (e.g., in the IDE) with historical high quality data across tools and with usage data capturing user preferences and needs.
  • Figure 3: A demonstration of how a variety of AI-based features can work together to assist with coding in the IDE. Top: code completion; middle: adjusting copy-pasted code to the context; bottom: code edits based on natural-language instructions. See our blog for more details [web:blog:aigoogle].
  • Figure 4: The high-level process to land an AI-authored change in the monorepo. We use LLMs extensively in code change creation, and partly in discovery and validation phases.
  • Figure 5: Example execution of the multi-stage code migration process.
  • ...and 10 more figures
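The Figure 4 caption names three phases for landing an AI-authored change: discovery, code change creation, and validation. A minimal sketch of how change creation and validation might interact is shown here, assuming a retry loop that feeds validation errors back into the next generation attempt. The generate and validate callables stand in for an LLM call and a build/test run respectively; migrate_file, Result, max_attempts, and the toy components are hypothetical names, and the toy generator exists only so the loop can run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-ins: a real pipeline would call an LLM service and a
# build/test system. Both are plain callables here so the loop is runnable.
Generate = Callable[[str, List[str]], str]  # (source, feedback) -> candidate edit
Validate = Callable[[str], List[str]]        # candidate -> errors ([] means it passed)


@dataclass
class Result:
    edited: Optional[str]  # None if no attempt passed validation
    attempts: int
    feedback: List[str] = field(default_factory=list)


def migrate_file(source: str, generate: Generate, validate: Validate,
                 max_attempts: int = 3) -> Result:
    """Generate an edit, validate it, and retry with the errors as feedback."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(source, feedback)
        errors = validate(candidate)
        if not errors:
            # A passing edit is still only a proposal for human review.
            return Result(candidate, attempt, feedback)
        feedback = feedback + errors  # carry errors into the next attempt
    return Result(None, max_attempts, feedback)


# --- Toy components for demonstration only ---
def toy_generate(source: str, feedback: List[str]) -> str:
    edited = source.replace("int32", "int64")
    # Simulate a first attempt that drops the trailing semicolon.
    return edited if feedback else edited.rstrip(";")


def toy_validate(code: str) -> List[str]:
    errors = []
    if "int32" in code:
        errors.append("still uses int32")
    if not code.rstrip().endswith(";"):
        errors.append("missing ';'")
    return errors


if __name__ == "__main__":
    print(migrate_file("int32 user_id = GetId();", toy_generate, toy_validate))
```

Returning None rather than a best-effort edit when every attempt fails keeps unvalidated changes out of the review queue, which matches the human-in-the-loop, validate-before-landing emphasis described in the TL;DR.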