ECO: An LLM-Driven Efficient Code Optimizer for Warehouse Scale Computers

Hannah Lin, Martin Maas, Maximilian Roquemore, Arman Hasanzadeh, Fred Lewis, Yusuf Simonson, Tzu-Wei Yang, Amir Yazdanbakhsh, Deniz Altinbüken, Florin Papa, Maggie Nolan Edmonds, Aditya Patil, Don Schwarz, Satish Chandra, Chris Kennelly, Milad Hashemi, Parthasarathy Ranganathan

TL;DR

ECO tackles the challenge of optimizing software performance at warehouse scale by mining decades of production commits to build a dictionary of performance anti-patterns and their associated optimizations. It localizes optimization opportunities via embedding-based code similarity and continuous profiling, then uses a fine-tuned LLM to generate and apply edits, followed by automated verification and production monitoring. Deployed on Google's hyperscale fleet, ECO has changed more than 25k lines of production code across over 6.4k submitted commits, with a production success rate above 99.5% and average quarterly savings equivalent to over 500k normalized CPU cores, demonstrating that automated optimization is practical at this scale. The work advances automated, interpretable performance optimization for real-world, large-scale software ecosystems and points to substantial efficiency gains in data-center compute and energy use.

Abstract

With the end of Moore's Law, optimizing code for performance has become paramount for meeting ever-increasing compute demands, particularly in hyperscale data centers where even small efficiency gains translate to significant resource and energy savings. Traditionally, this process requires significant programmer effort to identify optimization opportunities, modify the code to implement the optimization, and carefully deploy and measure the optimization's impact. Despite a significant amount of work on automating program edits and promising results in small-scale settings, such performance optimizations have remained elusive in large real-world production environments, due to the scale, high degree of complexity, and reliability required. This paper introduces ECO (Efficient Code Optimizer), a system that automatically refactors source code to improve performance at scale. To achieve these performance gains, ECO searches through historical commits at scale to create a dictionary of performance anti-patterns that these commits addressed. These anti-patterns are used to search for similar patterns in a code base of billions of lines of code, pinpointing other code segments with similar potential optimization opportunities. Using a fine-tuned LLM, ECO then automatically refactors the code to generate and apply similar edits. Next, ECO verifies the transformed code, submits it for code review, and measures the impact of the optimization in production. Currently deployed on Google's hyperscale production fleet, this system has driven >25k changed lines of production code, across over 6.4k submitted commits, with a >99.5% production success rate. Over the past year, ECO has consistently resulted in significant performance savings every quarter. On average, the savings produced per quarter are equivalent to over 500k normalized CPU cores.
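The localization step described above (using a known anti-pattern to find similar code elsewhere) can be illustrated with a minimal sketch. This is not ECO's implementation: the bag-of-tokens `embed` function is a toy stand-in, assumed here in place of a learned code-embedding model, and the names `rank_candidates`, `ANTI_PATTERN`, and `SNIPPETS` are hypothetical. It only shows the general shape of ranking candidate snippets by similarity to a query anti-pattern.

```python
import math
import re
from collections import Counter


def embed(code: str) -> Counter:
    """Toy embedding (assumption): counts of identifiers and punctuation tokens.

    A real system would use a learned code-embedding model instead.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(anti_pattern: str, snippets: dict[str, str],
                    top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k snippet names ranked by similarity to the anti-pattern."""
    query = embed(anti_pattern)
    scored = [(name, cosine(query, embed(src))) for name, src in snippets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical anti-pattern: a membership test followed by a second lookup.
ANTI_PATTERN = "if key in cache: value = cache[key]"

# Hypothetical repository snippets to search.
SNIPPETS = {
    "lookup_user": "if uid in users: user = users[uid]",
    "sum_list": "total = sum(x for x in items)",
    "lookup_price": "if sku in prices: price = prices[sku]",
}

top = rank_candidates(ANTI_PATTERN, SNIPPETS)
for name, score in top:
    print(f"{name}: {score:.2f}")
```

In this sketch the two double-lookup snippets rank above the unrelated summation because they share the structural tokens of the query. The highest-ranked candidates would then be handed to the edit-generation and verification stages, which are outside the scope of this example.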

Paper Structure

This paper contains 31 sections, 1 equation, 14 figures, 4 tables, 1 algorithm.

Figures (14)

  • Figure 1: A high-level overview of the Efficient Code Optimizer (ECO) system optimizing a redundant map operation. (1) A dataset of performance optimizations stores before/after samples of code transformations optimizing a specific code anti-pattern (Section \ref{sec:anti-patterns}). (2) A sample anti-pattern is used to query the code repository for similar code snippets (Section \ref{sec:localization:code-sim-embedding}). (3) The retrieved code snippets are ranked according to semantic similarity to the query (Section \ref{sec:localization:code-sim-ranking}). (4) The highest-ranking code snippets are optimized by applying a code transformation similar to the historical code transformation in step 1 (Section \ref{sec:edit-gen}).
  • Figure 2: Three example categories of performance optimizations supported by ECO.
  • Figure 3: An end-to-end example of ECO deploying a change.
  • Figure 4: The approach and processing pipeline used to mine performance anti-patterns from our repository's commit history and a number of curated additional data sources.
  • Figure 5: Sample of an entry within the performance-annotated functions dataset.
  • ...and 9 more figures