
Stale Profile Matching

Amir Ayupov, Maksim Panchenko, Sergey Pupyrev

TL;DR

This work tackles profile staleness in profile-guided optimization (PGO) with a practical two-stage approach, matching followed by inference, for reusing profiles collected on binaries built several revisions behind the release. The method is implemented in the BOLT post-link optimizer. It matches basic blocks using a hierarchical, multi-level block hashing scheme. It then infers consistent block and branch counts within a minimum-cost flow framework, preserving as much of the original profile guidance as possible. Across large open-source binaries and production Meta workloads, it recovers $0.6{-}0.8$ of the maximum potential BOLT speedup when profiles are stale. For clang, it achieves $0.78$ of the PGO benefit even when over $90\%$ of the profile data is stale. The approach reduces the need for re-profiling after hotfixes, broadens the practical adoption of PGO, and is general enough to be adapted to PGO systems beyond BOLT.
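The matching stage can be illustrated with a small Python sketch of hierarchical block hashing. Each basic block of the new binary is matched against the profiled blocks of the old binary, starting with a strict hash (opcodes plus operands) and falling back to a loose hash (opcodes only). The block representation, the two hash levels, and the names `strict_hash`, `loose_hash`, and `match_blocks` are illustrative assumptions; the exact hash components and tie-breaking used in BOLT are not reproduced here.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an instruction is (opcode, operands),
# and a basic block is a list of instructions.
Instr = Tuple[str, str]


def _digest(parts: List[str]) -> str:
    """Stable hash of a sequence of strings."""
    return hashlib.sha1("\x00".join(parts).encode()).hexdigest()


def strict_hash(block: List[Instr]) -> str:
    """Strict level: opcodes together with their operands."""
    return _digest([f"{op} {args}" for op, args in block])


def loose_hash(block: List[Instr]) -> str:
    """Loose level: opcodes only, tolerating register/immediate changes."""
    return _digest([op for op, _ in block])


def match_blocks(old_blocks: Dict[str, List[Instr]],
                 new_blocks: Dict[str, List[Instr]]) -> Dict[str, Optional[str]]:
    """Map each new block to an old (profiled) block, strictest level first."""
    matches: Dict[str, Optional[str]] = {name: None for name in new_blocks}
    used = set()
    for level in (strict_hash, loose_hash):
        # Index the old blocks that are still unmatched by their hash at this level.
        index: Dict[str, List[str]] = {}
        for name, block in old_blocks.items():
            if name not in used:
                index.setdefault(level(block), []).append(name)
        for name, block in new_blocks.items():
            if matches[name] is not None:
                continue
            candidates = index.get(level(block))
            if candidates:
                old = candidates.pop(0)  # first unused candidate; no tie-breaking
                matches[name] = old
                used.add(old)
    return matches
```

Trying the strict level first means a block only falls back to a loose match when no exact counterpart exists. This reduces the risk of attaching counts to a merely similar block. Blocks that match at no level are left to the inference stage.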

Abstract

Profile-guided optimizations rely on profile data to direct compilers in generating optimized code. To achieve the maximum performance boost, profile data needs to be collected on the same version of the binary that is being optimized. In practice, however, there is typically a gap between profile collection and release, which makes a portion of the profile invalid for optimization. This phenomenon is known as profile staleness. It is a serious practical problem for data-center workloads, affecting both compilers and binary optimizers. In this paper, we thoroughly study the staleness problem and propose the first practical solution for utilizing profiles collected on binaries built several revisions behind the release. Our algorithm is developed and implemented in BOLT, a mainstream open-source post-link optimizer. An extensive evaluation on a variety of standalone benchmarks and production services indicates that the new method recovers up to $0.8$ of the maximum BOLT benefit. This holds even when most of the input profile data is stale and would otherwise have been discarded by the optimizer.
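The degree of staleness can be quantified as the fraction of profile samples that an exact-match optimizer would discard. These are samples attributed to functions whose body changed, or which disappeared, between the profiled binary and the release. The Python sketch below computes this fraction. It assumes, hypothetically, that every profiled function carries a checksum of its body recorded at profiling time. The `ProfiledFunction` type and its fields are illustrative and do not reflect BOLT's profile format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProfiledFunction:
    checksum: str  # hypothetical: hash of the function body at profiling time
    samples: int   # number of profile samples attributed to the function


def stale_fraction(profile: Dict[str, ProfiledFunction],
                   release_checksums: Dict[str, str]) -> float:
    """Fraction of samples in functions that changed or vanished in the release.

    An optimizer that accepts only exact matches would discard these samples.
    """
    total = sum(f.samples for f in profile.values())
    if total == 0:
        return 0.0
    stale = sum(f.samples for name, f in profile.items()
                if release_checksums.get(name) != f.checksum)
    return stale / total
```

For example, if the hottest functions are the ones edited between revisions, this fraction can approach 1 even when only a few functions changed.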

Paper Structure (16 sections, 5 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Continuous profiling causes a mismatch between the revision used to produce the profile ($R_0$) and the revision to which the profile is applied ($R_n$).
  • Figure 2: Investigation of profile staleness for the clang binary (release_15) built in different modes.
  • Figure 3: A function in clang built with AutoFDO in the profiled (left) and the release (right) binaries.
  • Figure 4: A function, foo, modified between two releases (old and new) of the binary. $B_{new}$ and $P_{old}$ comprise the input for the stale profile matching problem. The goal is to infer a profile that is as close to $P_{new}$ as possible.
  • Figure 5: An overview of our algorithm for the stale profile matching problem.
  • ...and 5 more figures
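The inference stage aims to turn partially matched, possibly inconsistent counts into a flow-conserving profile that stays close to the matched data. Figure 4 states this goal as inferring a profile close to $P_{new}$. The Python sketch below casts inference as a minimum-cost flow problem under several illustrative assumptions: an acyclic CFG, a known entry count, and block counts carried over from matching (`None` for unmatched blocks).

Each block is split into an `in` and an `out` node. A matched block with count $c$ gets two parallel arcs: one of capacity $c$ with cost $-1$, and one of unbounded capacity with cost $+1$. Because the total flow is fixed to the entry count, minimizing cost minimizes $\sum_b |f_b - c_b|$ over matched blocks. The problem is solved with successive shortest paths (Bellman-Ford). This cost model is a simplification chosen for clarity, not necessarily the one used in the paper, and CFGs with loops would need additional handling to avoid negative cycles.

```python
from typing import Dict, List, Optional, Tuple

INF = float("inf")


class MinCostFlow:
    """Min-cost flow via successive shortest paths with Bellman-Ford.

    Negative costs are fine if the initial graph has no negative cycle,
    which holds here because the CFG is assumed to be acyclic.
    """

    def __init__(self) -> None:
        self.adj: Dict[str, List[int]] = {}
        self.arcs: List[list] = []  # each arc: [head, residual capacity, cost]

    def add_arc(self, u: str, v: str, cap: float, cost: int) -> int:
        """Add arc u->v plus its zero-capacity reverse; return the forward index."""
        for tail, head, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj.setdefault(tail, []).append(len(self.arcs))
            self.arcs.append([head, c, w])
        return len(self.arcs) - 2

    def flow(self, a: int) -> float:
        """Flow on forward arc a equals the residual capacity of its reverse."""
        return self.arcs[a ^ 1][1]

    def run(self, s: str, t: str, amount: float) -> None:
        while amount > 0:
            dist = {n: INF for n in self.adj}
            dist[s] = 0
            prev: Dict[str, int] = {}
            for _ in range(len(self.adj) - 1):  # Bellman-Ford rounds
                changed = False
                for u, out in self.adj.items():
                    for a in out:
                        v, cap, cost = self.arcs[a]
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v], prev[v], changed = dist[u] + cost, a, True
                if not changed:
                    break
            if dist[t] == INF:
                raise ValueError("requested flow cannot be routed")
            # Walk back from t; the head of reverse arc a ^ 1 is the tail of a.
            path, v = [], t
            while v != s:
                path.append(prev[v])
                v = self.arcs[prev[v] ^ 1][0]
            push = min([amount] + [self.arcs[a][1] for a in path])
            for a in path:
                self.arcs[a][1] -= push
                self.arcs[a ^ 1][1] += push
            amount -= push


def infer_counts(succs: Dict[str, List[str]], entry: str, entry_count: int,
                 matched: Dict[str, Optional[int]]
                 ) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """Infer flow-conserving counts closest (in L1) to the matched block counts.

    Every block must appear as a key of both `succs` and `matched`.
    """
    g = MinCostFlow()
    block_arcs: Dict[str, List[int]] = {}
    for b, c in matched.items():
        if c is None:  # unmatched block: no evidence, flow passes at no cost
            block_arcs[b] = [g.add_arc(f"{b}.in", f"{b}.out", INF, 0)]
        else:  # units up to c earn -1, units beyond c cost +1
            block_arcs[b] = [g.add_arc(f"{b}.in", f"{b}.out", c, -1),
                             g.add_arc(f"{b}.in", f"{b}.out", INF, 1)]
    edge_arcs: Dict[Tuple[str, str], int] = {}
    for b, targets in succs.items():
        for s in targets:
            edge_arcs[(b, s)] = g.add_arc(f"{b}.out", f"{s}.in", INF, 0)
        if not targets:  # exit block drains into the sink
            g.add_arc(f"{b}.out", "sink", INF, 0)
    g.add_arc("source", f"{entry}.in", INF, 0)
    g.run("source", "sink", entry_count)
    blocks = {b: sum(g.flow(a) for a in arcs) for b, arcs in block_arcs.items()}
    edges = {e: g.flow(a) for e, a in edge_arcs.items()}
    return blocks, edges
```

On a diamond CFG `A -> {B, C} -> D`, suppose the matched counts are A=100, C=30, D=100 and B is unmatched. The sketch assigns 70 to B and splits the branch counts 70/30. When the matched counts contradict each other, the solver still returns counts that satisfy flow conservation while deviating as little as possible from the stale data.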