Table of Contents
Fetching ...

A Performance Verification Methodology for Resource Allocation Heuristics

Saksham Goel, Benjamin Mikek, Jehad Aly, Venkat Arun, Ahmed Saeed, Aditya Akella

TL;DR

This paper tackles the problem of robustly evaluating resource-allocation heuristics under realistic, stress-driven conditions. It introduces Virelay, a general SMT-based framework that models heuristics and systems through a minimal-assumption, assumption-constrained worst-case analysis with a simple state-transition structure. Through six case studies, including Work Stealing and the Linux CFS load balancer, the authors derive actionable bounds, reveal real-world bugs, and replicate prior work to validate the framework’s breadth. The work demonstrates that formal performance verification can uncover both theoretical limits and practical issues, enabling faster, safer, and more reliable design of scheduling systems across domains.

Abstract

Performance verification is a nascent but promising tool for understanding the performance and limitations of heuristics under realistic assumptions. Bespoke performance verification tools have already demonstrated their value in settings like congestion control and packet scheduling. In this paper, we aim to emphasize the broad applicability and utility of performance verification. To that end, we highlight the design principles of performance verification. Then, we leverage that understanding to develop a set of easy-to-follow guidelines that are applicable to a wide range of resource allocation heuristics. In particular, we introduce Virelay, a framework that enables heuristic designers to express the behavior of their algorithms and their assumptions about the system in an environment that resembles a discrete-event simulator. We demonstrate the utility and ease-of-use of Virelay by applying it to six diverse case studies. We produce bounds on the performance of classical algorithms, work stealing and SRPT scheduling, under practical assumptions. We demonstrate Virelay's expressiveness by capturing existing models for congestion control and packet scheduling, and we verify the observation that TCP unfairness can cause some ML training workloads to spontaneously converge to a state of high network utilization. Finally, we use Virelay to identify two bugs in the Linux CFS load balancer.

A Performance Verification Methodology for Resource Allocation Heuristics

TL;DR

This paper tackles the problem of robustly evaluating resource-allocation heuristics under realistic, stress-driven conditions. It introduces Virelay, a general SMT-based framework that models heuristics and systems through a minimal-assumption, assumption-constrained worst-case analysis with a simple state-transition structure. Through six case studies, including Work Stealing and the Linux CFS load balancer, the authors derive actionable bounds, reveal real-world bugs, and replicate prior work to validate the framework’s breadth. The work demonstrates that formal performance verification can uncover both theoretical limits and practical issues, enabling faster, safer, and more reliable design of scheduling systems across domains.

Abstract

Performance verification is a nascent but promising tool for understanding the performance and limitations of heuristics under realistic assumptions. Bespoke performance verification tools have already demonstrated their value in settings like congestion control and packet scheduling. In this paper, we aim to emphasize the broad applicability and utility of performance verification. To that end, we highlight the design principles of performance verification. Then, we leverage that understanding to develop a set of easy-to-follow guidelines that are applicable to a wide range of resource allocation heuristics. In particular, we introduce Virelay, a framework that enables heuristic designers to express the behavior of their algorithms and their assumptions about the system in an environment that resembles a discrete-event simulator. We demonstrate the utility and ease-of-use of Virelay by applying it to six diverse case studies. We produce bounds on the performance of classical algorithms, work stealing and SRPT scheduling, under practical assumptions. We demonstrate Virelay's expressiveness by capturing existing models for congestion control and packet scheduling, and we verify the observation that TCP unfairness can cause some ML training workloads to spontaneously converge to a state of high network utilization. Finally, we use Virelay to identify two bugs in the Linux CFS load balancer.
Paper Structure (37 sections, 4 equations, 12 figures, 1 table, 1 algorithm)

This paper contains 37 sections, 4 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Overview of alternating calls between System and Algorithm.
  • Figure 2: Performance verification workflow under Virelay. Although formal methods experts can jump right into step 2.b with little help from the guide or even existing definitions, we find that this framework helps formal methods novices develop useful models rapidly.
  • Figure 3: An overview of Virelay. A user only needs to fill in the red parts of the framework. More details about the exact structure of the framework are given in listings 1--4.
  • Figure 4: Performance comparisons of work stealing heuristic to optimal
  • Figure 5: Work Stealing vs Optimal
  • ...and 7 more figures