Worst-Case Convergence Time of ML Algorithms via Extreme Value Theory

Saeid Tizpaz-Niari, Sriram Sankaranarayanan

TL;DR

This work tackles the challenge of bounding worst-case convergence times (WCCT) for ML training and inference, a critical timing property often inaccessible to static analysis. It advances a practical framework based on Extreme Value Theory (EVT), employing the Generalized Extreme Value (GEV) distribution and Generalized Pareto Distribution (GPD) with threshold exceedances to model the tail of convergence times and derive return levels and return periods for probabilistic guarantees. Empirical results across linear ML training benchmarks and deep neural network inference in cyber-physical systems show that EVT-based WCCT predictions can outperform baseline Bayesian-factor methods, especially for long horizons, demonstrating scalability and usefulness for timing analysis. The findings support EVT as a viable tool to quantify tail risks in ML systems, with implications for reliability, availability, and environmental impact, while also outlining practical caveats about data representativeness and threshold selection.
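As a rough illustration of the threshold-exceedance idea mentioned above, the following sketch fits a Generalized Pareto Distribution to the excesses over a high threshold and reads off an m-observation return level. It is not the authors' implementation: the function name `gpd_return_level`, the 0.95 threshold quantile, and the synthetic lognormal "convergence times" are all assumptions made for this example.

```python
import numpy as np
from scipy.stats import genpareto


def gpd_return_level(times, threshold_quantile, m):
    """Peaks-over-threshold sketch (hypothetical helper, not from the paper).

    Returns (level, u): the value expected to be exceeded about once in
    every m observations, and the threshold u that was used.
    """
    times = np.asarray(times, dtype=float)
    u = np.quantile(times, threshold_quantile)      # high threshold
    excess = times[times > u] - u                   # exceedances over u
    zeta = excess.size / times.size                 # empirical P(X > u)
    if m * zeta <= 1.0:
        raise ValueError("m is too small: the return level lies below the threshold")
    # Fit the GPD to the excesses with the location fixed at 0.
    xi, _, sigma = genpareto.fit(excess, floc=0.0)
    # P(X > level) = 1/m  <=>  P(excess > level - u | X > u) = 1 / (m * zeta)
    level = u + genpareto.ppf(1.0 - 1.0 / (m * zeta), xi, loc=0.0, scale=sigma)
    return level, u


# Synthetic stand-in for measured convergence times (assumption, in seconds).
rng = np.random.default_rng(0)
synthetic_times = rng.lognormal(mean=1.0, sigma=0.5, size=20_000)
level, u = gpd_return_level(synthetic_times, threshold_quantile=0.95, m=10_000)
print(f"threshold u = {u:.2f} s, 10000-observation return level = {level:.2f} s")
```

The threshold choice drives the result: too low a threshold violates the tail approximation, while too high a threshold leaves few exceedances to fit, which is the threshold-selection caveat noted in the summary.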

Abstract

This paper leverages the statistics of extreme values to predict the worst-case convergence times of machine learning algorithms. Timing is a critical non-functional property of ML systems, and providing worst-case convergence times is essential to guarantee the availability of ML and its services. However, timing properties such as worst-case convergence times (WCCT) are difficult to verify since (1) they are not encoded in the syntax or semantics of the underlying programming languages of AI, (2) their evaluations depend on both algorithmic implementations and underlying systems, and (3) their measurements involve uncertainty and noise. Therefore, prevalent formal methods and statistical models fail to provide rich information on the amounts and likelihood of WCCT. Our key observation is that the timing information we seek represents the extreme tail of execution times. Therefore, extreme value theory (EVT), a statistical discipline that focuses on understanding and predicting the distribution of extreme values in the tail of outcomes, provides an ideal framework to model and analyze WCCT in the training and inference phases of the ML paradigm. Building upon the mathematical tools from EVT, we propose a practical framework to predict the worst-case timing properties of ML. Over a set of linear ML training algorithms, we show that EVT achieves better accuracy for predicting WCCTs than relevant statistical methods such as the Bayesian factor. On a set of larger machine learning training algorithms and deep neural network inference, we show the feasibility and usefulness of EVT models to accurately predict WCCTs, their expected return periods, and their likelihood.
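To make "return periods" concrete, here is a minimal block-maxima sketch: it fits a Generalized Extreme Value distribution to per-block maxima and computes the level expected to be exceeded once every m blocks. This is an illustrative sketch under stated assumptions, not the paper's code; the helper `gev_return_level`, the block size of 50, and the synthetic gamma-distributed times are hypothetical.

```python
import numpy as np
from scipy.stats import genextreme


def gev_return_level(times, block_size, m):
    """Block-maxima sketch (hypothetical helper, not from the paper).

    Returns (level, params): the m-block return level and the fitted
    (shape, loc, scale) parameters in SciPy's parameterization.
    """
    times = np.asarray(times, dtype=float)
    n_blocks = times.size // block_size
    # Drop the incomplete trailing block, then take the maximum of each block.
    maxima = times[: n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)
    # Note: SciPy's shape parameter c equals -xi in the usual EVT notation.
    c, loc, scale = genextreme.fit(maxima)
    # A block maximum exceeds `level` with probability 1/m,
    # so the return period of `level` is m blocks.
    level = genextreme.ppf(1.0 - 1.0 / m, c, loc=loc, scale=scale)
    return level, (c, loc, scale)


# Synthetic stand-in for measured convergence times (assumption, in seconds).
rng = np.random.default_rng(0)
synthetic_times = rng.gamma(shape=2.0, scale=1.5, size=5_000)
level, params = gev_return_level(synthetic_times, block_size=50, m=100)
print(f"100-block return level = {level:.2f} s")
```

Here a "block" is 50 consecutive runs, so the printed level is the time that the slowest run in a block is expected to exceed about once per 100 blocks, assuming the measurements are representative of deployment conditions.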

Paper Structure (9 sections, 5 equations, 8 figures, 1 table)


Figures (8)

  • Figure 1: Overview Example. (a) The computation times of applying photo filters, with a threshold of 15.6 seconds, (b) the density plot for the GEV distribution (blue) vs. the empirical model (black), (c) quantile plot for the execution time of the photo algorithm, (d) m-return level plot for the computation times of the algorithm with expected values and their 95% CI.
  • Figure 2: Logistic Regression. (a) The training convergence times for Logistic Regression, varying the input dataset and hyperparameters, where the red line shows the 0.997-quantile (99.7% of the data is below the red line), (b) the density plot for the GEV of Logistic Regression, (c) quantile plot for the execution time of Logistic Regression, (d) m-return level plot for Logistic Regression with expected values and their 95% CI.
  • Figure 3: Decision Tree. (a) The computation times of training Decision Tree, varying the input dataset and hyperparameters, with a threshold set to 48.7 seconds, (b) the density plot for the GEV of Decision Tree, (c) quantile plot for the execution time of Decision Tree, (d) m-return level plot for Decision Tree with expected values and their 95% CI.
  • Figure 4: Linear Discriminant Analysis. (a) The computation times of training Linear Discriminant Analysis, varying the input dataset and hyperparameters, with a threshold set to 2.8 seconds, (b) the density plot for the GEV of Linear Discriminant Analysis, (c) quantile plot for the execution time of Linear Discriminant Analysis, (d) m-return level plot for Linear Discriminant Analysis with expected values and their 95% CI.
  • Figure 5: Schematic diagrams for the neural-network controlled physical systems.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Definition 4.1: Problem Statement