Automated Computational Energy Minimization of ML Algorithms using Constrained Bayesian Optimization

Pallavi Mitra, Felix Biessmann

TL;DR

The paper addresses the rising energy cost of model training by proposing energy-aware hyperparameter optimization. It uses constrained Bayesian optimization (CBO) to minimize $\tau(x)$ subject to a performance constraint on $c(x)$ (classification) or $cr(x)$ (regression), with log-scale surrogates modeled by Gaussian processes with a Matérn 5/2 kernel. The approach employs a joint acquisition function that combines Expected Improvement (EI) for the objective with Probability of Feasibility (PoF) for the constraint, and it is compared against unconstrained BO with a quadratic penalty. Across regression and classification tasks, CBO reduces wall-clock runtime while maintaining the required performance threshold, which demonstrates its practical utility for energy-efficient training and suggests modeling energy-performance interactions as future work.
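
The joint acquisition described above can be illustrated with a minimal sketch. It assumes the two Gaussian process surrogates have already been fitted and only shows how their posterior means and standard deviations at a candidate point could be combined. All function names, the constraint direction (constraint value at or below a threshold) and the exact form of the quadratic penalty are assumptions made for illustration, not the authors' implementation. For a metric where higher is better, such as accuracy, the same code applies to the negated metric and negated threshold.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for minimization, given the objective posterior at one candidate."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)


def probability_of_feasibility(mu_c, sigma_c, threshold):
    """P(constraint <= threshold) under a Gaussian posterior (assumed direction)."""
    if sigma_c <= 0.0:
        return 1.0 if mu_c <= threshold else 0.0
    return norm_cdf((threshold - mu_c) / sigma_c)


def constrained_acquisition(mu, sigma, best, mu_c, sigma_c, threshold):
    """Joint acquisition: EI on the objective weighted by PoF on the constraint."""
    return (expected_improvement(mu, sigma, best)
            * probability_of_feasibility(mu_c, sigma_c, threshold))


def quadratic_penalty(objective, constraint_value, threshold, weight=1.0):
    """Hypothetical penalized objective for an unconstrained BO baseline."""
    violation = max(0.0, constraint_value - threshold)
    return objective + weight * violation ** 2


def select_next(candidates, best, threshold):
    """Return the index of the candidate with the highest joint acquisition.

    Each candidate is a tuple (mu, sigma, mu_c, sigma_c) of posterior
    statistics, e.g. on the log scale for both surrogates.
    """
    scores = [constrained_acquisition(mu, s, best, mu_c, s_c, threshold)
              for (mu, s, mu_c, s_c) in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up posterior values: the first candidate promises a lower
    # objective but is very likely to violate the constraint.
    candidates = [(0.5, 0.2, 3.0, 0.5), (1.5, 0.2, -3.0, 0.5)]
    print(select_next(candidates, best=2.0, threshold=0.0))
```

Multiplying EI by PoF drives the acquisition toward zero wherever the constraint is unlikely to hold, so infeasible regions are avoided without a hand-tuned penalty weight. The penalty baseline, by contrast, folds the violation into a single objective and depends on the choice of `weight`.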

Abstract

Bayesian optimization (BO) is an efficient framework for optimization of black-box objectives when function evaluations are costly and gradient information is not easily accessible. BO has been successfully applied to automate the task of hyperparameter optimization (HPO) in machine learning (ML) models with the primary objective of optimizing predictive performance on held-out data. In recent years, however, with ever-growing model sizes, the energy cost associated with model training has become an important factor for ML applications. Here we evaluate Constrained Bayesian Optimization (CBO) with the primary objective of minimizing energy consumption and subject to the constraint that the generalization performance is above some threshold. We evaluate our approach on regression and classification tasks and demonstrate that CBO achieves lower energy consumption without compromising the predictive performance of ML models.

Paper Structure (10 sections, 4 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Performance comparison of (a) unconstrained BO and (b) CBO for the Lasso regressor. Blue bars (left y-axis) indicate the MSE achieved with the current best hyperparameter set. Red lines (right y-axis) indicate the cumulative runtime. Black dashed lines denote the predefined MSE threshold. CBO meets the MSE threshold more often, and with lower cumulative runtimes, than unconstrained BO.
  • Figure 2: Performance comparison of (a) Unconstrained BO and (b) CBO, for Ridge Classifier. Blue bars (left y-axis) indicate the accuracy.
  • Figure 3: Performance Comparison of CBO with Unconstrained BO for Elastic Net Regression Model
  • Figure 4: Performance Comparison of CBO with Unconstrained BO for K Nearest Neighbour Regression Model
  • Figure 5: Performance Comparison of CBO with Unconstrained BO for Decision Tree Regression Model
  • ...and 4 more figures