Enhancing Regression Models for Complex Systems Using Evolutionary Techniques for Feature Engineering

Patricia Arroba, José L. Risco-Martín, Marina Zapater, José M. Moya, José L. Ayala

TL;DR

The paper addresses the challenge of building accurate power consumption models for complex, heterogeneous data-center systems. It introduces a hybrid GE+lasso framework that uses Grammatical Evolution to generate expressive feature combinations (Symbolic Regression) and Lasso to estimate linear coefficients, producing fast, linear, convex, and differentiable power models. In a Cloud server case study, the approach achieves an average testing error of about 3.98% across diverse workloads, and the results demonstrate robustness and suitability for runtime energy optimization. The method’s automatic feature engineering and interpretable coefficients offer a practical pathway to energy-efficient Cloud infrastructure and can be extended to other computing environments with similar characteristics.
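The Lasso step described above can be sketched as follows. This is a minimal pure-Python coordinate-descent solver, assuming the Grammatical Evolution stage has already produced a numeric feature matrix (rows are samples, columns are candidate features). The names `soft_threshold` and `lasso_fit`, the penalty scaling and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Proximal operator of the L1 penalty: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Fit y ~ b0 + sum_j w[j] * X[i][j] by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - b0 - X w||^2 + lam * ||w||_1.
    X is a list of n rows, each holding p candidate-feature values.
    Returns (b0, w); the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    # Center features and target so the intercept drops out of the updates.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [v - y_mean for v in y]  # residual yc - Xc w, starting from w = 0
    w = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the partial residual (w[j] removed).
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta
                w[j] = new_wj
    b0 = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return b0, w


if __name__ == "__main__":
    # Illustrative synthetic data (not from the paper): three candidate
    # features, of which only the first two actually drive "power".
    X = [[i, (i * 7) % 11, (i * 3) % 5] for i in range(30)]
    y = [50.0 + 2.0 * row[0] + 0.5 * row[1] for row in X]
    b0, w = lasso_fit(X, y, lam=0.01)
    print(b0, w)  # weights close to [2.0, 0.5, 0.0]
```

A larger `lam` drives more coefficients exactly to zero, which is how an L1 penalty can discard uninformative candidate features while keeping the final model linear in its coefficients.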

Abstract

This work proposes an automatic methodology for modeling complex systems. Our methodology combines Grammatical Evolution with classical regression to obtain an optimal set of features that take part in a linear and convex model. This technique provides both Feature Engineering and Symbolic Regression, inferring accurate models without requiring manual effort or designer expertise. As advanced Cloud services become mainstream, the contribution of data centers to the overall power consumption of modern cities is growing dramatically. These facilities consume 10 to 100 times more power per square foot than typical office buildings. Modeling the power consumption of these infrastructures is crucial to anticipate the effects of aggressive optimization policies, but accurate and fast power modeling of high-end servers is a complex challenge that analytical approaches have not yet met. For this case study, our methodology minimizes the error in power prediction. It has been tested using real Cloud applications, resulting in an average power estimation error of 3.98%. Our work improves the possibilities of deriving energy-efficient policies for Cloud data centers and is applicable to other computing environments with similar characteristics.
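The abstract does not spell out how the 3.98% average error is computed. Assuming it is a mean absolute percentage error over the test samples (an assumption on our part, not confirmed by the text), the metric reduces to a few lines; the function name and the wattage values below are hypothetical.

```python
def mean_abs_percentage_error(measured, predicted):
    """Average of |measured - predicted| / |measured|, in percent."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for m, p in zip(measured, predicted):
        if m == 0:
            raise ValueError("measured power must be non-zero")
        total += abs(m - p) / abs(m)
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical measured vs. predicted server power in watts.
    measured = [200.0, 250.0, 300.0]
    predicted = [210.0, 240.0, 300.0]
    print(mean_abs_percentage_error(measured, predicted))  # about 3.0
```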

Paper Structure (25 sections, 6 equations, 4 figures, 2 tables)

This paper contains 25 sections, 6 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Example of a grammar in BNF format designed for symbolic regression.
  • Figure 2: Optimized modeling using the GE+lasso methodology.
  • Figure 3: Grammar in BNF format. The $x_i$ variables, with $i = 0, \ldots, n$, represent each parameter obtained from the system.
  • Figure 4: Grammar in BNF format. The $x_i$ variables, with $i = 0, \ldots, 6$, represent processor and memory temperatures, fan speed, processor and memory utilization percentages, processor frequency, and voltage, respectively.
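Figures 1, 3 and 4 refer to BNF grammars whose productions Grammatical Evolution expands by reading integer codons. The sketch below shows the standard genotype-to-phenotype mapping (leftmost nonterminal first, production index = codon mod number of alternatives, with genome wrapping) over a hypothetical three-variable grammar. `GRAMMAR`, `ge_map`, the `exp(` production and the `;` feature separator are illustrative assumptions, not the grammars shown in the figures.

```python
# Hypothetical BNF grammar (illustrative, not the one in Figures 1, 3 or 4):
#   <features> ::= <feature> | <feature> ; <features>
#   <feature>  ::= <var> | <var> * <var> | exp( <var> )
#   <var>      ::= x0 | x1 | x2
GRAMMAR = {
    "<features>": [["<feature>"], ["<feature>", ";", "<features>"]],
    "<feature>": [["<var>"], ["<var>", "*", "<var>"], ["exp(", "<var>", ")"]],
    "<var>": [["x0"], ["x1"], ["x2"]],
}


def ge_map(genome, grammar, start="<features>", max_wraps=2):
    """Map a list of integer codons to a phenotype string.

    Expands the leftmost nonterminal first; each codon selects a production
    via codon % number_of_alternatives. Returns None if the derivation is
    still incomplete after reusing the genome max_wraps extra times.
    """
    if not genome:
        return None
    symbols = [start]
    output = []
    idx, wraps = 0, 0
    while symbols:
        sym = symbols.pop(0)
        if sym not in grammar:
            output.append(sym)  # terminal symbol: emit as-is
            continue
        choices = grammar[sym]
        if len(choices) == 1:
            production = choices[0]  # no decision needed, so no codon is read
        else:
            if idx == len(genome):  # genome exhausted: wrap around
                wraps += 1
                if wraps > max_wraps:
                    return None  # invalid individual
                idx = 0
            production = choices[genome[idx] % len(choices)]
            idx += 1
        symbols = list(production) + symbols
    return "".join(output)


if __name__ == "__main__":
    phenotype = ge_map([1, 1, 0, 1, 0, 2, 2], GRAMMAR)
    print(phenotype)             # x0*x1;exp(x2)
    print(phenotype.split(";"))  # ['x0*x1', 'exp(x2)']
```

In this sketch, each `;`-separated term would be evaluated on the measured variables to form one column of the feature matrix whose linear coefficients Lasso then estimates.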