Training Green AI Models Using Elite Samples

Mohammed Alswaitti, Roberto Verdecchia, Grégoire Danoy, Pascal Bouvry, Johnatan Pecero

TL;DR

The paper addresses the environmental impact of AI training by proposing a data-centric, elite-sample approach that reduces training data without sacrificing performance. It introduces a Differential Evolution (DE)-based sampling framework that identifies small, elite subsets of training data tailored to specific dataset-model pairs and evaluates energy usage alongside accuracy. Across 8 classifiers and 25 datasets, training on 10% elite data yields up to 50% performance gains and up to 98% energy reductions compared with standard practice, with good generalisability when tested on larger unseen data. This work advances Green AI by integrating energy-aware considerations into instance selection. It also provides a foundation for future data-centric, energy-efficient AI training based on multi-objective evolutionary algorithms (MOEAs), and for reproducibility through shared elite-sample repositories.

Abstract

The substantial increase in AI model training has considerable environmental implications, mandating more energy-efficient and sustainable AI practices. On the one hand, data-centric approaches show great potential towards training energy-efficient AI models. On the other hand, instance selection methods demonstrate the capability of training AI models with minimised training sets and negligible performance degradation. Despite the growing interest in both topics, the impact of data-centric training set selection on energy efficiency remains to date unexplored. This paper presents an evolutionary-based sampling framework aimed at (i) identifying elite training samples tailored for datasets and model pairs, (ii) comparing model performance and energy efficiency gains against typical model training practice, and (iii) investigating the feasibility of this framework for fostering sustainable model training practices. To evaluate the proposed framework, we conducted an empirical experiment including 8 commonly used AI classification models and 25 publicly available datasets. The results showcase that by considering 10% elite training samples, the models' performance can show a 50% improvement and remarkable energy savings of 98% compared to the common training practice.
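The idea of evolving a small training subset for a given dataset and model can be illustrated with a minimal sketch. This is not the authors' implementation: the random-key encoding (keep the k highest-scoring instances), the nearest-centroid classifier, the DE/rand/1/bin variant, and all names and parameter values (`de_elite_sample`, `F`, `CR`, `pop_size`) are assumptions chosen for brevity. Energy measurement, which the paper evaluates alongside accuracy, is left out.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class (stand-in model)."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def fitness(keys, k, X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a model trained only on the k top-scored instances."""
    idx = np.argsort(keys)[-k:]
    # Assumption: a subset that misses a class is treated as worthless.
    if np.unique(y_tr[idx]).size < np.unique(y_tr).size:
        return 0.0
    model = fit_centroids(X_tr[idx], y_tr[idx])
    return float((predict(model, X_val) == y_val).mean())

def de_elite_sample(X_tr, y_tr, X_val, y_val, frac=0.10,
                    pop_size=20, generations=30, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over per-instance scores; returns (elite indices, accuracy)."""
    rng = np.random.default_rng(seed)
    n = len(X_tr)
    k = max(2, int(round(frac * n)))          # size of the elite subset
    data = (X_tr, y_tr, X_val, y_val)
    pop = rng.random((pop_size, n))           # one score in [0, 1] per instance
    fit = np.array([fitness(p, k, *data) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)   # differential mutation
            cross = rng.random(n) < CR                    # binomial crossover mask
            cross[rng.integers(n)] = True                 # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial, k, *data)
            if f_trial >= fit[i]:                         # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = int(fit.argmax())
    return np.sort(np.argsort(pop[best])[-k:]), float(fit[best])
```

Calling `de_elite_sample(X_train, y_train, X_val, y_val)` returns the indices of the selected 10% of training instances and the validation accuracy they achieve. Any classifier with a fit and predict step could replace the nearest-centroid model in `fitness`.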

Paper Structure (15 sections, 2 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Overview of the proposed methodology.
  • Figure 2: Detailed workflow of the proposed DE sampling framework.
  • Figure 3: Overall Average Classification Accuracy and F1-Score of Classifiers over all datasets.
  • Figure 4: Overall Average Training Energy Consumption of Classifiers over all datasets.