A robust methodology for long-term sustainability evaluation of Machine Learning models
Jorge Paz-Ruza, João Gama, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas
TL;DR
This work addresses the inadequacy of short-term, batch-centric sustainability assessments for ML systems by proposing a model-agnostic, long-term evaluation protocol that accommodates both batch and streaming learning. The protocol tracks performance, sustainability, and data availability along a lifecycle trajectory, using sequential data and prequential evaluation. Empirical results across multiple datasets reveal that long-term environmental cost can be large while yielding only marginal performance gains, and that streaming approaches can rival batch methods on simpler tasks. The proposed protocol offers a practical framework for regulators and practitioners to evaluate ML sustainability in real-world, evolving usage scenarios, with broad implications for deploying energy-efficient AI.
Abstract
Sustainability and efficiency have become essential considerations in the development and deployment of Artificial Intelligence systems, yet existing regulatory and reporting practices lack standardized, model-agnostic evaluation protocols. Current assessments often measure only short-term experimental resource usage and disproportionately emphasize batch learning settings, failing to reflect real-world, long-term AI lifecycles. In this work, we propose a comprehensive evaluation protocol for assessing the long-term sustainability of ML models, applicable to both batch and streaming learning scenarios. Through experiments on diverse classification tasks using a range of model types, we demonstrate that traditional static train-test evaluations do not reliably capture sustainability under evolving data and repeated model updates. Our results show that long-term sustainability varies significantly across models, and in many cases, higher environmental cost yields little performance benefit.
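To make the contrast with a static train-test split concrete, the following is a minimal sketch of a prequential (test-then-train) loop that records performance and a cumulative cost figure along the data sequence. It is an illustration under stated assumptions, not the protocol proposed in this work: the names `MajorityClass` and `prequential` are hypothetical, the toy model is chosen only for brevity, and wall-clock update time stands in for whatever energy or emissions measurement an actual study would use.

```python
import time


class MajorityClass:
    """Toy incremental classifier (hypothetical): predicts the most
    frequent label seen so far and ignores the features."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential(model, stream):
    """Test-then-train: score each sample BEFORE the model learns from it.

    Returns a list of (samples_seen, running_accuracy, cumulative_update_seconds).
    Wall-clock time is only an assumed proxy for resource cost.
    """
    correct = 0
    seen = 0
    update_seconds = 0.0
    history = []
    for x, y in stream:
        # 1) Test on the incoming sample.
        if model.predict(x) == y:
            correct += 1
        seen += 1
        # 2) Train on the same sample, timing only the update step.
        start = time.perf_counter()
        model.learn_one(x, y)
        update_seconds += time.perf_counter() - start
        history.append((seen, correct / seen, update_seconds))
    return history


if __name__ == "__main__":
    # Synthetic stream: label is True for every third sample.
    stream = [((i,), i % 3 == 0) for i in range(300)]
    seen, accuracy, cost = prequential(MajorityClass(), stream)[-1]
    print(f"samples={seen} accuracy={accuracy:.3f} update_time={cost:.6f}s")
```

Because every sample is scored before it is used for training, the returned trajectory shows how accuracy and accumulated cost evolve together over the sequence, rather than collapsing them into one number from a single split. A periodically retrained batch model could be run through the same loop by giving it a `learn_one` that buffers samples and refits at intervals.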
