Dissecting FLOPs along input dimensions for GreenAI cost estimations

Andrea Asperti; Davide Evangelista; Moreno Marzolla

TL;DR

The paper tackles the gap between FLOPs and actual energy/time costs on GPUs/TPUs by introducing $\alpha$-FLOPs, a simple, input-dimension-aware correction that accounts for nonuniform parallelism across axes. It formalizes the correction with $\alpha_K(S)=\left(\frac{S_K+\beta_K(S-S_K)}{S}\right)^{\gamma_K}$ where $S=W\times H$, and demonstrates through convolutional and dense layers that this measure aligns more closely with observed execution times than standard FLOPs. Empirical results show pronounced speedups along certain dimensions, especially for larger spatial extents and certain kernel sizes, validating the approach and enabling better hardware-aware efficiency comparisons. The work positions $\alpha$-FLOPs as a practical middle ground between parameter counts and FLOPs, offering a tool for GreenAI cost estimation with open data and clear paths for extending to more architectures.
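As a minimal sketch of the correction factor above, the snippet below evaluates $\alpha_K(S)$ and applies it to a FLOPs count. Several things here are assumptions rather than details from the summary: the function names are hypothetical, the numeric values for $S_K$, $\beta_K$ and $\gamma_K$ are illustrative placeholders (not fitted values from the paper), the convolution FLOPs count uses the common $K^2 \cdot W \cdot H \cdot C_{in} \cdot C_{out}$ convention, and the corrected cost is assumed to be the plain product of $\alpha_K(S)$ and FLOPs.

```python
def alpha_correction(s: float, s_k: float, beta_k: float, gamma_k: float) -> float:
    """Evaluate alpha_K(S) = ((S_K + beta_K * (S - S_K)) / S) ** gamma_K.

    s is the spatial size S = W * H; s_k, beta_k and gamma_k are the
    kernel-dependent parameters of the formula (values are assumptions here).
    """
    if s <= 0:
        raise ValueError("spatial size S must be positive")
    return ((s_k + beta_k * (s - s_k)) / s) ** gamma_k


def conv_flops(kernel: int, width: int, height: int, c_in: int, c_out: int) -> int:
    """Conventional FLOPs count for a 2D convolution (assumed convention)."""
    return kernel * kernel * width * height * c_in * c_out


def alpha_flops(flops: float, width: int, height: int,
                s_k: float, beta_k: float, gamma_k: float) -> float:
    """Scale a FLOPs count by the correction factor (assumed to be a product)."""
    return alpha_correction(width * height, s_k, beta_k, gamma_k) * flops


if __name__ == "__main__":
    # Placeholder parameters, chosen only to illustrate the shape of the formula.
    s_k, beta_k, gamma_k = 9.0, 0.02, 0.8
    for side in (8, 32, 128):
        flops = conv_flops(kernel=3, width=side, height=side, c_in=16, c_out=16)
        corrected = alpha_flops(flops, side, side, s_k, beta_k, gamma_k)
        print(f"{side}x{side}: FLOPs={flops:.3e}  alpha-FLOPs={corrected:.3e}")
```

Two properties follow directly from the formula: the factor equals 1 when $S = S_K$ or when $\beta_K = 1$ (no correction), and for $0 < \beta_K < 1$, $\gamma_K > 0$ and $S > S_K$ it falls below 1 and shrinks as the spatial size grows, which is how the measure discounts work that parallelizes well along the spatial axes.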

Abstract

The term GreenAI refers to a novel approach to Deep Learning that is more aware of the ecological impact and the computational efficiency of its methods. The promoters of GreenAI suggested the use of Floating Point Operations (FLOPs) as a measure of the computational cost of Neural Networks; however, that measure does not correlate well with the energy consumption of hardware equipped with massively parallel processing units like GPUs or TPUs. In this article, we propose a simple refinement of the formula used to compute floating point operations for convolutional layers, called α-FLOPs, which explains and corrects the traditional discrepancy across different layers and yields estimates closer to reality. The notion of α-FLOPs relies on the crucial insight that, in the case of inputs with multiple dimensions, there is no reason to believe that the speedup offered by parallelism will be uniform along all different axes.
