Dissecting FLOPs along input dimensions for GreenAI cost estimations

Andrea Asperti, Davide Evangelista, Moreno Marzolla

TL;DR

The paper tackles the gap between FLOPs and actual energy/time costs on GPUs/TPUs by introducing $\alpha$-FLOPs, a simple, input-dimension-aware correction that accounts for nonuniform parallelism across axes. It formalizes the correction with $\alpha_K(S)=\left(\frac{S_K+\beta_K(S-S_K)}{S}\right)^{\gamma_K}$ where $S=W\times H$, and demonstrates through convolutional and dense layers that this measure aligns more closely with observed execution times than standard FLOPs. Empirical results show pronounced speedups along certain dimensions, especially for larger spatial extents and certain kernel sizes, validating the approach and enabling better hardware-aware efficiency comparisons. The work positions $\alpha$-FLOPs as a practical middle ground between parameter counts and FLOPs, offering a tool for GreenAI cost estimation with open data and clear paths for extending to more architectures.
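
As a rough illustration of the correction above, the following Python sketch evaluates $\alpha_K(S)$ directly from the formula. The function names and the numeric values of $S_K$, $\beta_K$ and $\gamma_K$ are placeholders chosen for illustration, not fitted values from the paper; multiplying the plain FLOPs count by the factor is likewise an assumption about how the correction is applied.

```python
def alpha_correction(S, S_K, beta_K, gamma_K):
    """Evaluate alpha_K(S) = ((S_K + beta_K * (S - S_K)) / S) ** gamma_K.

    S is the spatial extent W * H; S_K, beta_K and gamma_K are
    kernel-dependent parameters (hypothetical values are used below).
    """
    return ((S_K + beta_K * (S - S_K)) / S) ** gamma_K


def alpha_flops(flops, S, S_K, beta_K, gamma_K):
    """Scale a plain FLOPs count by the correction factor (assumed usage)."""
    return alpha_correction(S, S_K, beta_K, gamma_K) * flops


if __name__ == "__main__":
    W, H = 64, 64
    S = W * H
    # Placeholder parameters, for illustration only.
    S_K, beta_K, gamma_K = 64, 0.1, 0.5
    print("alpha_K(S):", alpha_correction(S, S_K, beta_K, gamma_K))
    print("alpha-FLOPs:", alpha_flops(1_000_000, S, S_K, beta_K, gamma_K))
```

Two sanity checks follow from the formula itself: the factor equals 1 when $S = S_K$ or when $\beta_K = 1$, and it falls below 1 when $S > S_K$, $0 < \beta_K < 1$ and $\gamma_K > 0$, which is the regime where a larger spatial extent is discounted relative to plain FLOPs.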

Abstract

The term GreenAI refers to a novel approach to Deep Learning that is more aware of the ecological impact and the computational efficiency of its methods. The promoters of GreenAI suggested the use of Floating Point Operations (FLOPs) as a measure of the computational cost of Neural Networks; however, that measure does not correlate well with the energy consumption of hardware equipped with massively parallel processing units like GPUs or TPUs. In this article, we propose a simple refinement of the formula used to compute floating point operations for convolutional layers, called α-FLOPs, which explains and corrects the traditional discrepancy with respect to different layers and is closer to reality. The notion of α-FLOPs relies on the crucial insight that, for inputs with multiple dimensions, there is no reason to believe that the speedup offered by parallelism will be uniform along all axes.

Paper Structure

This paper contains 15 sections, 8 equations, 6 figures.

Figures (6)

  • Figure 1: Comparison of execution times for Dense and Convolutional layers with the same amount of FLOPs. In table (a) we provide numerical values for layers with 327.68 million FLOPs; on the right we show the execution time of similar configurations for increasing dimensions. All layers for a given value of $2^x$ (i.e. along any vertical section) have the same amount of FLOPs.
  • Figure 2: Execution time vs different input dimensions, keeping the number of FLOPs constant. In plot (a) we increase $K$ and proportionally decrease $C_\textit{in}$ and $C_\textit{out}$. In plot (b) we increase $K$ and proportionally decrease $W$ and $H$. We would expect constant lines, but this is not the case. In plots (c) and (d) we repeat the experiment on a (single core) CPU, instead of a GPU.
  • Figure 3: Predicted execution time by means of $\alpha$-FLOPs for the same convolutional configurations of Figure 1; in (b) predictions are depicted as dashed lines.
  • Figure 4: Predicted execution time by means of $\alpha$-FLOPs, depicted as dashed lines, for the same convolutional configurations of Figure 2.
  • Figure 5: Execution time and predictions by means of $\alpha$-FLOPs (dashed lines)
  • ...and 1 more figure
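
The constant-FLOPs sweep described in the caption of Figure 2(a) can be sketched as follows. The sketch assumes the common convolutional count $K^2 \cdot C_\textit{in} \cdot C_\textit{out} \cdot W \cdot H$ (one operation per multiply-accumulate, ignoring stride and padding effects); the layer sizes are hypothetical and are not the paper's configurations.

```python
def conv_flops(K, C_in, C_out, W, H):
    """Plain FLOPs of a KxK convolution on a WxH input (assumed formula)."""
    return K * K * C_in * C_out * W * H


def constant_flops_sweep(base_channels=64, W=32, H=32, kernels=(1, 2, 4, 8)):
    """Increase K while shrinking C_in and C_out by the same factor.

    Since FLOPs scale with K^2 * C_in * C_out, dividing both channel
    counts by K keeps the total constant.
    """
    rows = []
    for K in kernels:
        C = base_channels // K  # hypothetical sizes, chosen to divide evenly
        rows.append((K, C, conv_flops(K, C, C, W, H)))
    return rows


sweep = constant_flops_sweep()

if __name__ == "__main__":
    for K, C, flops in sweep:
        print(f"K={K} C_in=C_out={C} FLOPs={flops}")
```

Every configuration in the sweep has the same plain FLOPs count, so a FLOPs-based cost model predicts identical execution times; the measured times in Figure 2 are not constant, which is the discrepancy that $\alpha$-FLOPs aims to capture.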