Dissecting Transformers: A CLEAR Perspective towards Green AI

Hemang Jain; Shailender Goyal; Divyansh Pandey; Karthik Vaidhyanathan

TL;DR

Large language models incur substantial energy costs during inference, yet prior work reports only coarse, model-level metrics. CLEAR introduces a component-level energy assessment method that caches per-component activations and uses an amplification strategy (repeated execution of each component) to overcome coarse sensor granularity, enabling energy attribution across individual Transformer blocks. The study finds that attention blocks exhibit the highest energy per FLOP and that total energy follows a two-term model $E(L) \approx E_0 + k \cdot \mathrm{FLOPs}(L)$ with component-dependent $k$, while %Capture, the share of the full model's energy accounted for by the measured components, remains above 90% across architectures. These insights enable targeted architectural and hardware–software co-design for Green AI, moving energy considerations from afterthoughts to integral design criteria.
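As an illustration of the two-term model above, the following minimal sketch fits $E_0$ and $k$ to (FLOPs, energy) pairs by ordinary least squares. The function name `fit_two_term` and all numbers are hypothetical and synthetic; they are not measurements from the paper, and the paper may fit its model differently.

```python
def fit_two_term(flops, energy):
    """Least-squares fit of energy ~ e0 + k * flops.

    Returns (e0, k): e0 is the fixed overhead in joules,
    k is the marginal energy per FLOP in joules/FLOP.
    """
    n = len(flops)
    mean_f = sum(flops) / n
    mean_e = sum(energy) / n
    # Slope = covariance(flops, energy) / variance(flops)
    cov = sum((f - mean_f) * (e - mean_e) for f, e in zip(flops, energy))
    var = sum((f - mean_f) ** 2 for f in flops)
    k = cov / var
    e0 = mean_e - k * mean_f
    return e0, k


# Synthetic example (assumed values, for illustration only):
# one component measured at four input sizes.
flops = [1e9, 2e9, 4e9, 8e9]
energy = [0.5 + 2e-10 * f for f in flops]  # joules, generated from e0=0.5, k=2e-10

e0, k = fit_two_term(flops, energy)
print(f"E0 = {e0:.3f} J, k = {k:.2e} J/FLOP")
```

Fitting one such line per component would expose the component-dependent $k$: under this model, a component with a larger slope costs more energy per FLOP regardless of its total FLOP count.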

Abstract

The rapid adoption of Large Language Models (LLMs) has raised significant environmental concerns. Unlike the one-time cost of training, LLM inference occurs continuously at a global scale and now dominates the AI energy footprint. Yet most sustainability studies report only coarse, model-level metrics because fine-grained measurement methods are lacking, and they treat energy efficiency as an afterthought rather than a primary objective. We present the first fine-grained empirical analysis of inference energy across the core components of the transformer architecture. We propose a novel methodology, Component-Level Energy Assessment via Repeated sampling (CLEAR), to overcome the temporal mismatch between microsecond-scale component execution and the millisecond-scale (ms) sampling of energy sensors. Using CLEAR, we evaluate 15 models spanning four distinct architecture types, consistently keeping component-wise energy variance below 9.5% while attributing more than 90% of each model's total energy to individual components. Our empirical analysis reveals that Attention blocks consume significantly more energy per floating-point operation (FLOP) than other components, indicating that energy consumption is not proportional to FLOP count. FLOPs alone therefore fail to capture the true energy cost at the component level. Our findings establish detailed component-level energy baselines and offer an initial step toward building energy-efficient transformer models through component-level optimizations.
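The repeated-sampling idea can be sketched as follows: run a component many times between two sensor reads so that the accumulated energy exceeds the sensor's granularity, then divide by the repeat count. This is a minimal sketch under stated assumptions, not the authors' implementation. `CoarseCounter` is a hypothetical simulated sensor that stands in for a real energy counter (it quantizes energy rather than time, as a simple proxy for the temporal mismatch), and `amplified_energy` and `percent_capture` are illustrative names.

```python
import math


class CoarseCounter:
    """Simulated cumulative energy counter (hypothetical stand-in for a real sensor).

    It only reports whole multiples of `resolution_j` joules.
    """

    def __init__(self, resolution_j):
        self.resolution_j = resolution_j
        self._true_j = 0.0

    def consume(self, joules):
        # Called by the simulated component to "use" energy.
        self._true_j += joules

    def read(self):
        # Quantize the true value down to the sensor resolution.
        return math.floor(self._true_j / self.resolution_j) * self.resolution_j


def amplified_energy(run_component, read_energy, repeats):
    """Estimate per-execution energy by running the component `repeats` times."""
    start = read_energy()
    for _ in range(repeats):
        run_component()  # in practice: replay the component on its cached input
    return (read_energy() - start) / repeats


def percent_capture(component_energies, full_model_energy):
    """Share (in %) of full-model energy accounted for by the measured components."""
    return 100.0 * sum(component_energies) / full_model_energy


# Simulated component that draws 3 mJ per call; sensor resolution is 1 J.
counter = CoarseCounter(resolution_j=1.0)
component = lambda: counter.consume(0.003)

single = amplified_energy(component, counter.read, repeats=1)
amplified = amplified_energy(component, counter.read, repeats=10_000)
print(f"single run: {single} J, amplified estimate: {amplified} J")
```

In this simulation a single execution falls below the sensor resolution and reads as zero, while the amplified estimate recovers a value close to the true per-call energy. On real hardware, the loop would additionally need device synchronization before each sensor read, and repeated trials to estimate the variance.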
