Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification

Arshia Kermani, Ehsan Zeraatkar, Habib Irani

TL;DR

This work addresses the high energy cost of transformer inference in time series classification by systematically evaluating pruning and quantization strategies. It applies these strategies to a Vision Transformer–based Time Series model on three datasets (RefrigerationDevices, ElectricDevices, PLAID) and analyzes two model configurations (T1 and T2) to reveal capacity–efficiency trade-offs. Static quantization reduces energy consumption by 29.14% with minimal accuracy loss, while L1 pruning delivers up to 1.63× faster inference and 37.08% energy savings; 8-bit quantization limits accuracy degradation to roughly 1.4–1.8%. A hybrid of pruning and quantization achieves up to 45.7% overall energy reduction with limited accuracy loss, offering actionable guidance for deploying transformer-based time series classifiers on edge and other resource-constrained devices.
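
To make "static quantization" concrete, here is a minimal Python sketch of generic min-max affine INT8 quantization. It assumes a single per-tensor scale and zero point that are fixed once from calibration data before deployment (the "static" part) instead of being recomputed at inference time. The function names `calibrate`, `quantize`, and `dequantize` are hypothetical; this illustrates the general technique and is not the authors' implementation.

```python
def calibrate(samples, qmin=-128, qmax=127):
    """Derive affine (scale, zero_point) once from calibration data."""
    # Extend the observed range to include 0 so that 0.0 maps to an exact integer.
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map FP32 values to INT8 codes; values outside the calibrated range are clipped."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map INT8 codes back to approximate FP32 values."""
    return [(q - zero_point) * scale for q in codes]


# Usage: ranges are fixed from (hypothetical) calibration data, then reused at inference.
scale, zp = calibrate([-1.0, 0.0, 1.0, 2.0])
q = quantize([-1.0, 1.0, 5.0], scale, zp)
print(zp, q)                                           # -43 [-128, 42, 127]
print([round(v, 4) for v in dequantize(q, scale, zp)])  # [-1.0, 1.0, 2.0]
```

In-range values are recovered to within half a quantization step, while the out-of-range input 5.0 is clipped to the calibrated maximum of 2.0. This clipping, together with rounding, is one way a statically quantized model can lose accuracy when inference-time activations drift outside the calibration range.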

Abstract

The increasing computational demands of transformer models in time series classification call for effective optimization strategies for energy-efficient deployment. This study systematically investigates optimization techniques for transformer architectures, focusing on structured pruning and quantization. Through extensive experiments on three datasets (RefrigerationDevices, ElectricDevices, and PLAID), we quantitatively evaluate model performance and energy efficiency across different transformer configurations. Static quantization reduces energy consumption by 29.14% while maintaining classification performance, and L1 pruning improves inference speed by 63% (a 1.63× speedup) with minimal accuracy degradation. These findings clarify how effective each optimization strategy is for transformer-based time series classification and establish a foundation for efficient model deployment in resource-constrained environments.
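
As an illustration of L1-based structured pruning, the following minimal Python sketch ranks the output neurons (rows) of one linear layer by the L1 norm of their weights and physically removes the lowest-ranked fraction. The row-wise granularity, the list-of-rows weight layout, and the name `l1_structured_prune` are assumptions made for illustration; the paper's exact pruning criterion and granularity may differ.

```python
def l1_structured_prune(weight, amount):
    """Remove the `amount` fraction of rows (output neurons) with the smallest L1 norm.

    weight: list of rows, one row of floats per output neuron.
    Returns (pruned_weight, kept_indices).
    """
    if not 0.0 <= amount < 1.0:
        raise ValueError("amount must be in [0, 1)")
    n_rows = len(weight)
    n_prune = int(round(amount * n_rows))
    # The L1 norm of a row summarizes how strongly that neuron's weights contribute.
    norms = [sum(abs(w) for w in row) for row in weight]
    # Rank rows from smallest to largest norm; ties are broken by row index.
    order = sorted(range(n_rows), key=lambda i: (norms[i], i))
    pruned = set(order[:n_prune])
    kept = [i for i in range(n_rows) if i not in pruned]
    # Dropping whole rows shrinks the dense matrix instead of leaving scattered zeros.
    return [list(weight[i]) for i in kept], kept


# Usage with a hypothetical 4-neuron layer; row L1 norms are 1.0, 0.1, 3.0, 0.3.
W = [[0.5, -0.5], [0.1, 0.0], [-2.0, 1.0], [0.2, -0.1]]
W_pruned, kept = l1_structured_prune(W, 0.5)
print(kept)      # [0, 2]
print(W_pruned)  # [[0.5, -0.5], [-2.0, 1.0]]
```

Because whole neurons disappear, the next layer's input columns at the removed indices must be dropped as well so that the shapes still match. Removing entire rows (or, analogously, entire attention heads) rather than zeroing individual weights is what allows a dense matrix multiplication to shrink and can translate into real inference speedups.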

Paper Structure

This paper contains 31 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Visualization of Pruning in Transformer Models. The left side represents the original model, while the right side shows a pruned model where an attention head and unnecessary neurons are removed to improve efficiency.
  • Figure 2: Visualization of Quantization in Transformer Models. The left side represents the original FP32 model, while the right side shows the quantized INT8 model, reducing computational cost and memory usage.
  • Figure 3: Overall architecture
  • Figure 4: Energy Savings across Optimization Methods
  • Figure 5: Energy Consumption vs. Inference Time for Different Optimization Methods across Datasets.
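
Figures 4 and 5 relate energy consumption to inference time. The sketch below shows one common way to derive such metrics, assuming energy is estimated as average power multiplied by inference time. The helper names and the power and time values in the example are hypothetical and are not measurements from the paper.

```python
def energy_joules(avg_power_w: float, time_s: float) -> float:
    """Energy in joules, estimated as average power (W) times elapsed time (s)."""
    return avg_power_w * time_s


def speedup(t_base: float, t_opt: float) -> float:
    """Speedup factor; e.g. 1.63 means the optimized model runs 1.63x faster."""
    return t_base / t_opt


def energy_savings_pct(e_base: float, e_opt: float) -> float:
    """Percentage of the baseline energy saved by the optimized model."""
    return 100.0 * (1.0 - e_opt / e_base)


# Hypothetical measurements (not taken from the paper).
e_base = energy_joules(50.0, 2.0)   # baseline: 50 W for 2.0 s -> 100 J
e_opt = energy_joules(48.0, 1.25)   # optimized: 48 W for 1.25 s -> 60 J
print(f"speedup: {speedup(2.0, 1.25):.2f}x")                      # speedup: 1.60x
print(f"energy saved: {energy_savings_pct(e_base, e_opt):.1f}%")  # energy saved: 40.0%
```

If average power stayed constant, a speedup of s alone would save a fraction 1 - 1/s of the energy. Measured savings can differ from this because an optimized model may draw more or less power while it runs, which is why energy and time are plotted separately.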