TransAxx: Efficient Transformers with Approximate Computing

Dimitrios Danopoulos; Georgios Zervakis; Dimitrios Soudris; Jörg Henkel

TL;DR

TransAxx tackles the high computational cost of Vision Transformers by enabling fast, hardware-aware emulation of approximate multipliers within ViT models. The framework, built as a PyTorch plugin with LUT-based and functional multiplier support, allows per-layer approximation, approximate-aware retraining, and mixed-approximation configurations, all accelerated on GPUs. To navigate the huge design space, the authors introduce a hardware-driven Monte Carlo Tree Search that leverages a surrogate accuracy predictor to identify Pareto-optimal accuracy–power configurations across ViT architectures. Empirical results on ImageNet across multiple ViT models show meaningful accuracy recovery after retraining and substantial power/area savings, illustrating TransAxx’s potential as a software-hardware co-design tool for ViTs on resource-constrained devices. The work lays the groundwork for systematic exploration of approximate ViT designs and will be released as open-source to enable broader adoption and extension.
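A configuration is Pareto-optimal when no other configuration is at least as accurate *and* at least as power-efficient while being strictly better in one of the two. The plain-Python sketch below filters a list of candidates down to that non-dominated set. The `Config` type and every number in it are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One candidate design point (illustrative values only)."""
    name: str
    accuracy: float  # top-1 accuracy in %, higher is better
    power: float     # normalized power, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.power <= b.power
    strictly_better = a.accuracy > b.accuracy or a.power < b.power
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Return the non-dominated configurations, sorted by increasing power."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c.power)


if __name__ == "__main__":
    # Hypothetical placeholder points, not measurements from TransAxx.
    candidates = [
        Config("exact", 81.0, 1.00),
        Config("mixA", 80.6, 0.82),
        Config("mixB", 80.1, 0.85),  # dominated by mixA
        Config("mixC", 78.9, 0.64),
        Config("mixD", 77.0, 0.70),  # dominated by mixC
    ]
    for c in pareto_front(candidates):
        print(c.name, c.accuracy, c.power)
```

The quadratic scan is fine for a few hundred candidates; a search procedure such as MCTS exists precisely because enumerating every per-layer assignment up front is infeasible.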

Abstract

Vision Transformer (ViT) models, which were recently introduced as an application of the transformer architecture to vision, have proven very competitive and have often become a popular alternative to Convolutional Neural Networks (CNNs). However, the high computational requirements of these models limit their practical applicability, especially on low-power devices. The current state of the art employs approximate multipliers to address the sharply increased compute demands of DNN accelerators, but no prior research has explored their use in ViT models. In this work we propose TransAxx, a framework based on the popular PyTorch library that provides fast, native support for approximate arithmetic to seamlessly evaluate the impact of approximate computing on DNNs such as ViT models. Using TransAxx, we analyze the sensitivity of transformer models on the ImageNet dataset to approximate multiplications and perform approximate-aware fine-tuning to regain accuracy. Furthermore, we propose a methodology to generate approximate accelerators for ViT models. Our approach uses a Monte Carlo Tree Search (MCTS) algorithm to efficiently search the space of possible configurations using a hardware-driven, hand-crafted policy. Our evaluation demonstrates that the methodology achieves significant trade-offs between accuracy and power, yielding substantial gains without compromising performance.
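The MCTS idea can be illustrated with a generic UCT search (Monte Carlo Tree Search with UCB1 selection) over per-layer multiplier assignments: each tree level fixes the multiplier of one layer, random rollouts complete the remaining layers, and the best complete configuration seen is returned. This is a minimal sketch under loudly hypothetical assumptions. The multiplier list, the per-layer sensitivities, the additive accuracy-drop surrogate and the reward (power saving if the predicted drop fits a budget, otherwise zero) are all invented for illustration. None of it is the paper's hardware-driven hand-crafted policy or its surrogate predictor.

```python
import math
import random

# Hypothetical toy setup (not from the paper): 4 layers, 3 candidate multipliers.
# Each multiplier: (name, relative power vs. exact, base accuracy drop in % points).
MULTIPLIERS = [("exact", 1.0, 0.0), ("axA", 0.8, 0.25), ("axB", 0.5, 0.75)]
LAYER_SENSITIVITY = [2.0, 0.5, 0.5, 1.0]  # hypothetical per-layer scaling of the drop
NUM_LAYERS = len(LAYER_SENSITIVITY)
ACC_BUDGET = 1.0  # maximum tolerated predicted accuracy drop (% points)


def predicted_drop(cfg):
    """Hypothetical additive surrogate: per-layer accuracy drops simply add up."""
    return sum(LAYER_SENSITIVITY[l] * MULTIPLIERS[m][2] for l, m in enumerate(cfg))


def reward(cfg):
    """Power saving in [0, 1) if the predicted drop fits the budget, else 0."""
    if predicted_drop(cfg) > ACC_BUDGET:
        return 0.0
    mean_power = sum(MULTIPLIERS[m][1] for m in cfg) / len(cfg)
    return 1.0 - mean_power


class Node:
    def __init__(self, state, parent=None):
        self.state = state      # multiplier indices chosen for layers 0..len(state)-1
        self.parent = parent
        self.children = {}      # multiplier index -> child Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.state) == NUM_LAYERS

    def is_fully_expanded(self):
        return len(self.children) == len(MULTIPLIERS)


def uct_child(node, c=1.4):
    """Pick the child with the highest UCB1 score (mean reward + exploration bonus)."""
    log_n = math.log(node.visits)
    return max(
        node.children.values(),
        key=lambda ch: ch.total_reward / ch.visits + c * math.sqrt(log_n / ch.visits),
    )


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_cfg, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.is_terminal() and node.is_fully_expanded():
            node = uct_child(node)
        # 2. Expansion: add one untried multiplier choice for the next layer.
        if not node.is_terminal():
            untried = [a for a in range(len(MULTIPLIERS)) if a not in node.children]
            action = rng.choice(untried)
            node.children[action] = Node(node.state + (action,), parent=node)
            node = node.children[action]
        # 3. Simulation: complete the remaining layers with random multipliers.
        cfg = list(node.state)
        while len(cfg) < NUM_LAYERS:
            cfg.append(rng.randrange(len(MULTIPLIERS)))
        cfg = tuple(cfg)
        r = reward(cfg)
        if r > best_reward:
            best_cfg, best_reward = cfg, r
        # 4. Backpropagation: update visit counts and rewards up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += r
            node = node.parent
    return best_cfg, best_reward


if __name__ == "__main__":
    cfg, r = mcts()
    print([MULTIPLIERS[m][0] for m in cfg], round(r, 3), predicted_drop(cfg))
```

With these made-up numbers, the best feasible assignment places `axB` on the two low-sensitivity middle layers and `axA` on the last one. In a real flow, `reward` would call an accuracy evaluation or a learned predictor, and that call, not the tree bookkeeping, would dominate the cost.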


Paper Structure

This paper contains 19 sections, 6 equations, 8 figures, 4 tables, and 1 algorithm.

Introduction
Related Work
Quantized ViT models
Approximate Multipliers
Approximate CNN Circuit Design
Fast Emulation of Approximate ViT Models
Designing the framework
Support for the transformer architecture
Quantization and fine-tuning strategies
Toy experiment
Searching the Space of Approximate Designs
Rationale for employing Monte Carlo Tree Search
The proposed algorithm
Experimental Results
TransAxx performance on ViT models
...and 4 more sections

Figures (8)

Figure 1: Abstract overview of TransAxx framework operation. The simulation flow starts from the user input on the left, which comprises i) the ViT model in PyTorch and the respective train/test datasets, ii) the approximate multipliers, and iii) several other user-defined parameters required by TransAxx (quantization, calibration, etc.). TransAxx then generates the LUTs that model the approximate multipliers and, through the required transformations, simulates the model's behavior under various conditions, such as different approximate multipliers per layer. TransAxx also facilitates fine-tuning through approximate-aware retraining. Underneath the framework's simulation flow, a GPU accelerates the process.
Figure 2: Preliminary testing with an approximate attention layer. Left: MSE loss per training iteration. Right: Histograms of target data (using FP32) and output data (using approx. multiplier) distributions from the layer's inference.
Figure 3: Comparison of actual (red) and predicted (blue) accuracy after applying approximation to each layer individually (from layer 1 to 5) across different ViT models.
Figure 4: LUT-based multiplication performance. Left: LUT bitwidth impact on inference emulation time. Right: Caching effect on LUT performance during the first batches of inference emulation.
Figure 6: Convergence of the MCTS rewards.
...and 3 more figures
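Figures 1 and 4 describe emulating an approximate multiplier through a lookup table (LUT). The plain-Python sketch below illustrates only the idea; TransAxx itself runs LUT-based multiplication on the GPU inside PyTorch, and nothing here reproduces its code. Assumptions: symmetric per-tensor int8 quantization with max-abs calibration, and a hypothetical `approx_mul` that zeroes the four low-order bits of the product magnitude, standing in for a real approximate multiplier. Because the table is indexed by both operands, it holds 2^(2b) entries for b-bit operands (65,536 for 8 bits). That growth plausibly contributes to the bitwidth and caching effects shown in Figure 4.

```python
BITS = 8
QMIN, QMAX = -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1  # int8 range: -128..127


def approx_mul(a: int, b: int, k: int = 4) -> int:
    """Hypothetical approximate multiplier: zero the k low-order bits of |a*b|."""
    p = a * b
    mag = (abs(p) >> k) << k
    return mag if p >= 0 else -mag


def build_lut(mul=approx_mul):
    """Tabulate mul() for every int8 operand pair: lut[a - QMIN][b - QMIN] == mul(a, b)."""
    size = 2 ** BITS
    return [[mul(a + QMIN, b + QMIN) for b in range(size)] for a in range(size)]


def calibrate(xs):
    """Max-abs calibration: the scale maps the largest magnitude to QMAX."""
    m = max(abs(x) for x in xs)
    return m / QMAX if m > 0 else 1.0


def quantize(xs, scale):
    """Symmetric linear quantization to int8 with clamping."""
    return [max(QMIN, min(QMAX, round(x / scale))) for x in xs]


def lut_matmul(A, B, lut):
    """Approximate A @ B: quantize, multiply via LUT lookups, accumulate exactly, dequantize."""
    sa = calibrate([x for row in A for x in row])
    sb = calibrate([x for row in B for x in row])
    qa = [quantize(row, sa) for row in A]
    qb_cols = [quantize(col, sb) for col in zip(*B)]
    out = []
    for row in qa:
        out_row = []
        for col in qb_cols:
            acc = sum(lut[a - QMIN][b - QMIN] for a, b in zip(row, col))  # integer accumulate
            out_row.append(acc * sa * sb)  # dequantize
        out.append(out_row)
    return out


if __name__ == "__main__":
    approx_lut = build_lut()
    exact_lut = build_lut(lambda a, b: a * b)
    A = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.7]]
    B = [[1.0, -0.4], [0.2, 0.9], [-1.5, 0.6]]
    print(lut_matmul(A, B, exact_lut))   # quantization error only
    print(lut_matmul(A, B, approx_lut))  # quantization + approximate-multiplier error
```

Comparing the two outputs separates the error introduced by the approximate multiplier from plain quantization error. Since the LUT depends only on the multiplier, a flow like this would build it once per multiplier and reuse it for every layer assigned to that multiplier.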
