SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors

Mariam Rakka; Jinhao Li; Guohao Dai; Ahmed Eltawil; Mohammed E. Fouda; Fadi Kurdahi

TL;DR

This work proposes SoftmAP, a software-hardware co-design methodology that implements an integer-only low-precision Softmax using In-Memory Compute (IMC) hardware, making LLMs more deployable without compromising performance.

Abstract

Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible to run on resource-constrained devices. Despite advancements in compression techniques, non-linear operators like Softmax and LayerNorm remain bottlenecks due to their sensitivity to quantization. We propose SoftmAP, a software-hardware co-design methodology that implements an integer-only low-precision Softmax using In-Memory Compute (IMC) hardware. Our method achieves up to three orders of magnitude improvement in the energy-delay product compared to A100 and RTX3090 GPUs, making LLMs more deployable without compromising performance.
