A Compact, Low Power Transprecision ALU for Smart Edge Devices
Ayushi Dube, Gian Singh, Sarma Vrudhula
TL;DR
This paper tackles the challenge of energy-efficient edge ML by introducing TALU, a compact transprecision arithmetic unit that supports Posit, FP, and INT across 4–32 bits with runtime reconfigurability. A novel Posit decode algorithm based on threshold logic and Q-functions removes the need for dedicated decoders, enabling low-area and low-power operation. TALU is extended with TALU-V, a vector unit integrated with a lightweight RISC-V core, to efficiently execute ML kernels such as matrix multiplications at the edge. Experimental results show substantial reductions in power and area compared to state-of-the-art Posit/FP MAC units and competitive energy efficiency and throughput relative to a UMAC-based vector processor, highlighting strong potential for ultra-low-power edge inference.
Abstract
Transprecision computing (TC) is a promising approach for energy-efficient machine learning (ML) computation on resource-constrained platforms. This work presents a novel ASIC design of a Transprecision Arithmetic and Logic Unit (TALU) that supports multiple number formats: Posit, Floating Point (FP), and Integer (INT), with variable bitwidths of 8, 16, and 32 bits. In addition, TALU can be reconfigured at runtime to support TC without overprovisioning the hardware. Posit is a new number format that is gaining traction for ML computation because it achieves accuracy similar to FP at lower bitwidths. This paper therefore proposes a novel algorithm for decoding Posit numbers for energy-efficient computation. The TALU implementation achieves a 54.6x reduction in power consumption and a 19.8x reduction in area compared with a state-of-the-art unified MAC unit (UMAC) for Posit and FP computation. Experimental results on an ML compute kernel executed on a vector processor of TALUs integrated with a RISC-V processor show about 2x improvement in energy efficiency and similar throughput compared with a state-of-the-art TC-based vector processor.
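To make concrete what "decoding Posit" involves, the sketch below extracts the sign, the variable-length regime run, the exponent, and the fraction of a generic posit<n, es> word and returns its exact value. This is a conventional software decoder written for illustration only. It is not the threshold-logic/Q-function decode algorithm proposed in this paper, and the function name `decode_posit` is hypothetical.

```python
from fractions import Fraction
from typing import Optional


def decode_posit(bits: int, n: int, es: int) -> Optional[Fraction]:
    """Decode an n-bit posit with es exponent bits into an exact rational.

    Illustrative, conventional decoder (not the paper's hardware algorithm).
    Returns None for NaR (Not a Real, bit pattern 100...0).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # NaR
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits: take the two's complement first
    # Remaining n-1 bits after the sign, most significant first.
    rest = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    # Regime: a run of identical bits, terminated by the opposite bit (or the word end).
    first = rest[0]
    run = 1
    while run < len(rest) and rest[run] == first:
        run += 1
    k = run - 1 if first == 1 else -run
    idx = run + 1  # skip the regime run and its terminating bit
    # Exponent: up to es bits; bits cut off by the word end count as zeros.
    exp_bits = rest[idx:idx + es]
    e = 0
    for b in exp_bits:
        e = (e << 1) | b
    e <<= es - len(exp_bits)
    # Fraction: remaining bits with a hidden leading 1.
    f = Fraction(0)
    for i, b in enumerate(rest[idx + es:]):
        if b:
            f += Fraction(1, 2 ** (i + 1))
    # value = useed**k * 2**e * (1 + f), with useed = 2**(2**es)
    value = (1 + f) * Fraction(2) ** (k * (1 << es) + e)
    return -value if sign else value


if __name__ == "__main__":
    print(decode_posit(0x40, 8, 0))  # 1
    print(decode_posit(0x50, 8, 0))  # 3/2
    print(decode_posit(0x7F, 8, 0))  # 64 (maxpos for posit<8,0>)
```

The serial regime scan is what makes Posit decoding costly in hardware: the field boundaries depend on the data, which is the overhead a dedicated-decoder-free design aims to avoid.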
