xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

Georg Rutishauser, Joan Mihali, Moritz Scherer, Luca Benini

TL;DR

This work introduces xTern, a compact RISC-V ISA extension that enables energy-efficient inference of ternary neural networks on edge devices by exposing a 20-wide dot-product unit, min/max, and a stateful threshold-and-compress backend for ternary data $\mathcal{T}=\{-1,0,1\}$. The authors integrate xTern into an open RI5CY-NN core and build an eight-core cluster, accompanied by GCC-based tooling, optimized kernels, and an ONNX-to-C deployment pipeline via DORY. Hardware results show a 67% throughput boost for ternary kernels with only a 5.2% power increase and a 57% improvement in energy efficiency, while silicon area overhead remains under 1% at the cluster level. End-to-end evaluations on CIFAR-10 and DVS gesture recognition demonstrate that xTern can deliver higher accuracy at equal latency and substantial energy reductions, highlighting its practical viability for ultra-low-power edge AI platforms.
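To make the "20-wide" figure and the threshold-and-compress step concrete, here is a minimal software sketch in Python. It rests on assumptions not stated on this page: that ternary values are stored five per byte as base-3 digits (possible because $3^5 = 243 \le 256$, so four bytes hold 20 values), and that thresholding compares an accumulator against two thresholds. The encoding and the helper names (`threshold`, `pack_trits`, `unpack_trits`, `dot20`) are hypothetical illustrations, not the actual xTern bit layout or instruction semantics.

```python
from typing import List

TRITS_PER_BYTE = 5    # assumption: 3**5 = 243 <= 256, so five trits fit in a byte
TRITS_PER_WORD = 20   # four such bytes fill one 32-bit register


def threshold(acc: int, lo: int, hi: int) -> int:
    """Map an integer accumulator to {-1, 0, 1} using two thresholds (lo <= hi)."""
    if acc < lo:
        return -1
    if acc >= hi:
        return 1
    return 0


def pack_trits(trits: List[int]) -> int:
    """Pack 20 trits into a 32-bit word, five per byte, as base-3 digits."""
    if len(trits) != TRITS_PER_WORD:
        raise ValueError("expected exactly 20 trits")
    word = 0
    for byte_idx in range(4):
        group = trits[byte_idx * TRITS_PER_BYTE:(byte_idx + 1) * TRITS_PER_BYTE]
        code = 0
        for t in reversed(group):          # first trit ends up as the lowest digit
            if t not in (-1, 0, 1):
                raise ValueError("trits must be -1, 0, or 1")
            code = code * 3 + (t + 1)      # shift {-1, 0, 1} to digits {0, 1, 2}
        word |= code << (8 * byte_idx)
    return word


def unpack_trits(word: int) -> List[int]:
    """Inverse of pack_trits: recover the 20 trits from a 32-bit word."""
    trits = []
    for byte_idx in range(4):
        code = (word >> (8 * byte_idx)) & 0xFF
        for _ in range(TRITS_PER_BYTE):
            trits.append(code % 3 - 1)
            code //= 3
    return trits


def dot20(a_word: int, w_word: int, acc: int = 0) -> int:
    """Software model of a 20-wide ternary dot product with accumulation."""
    a = unpack_trits(a_word)
    w = unpack_trits(w_word)
    return acc + sum(x * y for x, y in zip(a, w))


# Usage: threshold raw accumulators into trits, pack them, then multiply-accumulate.
raw = [7, -3, 0, 12, -9, 1, 4, -1, 0, 5, -6, 2, 8, -8, 3, 0, -2, 6, -4, 9]
acts = [threshold(v, lo=-2, hi=3) for v in raw]
weights = [1, -1, 0, 1, 1, -1, 0, 0, 1, -1, 1, 0, -1, 1, 0, 1, -1, 0, 1, 1]
result = dot20(pack_trits(acts), pack_trits(weights))
```

In this sketch, packing costs 1.6 bits per ternary value instead of the 2 bits a plain 2-bit encoding would use, which is one plausible reason a single 32-bit operand can carry 20 values rather than 16.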

Abstract

Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption increases only marginally, by 5.2%, resulting in an energy efficiency improvement of 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we show that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.

Paper Structure (15 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Comparison of accuracy vs. normalized load $L_{norm}$ of ResNet8 and ResNet20, quantized to different precisions, on CIFAR-10. Networks were trained with the TQT algorithm [tqt], with the first and last layers quantized to 8-bit precision. are highlighted in red.
  • Figure 2: Schematic of the hardware for the threshold-compress (thrc) and compressed (smlsdotsp.t, sdotsp.t) instructions, each shown in its own subfigure.
  • Figure 3: Encoding of instructions and input/output registers of xTern instructions
  • Figure 4: Throughput comparison between ternary convolution kernels and 2-bit kernels from the PULP-NN library on the 8-core PULP cluster.
  • Figure 5: Latency breakdown comparison between 2-bit and ternary $3\times 3$ convolution kernels. Latency is normalized to that of the 2-bit kernel and decomposed into im2col, hot loop (HL), requantization/thresholding (RQ/THR), and other components. Two test cases are shown.
  • ...and 1 more figure