
VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support

Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Zheng Wang, C. Patrick Yue

TL;DR

VIKIN is a reconfigurable accelerator that efficiently supports both KAN and MLP inference on unified hardware. It introduces a pipeline execution mode with two-stage sparsity support for efficient KAN processing, and a parallel mode that improves MLP throughput under the same sparsity framework.

Abstract

Multi-layer perceptrons (MLPs), which are widely used in modern AI applications, suffer from limited real-time performance due to intensive memory access overhead. Kolmogorov–Arnold Networks (KANs) have recently attracted increasing attention as an alternative architecture that is structurally similar to MLPs but offers improved parameter efficiency. However, the lack of dedicated hardware support limits the practical performance benefits of KANs. Moreover, because many edge workloads still rely heavily on MLPs, accelerators designed exclusively for KANs are inefficient and impractical. In this work, we present VIKIN, a reconfigurable accelerator that efficiently supports both KAN and MLP inference using unified hardware. VIKIN introduces a pipeline execution mode and two-stage sparsity support for efficient KAN processing, while enabling parallel-mode acceleration to improve MLP throughput under the same sparsity framework. Experiments on real-world datasets demonstrate that replacing MLPs with KANs on VIKIN achieves a $1.28\times$ speedup with a $19.58\%$ reduction in accuracy loss. For a higher-accuracy KAN model requiring $3.29\times$ more operations, VIKIN incurs only $1.24\times$ latency overhead compared with the baseline KAN model. In addition, VIKIN achieves a $1.25\times$ speedup and $4.87\times$ higher energy efficiency than a representative edge GPU when executing KAN workloads.
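To make the point that KANs are structurally similar to MLPs concrete, the Python sketch below writes both layer types in terms of one multiply-accumulate primitive. It is a minimal illustration under stated assumptions, not VIKIN's dataflow: the helper names (`mac`, `mlp_layer`, `kan_layer`, `make_hat_basis`) are hypothetical, and degree-1 (hat) B-splines on a uniform grid stand in for the higher-degree splines KANs typically use. The only structural difference is that a KAN first expands each input into B-spline basis values (the kind of work the figure list attributes to the SPU's iterative mode) and then performs the same weighted accumulation an MLP does.

```python
from typing import Callable, List, Sequence

def mac(values: Sequence[float], weights: Sequence[float]) -> float:
    # Multiply-accumulate: the primitive both layer types reduce to.
    acc = 0.0
    for v, w in zip(values, weights):
        acc += v * w
    return acc

def mlp_layer(x: Sequence[float], W: Sequence[Sequence[float]],
              b: Sequence[float]) -> List[float]:
    # y_j = sum_i W[j][i] * x_i + b_j  (activation omitted for brevity)
    return [mac(x, W[j]) + b[j] for j in range(len(W))]

def make_hat_basis(grid: Sequence[float]) -> Callable[[float], List[float]]:
    # Hypothetical helper: degree-1 B-splines (hat functions) on a uniform grid.
    # Assumption for brevity; KANs commonly use cubic B-splines instead.
    h = grid[1] - grid[0]
    def basis(x: float) -> List[float]:
        return [max(0.0, 1.0 - abs(x - t) / h) for t in grid]
    return basis

def kan_layer(x: Sequence[float], coeffs: Sequence[Sequence[Sequence[float]]],
              basis: Callable[[float], List[float]]) -> List[float]:
    # y_j = sum_i phi_{j,i}(x_i), where phi_{j,i}(x) = sum_m coeffs[j][i][m] * B_m(x)
    out = []
    for j in range(len(coeffs)):
        acc = 0.0
        for i, xi in enumerate(x):
            acc += mac(basis(xi), coeffs[j][i])  # basis evaluation, then MAC
        out.append(acc)
    return out

# A [2,3] KAN layer (2 inputs, 3 outputs) next to a 2 -> 3 MLP layer.
basis = make_hat_basis([0.0, 0.5, 1.0])
coeffs = [[[1.0, 1.0, 1.0] for _ in range(2)] for _ in range(3)]
print(kan_layer([0.25, 0.5], coeffs, basis))  # [2.0, 2.0, 2.0]
print(mlp_layer([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.5]))  # [1.0, 2.0, 3.5]
```

With all spline coefficients set to 1, each edge function evaluates to 1 inside the grid (the hat bases sum to 1), so each KAN output is 2.0. Per edge, the KAN stores several coefficients where the MLP stores one weight, yet both feed the same accumulate step, which is what makes a shared datapath plausible.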

Paper Structure

This paper contains 14 sections, 5 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: (a) Overview of KAN and (b) its implementation challenges. In (a), we show an example of a [2,3] KAN layer, where the output is constructed by linearly combining multiple B-spline basis functions with learnable weights. The use of B-splines enables accuracy scaling by tuning spline parameters instead of changing the model structure. In (b), the comparison between KAN and MLP, summarized from yu2024kan, highlights the need for unified hardware that can efficiently support both models.
  • Figure 2: Overall hardware architecture of VIKIN.
  • Figure 3: Illustration of the reconfigurable dataflow: (a) pipeline mode for KAN; (b) parallel mode for MLP.
  • Figure 4: Hardware architecture of the reconfigurable B-spline unit (SPU), which supports two modes: an iterative mode that computes the B-spline bases $B_i(x)$ for KAN, and an accumulation mode in which the SPU functions as a PE for MLP.
  • Figure 5: The pipeline mode with sparsity support: (a) pipeline stage 1, (b) pipeline stage 2. This mode hides SPU latency through pipeline processing and reduces PE array operations via sparsity-aware computation in the TSE. The figure also illustrates the dynamic weight buffer access scheme across different operation modes to ensure sufficient bandwidth.
  • ...and 3 more figures
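Figure 4 says the SPU's iterative mode produces the B-spline bases $B_i(x)$, and Figure 5 says the TSE reduces PE-array operations through sparsity-aware computation. The Python sketch below illustrates one generic reason such sparsity exists in KANs, under stated assumptions. It evaluates the bases with the standard Cox–de Boor recursion, raising the degree by one per iteration (we assume this resembles an "iterative" evaluation; the SPU's exact datapath is not described here), and shows that a degree-$k$ B-spline has at most $k+1$ nonzero bases at any input, so the remaining products can be skipped. The function names are hypothetical, and this is not claimed to be VIKIN's two-stage sparsity scheme, whose details are not given in this excerpt.

```python
from typing import List, Sequence, Tuple

def bspline_bases(x: float, knots: Sequence[float], degree: int) -> List[float]:
    """Evaluate every B-spline basis B_{i,degree}(x) via the Cox-de Boor recursion,
    raising the degree by one per iteration (0 -> 1 -> ... -> degree)."""
    n0 = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1}).
    B = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n0)]
    for p in range(1, degree + 1):
        nxt = []
        for i in range(n0 - p):
            left_den = knots[i + p] - knots[i]
            right_den = knots[i + p + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * B[i] if left_den > 0 else 0.0
            right = (knots[i + p + 1] - x) / right_den * B[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        B = nxt
    return B

def sparse_spline_eval(bases: Sequence[float], coeffs: Sequence[float]) -> Tuple[float, int]:
    """Compute sum_i c_i * B_i(x), issuing multiplies only for nonzero bases.
    Returns (value, number of multiplies performed)."""
    nonzero = [i for i, b in enumerate(bases) if b != 0.0]
    value = sum(bases[i] * coeffs[i] for i in nonzero)
    return value, len(nonzero)

# Cubic splines on a uniform grid of 11 knots give 7 basis functions.
knots = [float(t) for t in range(11)]
bases = bspline_bases(5.5, knots, degree=3)
value, mults = sparse_spline_eval(bases, [1.0] * len(bases))
print(len(bases), mults)  # 7 4
```

With all coefficients set to 1 the value is 1 up to rounding, because B-splines form a partition of unity inside the valid knot range; only 4 of the 7 products are actually needed for this input.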
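Figure 5 states that the pipeline mode hides SPU latency through pipeline processing. A simple two-stage latency model, sketched in Python below, shows why overlapping the stages helps. It assumes, as a hypothesis rather than a description of VIKIN, that stage 1 is SPU basis evaluation and stage 2 is PE-array accumulation, that each stage takes a fixed number of cycles per item, and that a buffer between the stages lets them run independently. The function names and cycle counts are invented for illustration. Once the pipeline is full, one item completes every max(t_spu, t_pe) cycles, so when the SPU stage is the shorter one its latency is paid only once, during pipeline fill.

```python
def sequential_cycles(n_items: int, t_spu: int, t_pe: int) -> int:
    """No overlap: every item runs basis evaluation, then accumulation."""
    return n_items * (t_spu + t_pe)

def pipelined_cycles(n_items: int, t_spu: int, t_pe: int) -> int:
    """Two decoupled stages with a buffer between them: after the first item
    passes through both stages, one item completes every max(t_spu, t_pe) cycles."""
    if n_items <= 0:
        return 0
    return t_spu + t_pe + (n_items - 1) * max(t_spu, t_pe)

# Hypothetical per-item stage latencies (cycles); not taken from the paper.
N, T_SPU, T_PE = 64, 4, 6
print(sequential_cycles(N, T_SPU, T_PE))  # 640
print(pipelined_cycles(N, T_SPU, T_PE))   # 388
```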