Physical Analog Kolmogorov-Arnold Networks based on Reconfigurable Nonlinear-Processing Units

Manuel Escudero; Mohamadreza Zolfagharinejad; Sjoerd van den Belt; Nikolaos Alachiotis; Wilfred G. van der Wiel

TL;DR

A physical analog KAN architecture in which edge functions are realized in materia using reconfigurable nonlinear-processing units (RNPUs), multi-terminal nanoscale silicon devices whose input-output characteristics are tuned via control voltages.

Abstract

Kolmogorov-Arnold Networks (KANs) shift neural computation from linear layers to learnable nonlinear edge functions, but implementing these nonlinearities efficiently in hardware remains an open challenge. Here we introduce a physical analog KAN architecture in which edge functions are realized in materia using reconfigurable nonlinear-processing units (RNPUs): multi-terminal nanoscale silicon devices whose input-output characteristics are tuned via control voltages. By combining multiple RNPUs into an edge processor and assembling these blocks into a reconfigurable analog KAN (aKAN) architecture with integrated mixed-signal interfacing, we establish a realistic system-level hardware implementation that enables compact KAN-style regression and classification with programmable nonlinear transformations. Using experimentally calibrated RNPU models and hardware measurements, we demonstrate accurate function approximation across increasing task complexity while requiring fewer trainable parameters than, or a number comparable to, multilayer perceptrons (MLPs). System-level estimates indicate an energy per inference of $\sim 250$ pJ and an end-to-end inference latency of $\sim 600$ ns for a representative workload, corresponding to a $\sim 10^{2}$-$10^{3}\times$ reduction in energy accompanied by a $\sim 10\times$ reduction in area compared to a digital fixed-point MLP at similar approximation error. These results establish RNPUs as scalable, hardware-native nonlinear computing primitives and identify analog KAN architectures as a realistic silicon-based pathway toward energy-, latency-, and footprint-efficient analog neural-network hardware, particularly for edge inference.
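
As a rough consistency check on the system-level figures quoted in the abstract, the arithmetic below derives the quantities they imply. It is a back-of-the-envelope sketch, not a result from the paper: it takes the approximate values of 250 pJ and 600 ns at face value, assumes the energy is spent uniformly over the latency window, and uses the quoted reduction factors only to bracket the energy of the digital baseline. All variable names are illustrative.

```python
# Approximate figures quoted in the abstract.
energy_akan_j = 250e-12   # energy per inference, about 250 pJ
latency_s = 600e-9        # end-to-end inference latency, about 600 ns

# ASSUMPTION: energy is dissipated uniformly over the latency window.
avg_power_w = energy_akan_j / latency_s           # about 0.42 mW

# Quoted energy reduction of roughly 1e2 to 1e3 relative to a digital
# fixed-point MLP implies a baseline energy in this bracket.
mlp_energy_low_j = energy_akan_j * 1e2            # about 25 nJ
mlp_energy_high_j = energy_akan_j * 1e3           # about 250 nJ

print(f"average power   ~ {avg_power_w * 1e3:.2f} mW")
print(f"implied MLP energy ~ {mlp_energy_low_j * 1e9:.0f} to "
      f"{mlp_energy_high_j * 1e9:.0f} nJ per inference")
```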

Paper Structure (12 sections, 2 equations, 5 figures, 2 tables)

This paper contains 12 sections, 2 equations, 5 figures, 2 tables.

Function approximation with analog KANs
Classification with analog KANs
Pruning analog KANs
System-level hardware comparison with digital neural networks
Discussion
ACKNOWLEDGMENTS
SUPPLEMENTARY MATERIALS

Figures (5)

Figure 1: Hardware implementations of neural networks and analog Kolmogorov-Arnold networks. (A) Example of a multilayer perceptron (MLP) with three input neurons, a hidden layer and a single output neuron. Hidden neuron $j$ computes $a_j=\sigma\!\left(\sum_k w^{(1)}_{jk}x_k+b_j\right)$, where $w^{(1)}_{jk}$ denotes the weight from input neuron $k$ to hidden neuron $j$, $x_k$ the $k$-th input, $b_j$ an optional bias term, and $\sigma(\cdot)$ a fixed nonlinear activation function. The network output is $\hat{y}=\sum_j w^{(2)}_{1j}a_j$. (B) Minimal block diagram of a digital circuit implementing a neuron. The dot product $\sum_k w^{(1)}_{jk}x_k$ is computed using multiply-accumulate (MAC) operations, followed by $\sigma(\cdot)$ to produce the activation $a_j$. (C) Analog in-memory computing (AIMC) implementation of an MLP layer using a memristive crossbar array. Inputs ($x_k$) are mapped to input voltages $V_k$ that are fed to memristive devices with conductances $G_{jk}$, generating output currents $I_j=\sum_k G_{jk}V_k$, which are digitized by an analog-to-digital converter (ADC) and passed through a digital activation function $\sigma(\cdot)$ to produce activations $a_j$. (D) Generic hardware architecture for accelerating neural-network operations, organizing multiple digital and/or analog processing elements (PEs) with local control logic (Ctrl.), arithmetic logic units (ALUs), register files, and, for analog PEs, crossbar arrays and ADCs. The PEs interface with an on-chip memory unit and are coordinated by a controller to parallelize MAC operations. (E) Schematic of a Kolmogorov-Arnold Network (KAN) [14], with learnable nonlinear edge functions $f^{l}_{ij}(\cdot)$ replacing fixed node activation functions, and node operations primarily consisting of summing incoming edge outputs. (F) Edge processor (EP) implementing KAN edge functions using parallel reconfigurable nonlinear-processing units (RNPUs). Each RNPU is configured by selecting an input electrode (via an input router) and tuning its nonlinear transfer characteristic using control voltages ($V_{c,i}$); output gains and optional skip connections further increase edge-function flexibility. (G) Analog KAN (aKAN) architecture comprising a reconfigurable array of EPs interconnected via programmable switch matrices. Input variables $x_i$ are linearly encoded as voltages within the RNPU operating range and applied to EP terminals to generate nonlinear transformations. Bias generator and gain registers configure EP characteristics. Intermediate I/V scaling blocks rescale signals for subsequent layers. A final linear readout produces the prediction $\hat{y}$.
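
The contrast drawn in panels A, E and F can be sketched in a few lines: an MLP neuron applies one fixed activation to a weighted sum, whereas a KAN node simply sums the outputs of learnable edge functions, each built here as a gain-weighted sum of parallel nonlinear units plus an optional skip term. This is a minimal illustration under stated assumptions. The function `rnpu_stand_in` is a hypothetical tanh placeholder and not a model of a measured RNPU transfer characteristic; the `(slope, shift)` pair standing in for the control voltages and all other names are likewise illustrative.

```python
import math

def mlp_neuron(x, w, b=0.0):
    """Panel A: a_j = sigma(sum_k w_jk * x_k + b_j), with sigma = tanh here."""
    return math.tanh(sum(wk * xk for wk, xk in zip(w, x)) + b)

def rnpu_stand_in(v, control):
    """HYPOTHETICAL placeholder for a tunable RNPU transfer curve.
    `control` = (slope, shift) stands in for the control voltages."""
    slope, shift = control
    return math.tanh(slope * v + shift)

def edge_processor(x, controls, gains, skip=0.0):
    """Panel F: gain-weighted sum of parallel units plus an optional skip path."""
    nonlinear = sum(g * rnpu_stand_in(x, c) for g, c in zip(gains, controls))
    return nonlinear + skip * x

def akan_layer(inputs, edges):
    """Panel E: node j sums its incoming edge outputs.
    edges[j][i] configures the edge from input i to node j."""
    return [
        sum(edge_processor(x, **edges[j][i]) for i, x in enumerate(inputs))
        for j in range(len(edges))
    ]

# A [2,1] layer with two units per edge processor (illustrative values).
edges = [[
    {"controls": [(2.0, 0.5), (1.0, -0.3)], "gains": [0.7, -0.4], "skip": 0.1},
    {"controls": [(3.0, 0.0), (0.5, 1.0)], "gains": [0.2, 0.9], "skip": 0.0},
]]
y_hat = akan_layer([0.3, -0.8], edges)
print(y_hat)
```

In this sketch the trainable quantities sit on the edges (controls, gains, skip), while the node itself has no parameters, which mirrors the division of labor described in the caption.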
Figure 2: Function approximation with RNPU-based edge processors. (A) Experimental sine-wave fitting using an edge processor (EP) composed of two RNPUs in parallel. A linear combination of the individual RNPU outputs (left panels) approximates (blue curve) the target sine function (black curve). (B) Sine-wave approximation using EPs composed of 2 (blue shading) and 3 (orange shading) RNPUs in parallel. The shaded regions indicate the spread among the best five fits obtained from independent training runs with randomized input-electrode selection. (C) Bessel-function approximation ($\mathrm{J}_0(20x)$) using EPs with 5 (blue shading) and 10 (orange shading) parallel RNPUs. The shaded regions show the spread among the best five fits, as in panel B. (D--F) Scalability of aKANs and MLP baselines (with ReLU or tanh activations), showing approximation error as a function of the number of trainable parameters for three target functions: $\mathrm{J}_0(20x)$ in panel D, $e^{(\sin(\pi x_1)+x_2^2)}$ in panel E, and $e^{(\sin(\pi(x_1^2+x_2^2)) + \sin(\pi(x_3^2+x_4^2)))}$ in panel F. Both aKANs and MLPs are simulated across a range of network depths and widths; for aKANs, we additionally vary the number of RNPUs per EP and randomly select RNPU input electrodes.
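
The idea in panel A, that a linear combination of a few nonlinear unit outputs can approximate a sine, can be illustrated with an ordinary least-squares fit of the output gains. This is a simplified sketch rather than the training procedure used in the paper: the three basis curves are hypothetical tanh stand-ins for RNPU responses, their shapes are held fixed (whereas the paper also tunes the control voltages), and the target $\sin(\pi x)$ on $[-1, 1]$ is an assumed example because the caption does not state the frequency or input range.

```python
import math

def basis(x):
    """Three HYPOTHETICAL fixed transfer curves (not measured RNPU data)."""
    return [math.tanh(3.0 * x), math.tanh(3.0 * x - 2.0), math.tanh(3.0 * x + 2.0)]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_gains(xs, ys):
    """Least-squares output gains via the normal equations (A^T A) g = A^T y."""
    phi = [basis(x) for x in xs]
    k = len(phi[0])
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    aty = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    return solve(ata, aty)

def predict(x, gains):
    return sum(g * b for g, b in zip(gains, basis(x)))

xs = [-1.0 + 2.0 * i / 200 for i in range(201)]
ys = [math.sin(math.pi * x) for x in xs]          # assumed target
gains = fit_gains(xs, ys)
mse = sum((predict(x, gains) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline = sum(y * y for y in ys) / len(ys)       # error of the all-zero fit
print(f"gains = {gains}, MSE = {mse:.4f}, zero-fit MSE = {baseline:.4f}")
```

Adding more units to the basis enlarges the space of reachable functions, which is consistent with the caption's use of more RNPUs per EP for the harder Bessel target.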
Figure 3: Benchmarking aKANs for binary classification. (A) Experimental classification results and decision boundaries obtained by aKANs on the Moons and Spirals datasets. Top left: weakly perturbed (noise = 0.05) Moons, where a $[2,1]_2$ aKAN achieves 99.5% accuracy, compared to 99.1% for a $[2,9,9,1]$ MLP (not shown). Top right: highly perturbed (noise = 0.15) Moons, where a $[2,1]_3$ aKAN reaches 99.5% accuracy versus 98.7% for a $[2,10,10,1]$ MLP (not shown). Bottom left: 1-turn spiral, where a $[2,3,1]_4$ aKAN achieves 99.75% accuracy compared to 98.5% for a $[2,10,10,1]$ MLP (not shown). Bottom right: 1.5-turn spiral, where a $[2,4,1]_5$ aKAN achieves 96.8% accuracy, compared to 91% for a $[2,15,15,15,1]$ MLP (not shown). Networks are denoted $[n_I,n_{H1},\ldots,n_{HL},n_O]_{d}$, with $n_I$, $n_O$ the input/output sizes, $n_{H1}\ldots n_{HL}$ the hidden layer widths, and $d$ the number of RNPUs per EP in aKANs. (B) Simulated aKAN accuracy on Skin Segmentation, COD-RNA, and MAGIC datasets, compared with ReLU-MLPs; architectures indicated. (C) Comparison of learnable parameter count for aKANs and ReLU-MLPs across the classification tasks.
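
The $[n_I,\ldots,n_O]_d$ notation translates directly into structural counts, sketched below. The MLP count is the standard weights-plus-biases formula. For aKANs the sketch counts only edge processors and RNPUs, assuming a fully connected, unpruned network; it deliberately stops short of a trainable-parameter count, because the number of control voltages and gains per RNPU is not stated in this caption. Function names are illustrative.

```python
def mlp_param_count(widths, bias=True):
    """Weights plus optional biases of a fully connected MLP."""
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out for n_in, n_out in zip(widths, widths[1:]))

def akan_edge_count(widths):
    """One edge processor per (input, output) pair in each layer.
    ASSUMPTION: fully connected, no pruning."""
    return sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))

def akan_rnpu_count(widths, d):
    """d parallel RNPUs per edge processor."""
    return d * akan_edge_count(widths)

# Architectures named in panel A.
print(mlp_param_count([2, 9, 9, 1]))      # 127
print(akan_rnpu_count([2, 1], 2))         # 4
print(akan_rnpu_count([2, 3, 1], 4))      # 36
print(akan_rnpu_count([2, 4, 1], 5))      # 60
```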
Figure 4: Regularization and pruning of analog KANs. (A) aKAN fit to $y=e^{(\sin(\pi x_1)+x_2^2)}$ without regularization or pruning (MSE $=8\times10^{-3}$). (B) Same task with L1 regularization applied to RNPU output gains (MSE $=23.8\times10^{-3}$); the contributions of many EPs are suppressed and the dominant EPs are highlighted. (C) Pruned network derived from (B) after fine-tuning the remaining EPs (MSE $=23.3\times10^{-3}$).
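
The regularize-then-prune workflow in panels B and C can be sketched as follows: an L1 penalty on the output gains is added to the task loss so that unimportant edge processors shrink toward zero, and edge processors whose total gain magnitude falls below a threshold are then removed before fine-tuning. This is a schematic illustration only. The pruning criterion, the threshold, the penalty weight `lam`, and the gain values are assumptions introduced here, not details given in the caption.

```python
def l1_penalty(gains_per_ep, lam):
    """L1 term added to the task loss: lam * sum of |output gain|."""
    return lam * sum(abs(g) for gains in gains_per_ep.values() for g in gains)

def prune(gains_per_ep, threshold):
    """Keep edge processors whose summed |gain| reaches the threshold.
    ASSUMPTION: this criterion is illustrative, not taken from the paper."""
    return {
        ep: gains
        for ep, gains in gains_per_ep.items()
        if sum(abs(g) for g in gains) >= threshold
    }

# Hypothetical gains after L1-regularized training; keys are (layer, i, j).
gains_per_ep = {
    (0, 0, 0): [0.8, -0.5],
    (0, 1, 0): [0.001, -0.002],
    (0, 0, 1): [0.0, 0.003],
    (0, 1, 1): [-0.6, 0.4],
}

total_loss = 0.0238 + l1_penalty(gains_per_ep, lam=0.01)  # task MSE + penalty
kept = prune(gains_per_ep, threshold=0.05)
print(sorted(kept))   # [(0, 0, 0), (0, 1, 1)]
```

As the caption reports, the regularized and pruned networks trade some accuracy (MSE of about $23 \times 10^{-3}$ versus $8\times10^{-3}$) for a smaller set of active edge processors.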
Figure 5: System-level energy and silicon area comparison between aKANs and MLPs for nonlinear function evaluation. Each function evaluation corresponds to the evaluation of 1,000 samples of $e^{(\sin(\pi x_1)+x_2^2)}$. For the digital baseline, MLP multiply-accumulate units are implemented using the NanGate45 Open Cell library and the tanh activation function is realized by a look-up table. (A) Estimated energy per inference as a function of mean-squared error (MSE). The transimpedance amplifier (TIA) is the dominant contribution to the aKAN energy consumption. (B) Estimated silicon area versus MSE. Multiple network widths and depths are included for both aKAN and MLP architectures. Solid red lines serve as guides to the eye.
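
For readers unfamiliar with the digital baseline, the sketch below shows what realizing tanh by a look-up table typically involves: the activation is precomputed on a uniform grid, inputs are saturated to the table range, and evaluation reduces to an index computation and a memory read. The table size (256 entries) and input range ($\pm 4$) are assumptions chosen for illustration; the caption does not specify the configuration used in the paper.

```python
import math

def build_tanh_lut(n_entries=256, x_max=4.0):
    """Precompute tanh on a uniform grid over [-x_max, x_max]."""
    step = 2.0 * x_max / (n_entries - 1)
    return [math.tanh(-x_max + i * step) for i in range(n_entries)], step

def tanh_lut(x, lut, step, x_max=4.0):
    """Nearest-entry lookup with saturation outside the table range."""
    x = max(-x_max, min(x_max, x))
    idx = round((x + x_max) / step)
    idx = max(0, min(len(lut) - 1, idx))   # guard against rounding at the edge
    return lut[idx]

lut, step = build_tanh_lut()
grid = [-6.0 + 12.0 * i / 2000 for i in range(2001)]
max_err = max(abs(tanh_lut(x, lut, step) - math.tanh(x)) for x in grid)
print(f"max abs error with {len(lut)} entries: {max_err:.4f}")
```

The table size sets a trade-off between memory area and approximation error, which is one reason the activation contributes to the area and energy of the digital baseline.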
