SynapticCore-X: A Modular Neural Processing Architecture for Low-Cost FPGA Acceleration

Arya Parameshwara

TL;DR

This work tackles the barriers to academic NPU research by delivering an open-source, Apple M-inspired neural processing unit implemented in SystemVerilog for low-cost FPGAs. It combines a PicoRV32-based RV32IMC control core with a configurable neural engine tile and a DMA-enabled scratchpad, all accessible via a reproducible, automated Vivado workflow. Hardware validation on the PYNQ-Z2 demonstrates register-level correctness, deterministic control, and cycle-accurate performance for core kernels, supported by complete artifact capture and documentation. By providing RTL, build scripts, validation tools, and tutorials, the paper aims to democratize neural microarchitectural research and enable college-level replication on commodity hardware.

Abstract

This paper presents SynapticCore-X, a modular and resource-efficient neural processing architecture optimized for deployment on low-cost FPGA platforms. The design integrates a lightweight RV32IMC RISC-V control core with a configurable neural compute tile that supports fused matrix, activation, and data-movement operations. Unlike existing FPGA accelerators that rely on heavyweight IP blocks, SynapticCore-X provides a fully open-source SystemVerilog microarchitecture with tunable parallelism, scratchpad memory depth, and DMA burst behavior, enabling rapid exploration of hardware-software co-design trade-offs. We document an automated, reproducible Vivado build pipeline that achieves timing closure at 100 MHz on the Zynq-7020 while consuming only 6.1% LUTs, 32.5% DSPs, and 21.4% BRAMs. Hardware validation on PYNQ-Z2 confirms correct register-level execution, deterministic control-path behavior, and cycle-accurate performance for matrix and convolution kernels. SynapticCore-X demonstrates that energy-efficient NPU-like acceleration can be prototyped on commodity educational FPGAs, lowering the entry barrier for academic and open-hardware research in neural microarchitectures.

Paper Structure

This paper contains 47 sections, 2 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Apple M-inspired NPU architecture on PYNQ-Z2. The PicoRV32 control core orchestrates neural engine operations through the PCPI co-processor interface and memory-mapped registers at base address 0x10000000.
  • Figure 2: Neural engine datapath with 16-unit MAC array, scratchpad memories, and DMA controller. The datapath achieves 1.6 GOPS at 100 MHz with 16-bit fixed-point arithmetic.
  • Figure 3: Automated Vivado build flow showing the full RTL-to-bitstream process including synthesis, implementation, AXI integration, and PYNQ-Z2 deployment.
  • Figure 4: FPGA resource utilization breakdown showing 15.5% LUT, 74.5% DSP, and 37.1% BRAM usage with both bar chart and percentage visualization.
  • Figure 5: Vivado power report screenshot confirming estimated total on-chip power of 0.158 W (31% dynamic, 69% static) at 26.8 $^{\circ}$C junction temperature.
  • ...and 5 more figures
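
The headline numbers in the Figure 2 and Figure 5 captions can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the paper's artifacts; the function names are hypothetical, and it assumes that each of the 16 MAC units completes one multiply-accumulate per clock cycle and that a MAC is counted as a single operation. Under that assumption, 16 units at 100 MHz give 1.6 billion MACs per second, which matches the quoted 1.6 GOPS; counting the multiply and the add separately would give 3.2 GOPS instead.

```python
# Back-of-the-envelope check of the Figure 2 and Figure 5 captions.
# Assumption (not stated in the source): one MAC per unit per clock cycle,
# with each MAC counted as a single operation.

def peak_mac_rate(mac_units, clock_hz):
    """Peak multiply-accumulates per second for a fully utilized array."""
    return mac_units * clock_hz


def power_split(total_w, dynamic_fraction):
    """Split total on-chip power into (dynamic, static) watts."""
    dynamic_w = total_w * dynamic_fraction
    return dynamic_w, total_w - dynamic_w


# Values quoted in the captions: 16 MAC units, 100 MHz, 0.158 W, 31% dynamic.
macs_per_s = peak_mac_rate(16, 100e6)
dynamic_w, static_w = power_split(0.158, 0.31)

print(f"peak rate: {macs_per_s / 1e9:.1f} GMAC/s")
print(f"dynamic: {dynamic_w:.3f} W, static: {static_w:.3f} W")
```

These are peak figures for a fully utilized array; sustained throughput on real kernels depends on how well the DMA controller and scratchpads keep the MAC units fed.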