
Swapping-Centric Neural Recording Systems

Muhammed Ugur, Raghavendra Pradyumna Pothukuchi, Abhishek Bhattacharjee

TL;DR

This work proposes co-designing accelerators and storage, with swapping as a primary design goal, using theoretical models of compute and practical models of storage to overcome the limitations of traditional neural recording systems.

Abstract

Neural interfaces read the activity of biological neurons to help advance the neurosciences and offer treatment options for severe neurological diseases. The total number of neurons recorded with multi-electrode interfaces is doubling roughly every 4-6 years \cite{Stevenson2011}. However, processing this exponentially growing data in real time under strict power constraints puts enormous pressure on both compute and storage in traditional neural recording systems. Existing systems deploy various accelerators for better performance per watt and integrate non-volatile memories (NVMs) for data querying and better treatment decisions. These accelerators have direct access to only a limited amount of fast SRAM-based memory, which cannot keep up with the growing data rates. Swapping to the NVM becomes inevitable; however, naive approaches cannot complete within the refractory period of a neuron (i.e., a few milliseconds), which disrupts timely disease treatment. We propose co-designing accelerators and storage, with swapping as a primary design goal, using theoretical models of compute and practical models of storage to overcome these limitations.

Paper Structure (3 sections, 3 figures)

Figures (3)

  • Figure 1: Diagram of the most recent HALO processor \cite{Sriram2023}, composed of many unique accelerators stitched together on a low-power reconfigurable fabric with access to an NVM (left). Partial tape-out of HALO at 12 nm (right).
  • Figure 2: Power usage of different accelerators when storing their working sets in SRAM as channel counts increase. Some accelerators, like the Butterworth Bandpass Filter (BBF) and Discrete Wavelet Transform (DWT), have configurations that vary their working sets. Fast Fourier Transform (FFT), Cross Correlation (XCOR), and Dynamic Time Warping (DTW) are fixed at a set number of samples per channel.
  • Figure 3: Channel/sampling-rate configurations supported (blue region) by a naive BBF swapping approach. Configurations that cannot fit into the SRAM of the BBF and the storage controller require swapping. NVM bandwidth limits higher channel counts, whereas NVM write latency is too high for smaller, yet uncacheable, working sets.
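
The trade-off described in the Figure 3 caption can be illustrated with a toy feasibility check. This is only a sketch under stated assumptions, not the authors' model: the names `SwapModel` and `classify` are hypothetical, every numeric default is a placeholder rather than a measured value, and the additive cost (fixed write latency plus transfer time for the bytes that overflow SRAM) is a simplification chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapModel:
    """Hypothetical parameters; all defaults are placeholders, not paper data."""
    sram_bytes: int = 64 * 1024          # on-chip SRAM available to the accelerator
    nvm_bandwidth_bps: float = 50e6      # sustained NVM write bandwidth (bytes/s)
    nvm_write_latency_s: float = 1.5e-3  # fixed cost of one NVM write
    deadline_s: float = 2e-3             # stand-in for "a few milliseconds"
    bytes_per_sample: int = 2            # assumed sample width
    window_s: float = 4e-3               # assumed signal window kept per channel


def classify(model: SwapModel, channels: int, sampling_rate_hz: float) -> str:
    """Label one channel/sampling-rate configuration under the toy model."""
    working_set = (channels * sampling_rate_hz
                   * model.window_s * model.bytes_per_sample)
    if working_set <= model.sram_bytes:
        return "fits_in_sram"            # no swapping needed

    # Bytes that do not fit on chip must be written to the NVM.
    overflow = working_set - model.sram_bytes
    transfer_s = overflow / model.nvm_bandwidth_bps
    if model.nvm_write_latency_s + transfer_s <= model.deadline_s:
        return "swap_ok"

    # The swap misses the deadline; report which cost term dominates.
    if model.nvm_write_latency_s >= transfer_s:
        return "latency_bound"
    return "bandwidth_bound"


if __name__ == "__main__":
    model = SwapModel()
    for channels in (96, 300, 450, 1024):
        print(channels, classify(model, channels, 30_000))
```

With these placeholder values, a swap that misses the deadline is dominated by the fixed write latency when the overflow is small and by transfer time when the channel count is large. Sweeping `channels` and `sampling_rate_hz` over a grid would give a rough analogue of a supported-configuration region, though its shape depends entirely on the assumed parameters.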