HyDra: SOT-CAM Based Vector Symbolic Macro for Hyperdimensional Computing

Md Mizanur Rahaman Nayan, Che-Kai Liu, Zishen Wan, Arijit Raychowdhury, Azad J Naeemi

TL;DR

This work tackles the energy and latency barriers of edge-scale hyperdimensional computing by introducing HyDra, a SOT-CAM–based accelerator that executes core HDC operations (binding, permutation, and similarity search) in memory. It combines reconfigurable 5T2MTJ SOT-CAM cells with holographic information representation to embed permutation and an HDC-specific adder, delivering large energy and latency improvements over CMOS baselines while preserving accuracy within a few percent. HyDra demonstrates up to 2.27x lower inference energy than state-of-the-art accelerators and orders of magnitude gains over CPU/eGPU implementations, with edge-ready throughput and a scalable HV-dimension design. The approach includes a voltage-scaling scheme to mitigate IR drop and a reconfigurability pathway to tailor energy and latency to diverse HDC applications, enabling practical deployment for clustering, classification, and beyond.

Abstract

Hyperdimensional computing (HDC) is a brain-inspired paradigm valued for its noise robustness, parallelism, energy efficiency, and low computational overhead. Hardware accelerators are being explored to further enhance its performance, but current solutions are often limited by application specificity and the latency of encoding and similarity search. This paper presents a generalized, reconfigurable on-chip training and inference architecture for HDC, utilizing spin-orbit-torque magnetic random access memory (SOT-MRAM) based content-addressable memory (SOT-CAM). The proposed SOT-CAM array integrates storage and computation, enabling in-memory execution of key HDC operations: binding (bitwise multiplication), permutation (bit shifting), and efficient similarity search. Furthermore, a novel bit-drop-based permutation method, backed by the holographic information representation of HDC, is proposed; it replaces conventional permutation execution in hardware, resulting in a 6x latency improvement, and an HDC-specific adder reduces energy and area by 1.51x and 1.43x, respectively. To mitigate the parasitic effect of interconnects in the similarity search, a four-stage voltage scaling scheme is proposed to ensure an accurate representation of the Hamming distance. Benchmarked at 7 nm, the architecture achieves energy reductions of 21.5x, 552.74x, 1.45x, and 282.57x for addition, permutation, multiplication, and search operations, respectively, compared to CMOS-based HDC. Against state-of-the-art HDC accelerators, it achieves 2.27x lower energy consumption and outperforms CPU and eGPU implementations by 2702x and 23161x, respectively, with less than a 3% drop in accuracy.
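For readers new to HDC, the three operations named in the abstract can be sketched in software on dense binary hypervectors. This is a minimal illustration under stated assumptions, not the HyDra hardware mapping: binding is modeled as elementwise XOR (the binary counterpart of bipolar multiplication), permutation as a conventional cyclic shift (not the paper's bit-drop method, whose details are not given here), and similarity search as a minimum-Hamming-distance lookup. The dimension `D` and all function names are hypothetical.

```python
import random

D = 1024  # hypervector dimension; an illustrative value, not taken from the paper


def random_hv(rng, d=D):
    """Return a random dense binary hypervector as a list of 0/1 bits."""
    return [rng.randrange(2) for _ in range(d)]


def bind(a, b):
    """Binding: elementwise XOR of two binary hypervectors."""
    return [x ^ y for x, y in zip(a, b)]


def permute(a, k=1):
    """Permutation: cyclic shift by k positions (conventional approach)."""
    k %= len(a)
    return a[-k:] + a[:-k]


def hamming(a, b):
    """Hamming distance: number of positions where the bits differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query, candidates):
    """Similarity search: index of the candidate closest to the query."""
    return min(range(len(candidates)), key=lambda i: hamming(query, candidates[i]))


def flip_bits(hv, n, rng):
    """Return a copy of hv with n distinct, randomly chosen bits flipped (noise)."""
    out = list(hv)
    for i in rng.sample(range(len(hv)), n):
        out[i] ^= 1
    return out
```

XOR binding is its own inverse, so `bind(bind(a, b), b)` recovers `a`. A query corrupted by a modest number of bit flips still lands on its source vector, because unrelated random hypervectors sit at a Hamming distance of roughly `D / 2`.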

Paper Structure

This paper contains 30 sections, 14 figures, 1 table.

Figures (14)

  • Figure 1: Simplified dataflow of HDC model training, retraining, and inference. During training, encoded HVs are added to the corresponding class HV. In retraining, encoded HVs are subtracted from the mispredicted class HV and added to the correct class HV. In inference, the most similar class is given as the prediction.
  • Figure 2: a) 3T2MTJ SOT-CAM cell. b) Equivalent circuit
  • Figure 3: Proposed HyDra architecture with SOT-CAM.
  • Figure 4: a) Simple SOT-CAM cell design. b) The proposed $5T2MTJ$ SOT-CAM cell design for the HDC array. c) Proposed array-level design (one column). The PMOS connected to $E_{ML}$ propagates the XOR output to the write driver, $WRX$, and the NMOS connects the cell to the corresponding ML during search. d) Output waveform during an XOR-Write operation. In the first phase, the XOR operation is performed by enabling the search line (i.e., $V(SL)$ is high for the first 1 ns). In the second phase, the target WL (e.g., $WL2$) is turned on to write the XOR output. The voltage difference between $WR$ and $WR'$ is $\sim 78\,\mathrm{mV}$, which results in sufficient SOT current ($156\,\mu\mathrm{A}$) to write with certainty.
  • Figure 5: a) Multiplication mapping on the SOT-CAM array. One operand HV is applied on $SL$ while the other is stored in the array, and the elementwise output is passed through the $WRX$ node. b) Similarity search mapping. The query HV is applied on $SL$ and the candidate HVs are stored in the CAM arrays. The similarity search between the query and all candidate HVs in the array is fully parallel, and the Hamming distances are reflected in the corresponding ML currents.
  • ...and 9 more figures
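
The training, retraining, and inference dataflow described in the Figure 1 caption can be sketched in software as follows. This is a minimal sketch under assumptions that are not from the paper: encoded HVs are binary lists, each class HV is kept as an integer accumulator (a bit of 1 adds +1, a bit of 0 adds -1), accumulators are thresholded at zero before comparison, and similarity is measured by Hamming distance. All function names are hypothetical.

```python
import random


def train(samples, labels, n_classes, d):
    """Training: add each encoded HV to the accumulator of its class."""
    acc = [[0] * d for _ in range(n_classes)]
    for hv, y in zip(samples, labels):
        for i, bit in enumerate(hv):
            acc[y][i] += 1 if bit else -1
    return acc


def binarize(row):
    """Threshold an accumulator into a binary class HV (assumed: > 0 maps to 1)."""
    return [1 if v > 0 else 0 for v in row]


def predict(acc, hv):
    """Inference: return the class whose binarized HV is most similar to hv."""
    class_hvs = [binarize(row) for row in acc]
    return min(
        range(len(class_hvs)),
        key=lambda c: sum(x != y for x, y in zip(class_hvs[c], hv)),
    )


def retrain(acc, samples, labels):
    """Retraining: on a misprediction, subtract the encoded HV from the
    mispredicted class and add it to the correct class. Returns error count."""
    errors = 0
    for hv, y in zip(samples, labels):
        p = predict(acc, hv)
        if p != y:
            errors += 1
            for i, bit in enumerate(hv):
                s = 1 if bit else -1
                acc[p][i] -= s
                acc[y][i] += s
    return errors


def noisy(hv, n, rng):
    """Return a copy of hv with n distinct, randomly chosen bits flipped."""
    out = list(hv)
    for i in rng.sample(range(len(hv)), n):
        out[i] ^= 1
    return out
```

A typical use would be to call `train` once, run `retrain` for a few passes until the returned error count stops decreasing, and then call `predict` on unseen encoded HVs.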