HyDra: SOT-CAM Based Vector Symbolic Macro for Hyperdimensional Computing
Md Mizanur Rahaman Nayan, Che-Kai Liu, Zishen Wan, Arijit Raychowdhury, Azad J Naeemi
TL;DR
This work tackles the energy and latency barriers of edge-scale hyperdimensional computing by introducing HyDra, a SOT-CAM-based accelerator that executes core HDC operations (binding, permutation, and similarity search) in memory. It combines reconfigurable 5T2MTJ SOT-CAM cells with holographic information representation to embed permutation and an HDC-specific adder, delivering large energy and latency improvements over CMOS baselines with less than a 3% drop in accuracy. HyDra demonstrates up to 2.27x lower inference energy than state-of-the-art accelerators and gains of 2702x and 23161x over CPU and eGPU implementations, respectively, with edge-ready throughput and a scalable HV-dimension design. The approach includes a voltage-scaling scheme to mitigate IR drop and a reconfigurability pathway to tailor energy and latency to diverse HDC applications, enabling practical deployment for clustering, classification, and beyond.
Abstract
Hyperdimensional computing (HDC) is a brain-inspired paradigm valued for its noise robustness, parallelism, energy efficiency, and low computational overhead. Hardware accelerators are being explored to further enhance its performance, but current solutions are often limited by application specificity and the latency of encoding and similarity search. This paper presents a generalized, reconfigurable on-chip training and inference architecture for HDC, utilizing spin-orbit-torque magnetic random access memory (SOT-MRAM) based content-addressable memory (SOT-CAM). The proposed SOT-CAM array integrates storage and computation, enabling in-memory execution of key HDC operations: binding (bitwise multiplication), permutation (bit shifting), and efficient similarity search. Furthermore, a novel bit-drop-based permutation method, backed by the holographic information representation of HDC, is proposed; it replaces conventional permutation execution in hardware, resulting in a 6x latency improvement, while an HDC-specific adder reduces energy and area by 1.51x and 1.43x, respectively. To mitigate the parasitic effects of interconnects in the similarity search, a four-stage voltage scaling scheme is proposed to ensure an accurate representation of the Hamming distance. Benchmarked at 7 nm, the architecture achieves energy reductions of 21.5x, 552.74x, 1.45x, and 282.57x for addition, permutation, multiplication, and search operations, respectively, compared to CMOS-based HDC. Against state-of-the-art HDC accelerators, it achieves 2.27x lower energy consumption and outperforms CPU and eGPU implementations by 2702x and 23161x, respectively, with less than a 3% drop in accuracy.
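For readers new to HDC, the operations named in the abstract can be illustrated with a minimal software sketch. This is not the HyDra hardware or the authors' implementation. It assumes bipolar hypervectors with entries in {-1, +1}, models permutation as a conventional cyclic shift (the bit-drop method is not modeled), and uses hypothetical helper names (`random_hv`, `bind`, `permute`, `bundle`, `hamming`, `nearest`) and an arbitrary dimension of 2048.

```python
import random


def random_hv(dim, rng):
    """Random bipolar hypervector with entries in {-1, +1} (assumed encoding)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Binding: elementwise multiplication (equivalent to XOR on binary vectors)."""
    return [x * y for x, y in zip(a, b)]


def permute(hv, shift=1):
    """Permutation: cyclic shift to the right by `shift` positions."""
    shift %= len(hv)
    return hv[-shift:] + hv[:-shift]


def bundle(hvs):
    """Addition (bundling): elementwise sum followed by sign; ties map to +1."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]


def hamming(a, b):
    """Hamming distance: number of positions where the vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query, memory):
    """Similarity search: label of the stored vector closest to the query."""
    return min(memory, key=lambda label: hamming(query, memory[label]))


rng = random.Random(0)
DIM = 2048  # illustrative dimension, not taken from the paper

# Associative memory with three random class hypervectors.
memory = {label: random_hv(DIM, rng) for label in ("A", "B", "C")}

# Corrupt 20% of the components of class "B" to form a noisy query.
query = list(memory["B"])
for i in rng.sample(range(DIM), DIM // 5):
    query[i] = -query[i]

result = nearest(query, memory)
print(result, hamming(query, memory["B"]))
```

The noisy query still resolves to its original class because unrelated random hypervectors differ in roughly half of their positions, which is the noise robustness the abstract refers to. In this bipolar form, binding is its own inverse, so `bind(bind(a, b), b)` recovers `a`.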
